

=========
Scripting
=========

.. toctree::
    :maxdepth: 2
    :numbered:

    Understanding Bro Scripts
    The Event Queue and Event Handlers
    The Connection Record Datatype
    Data Types and Data Structures
    Scope
    Global Variables
    Constants
    Local Variables
    Data Structures


Understanding Bro Scripts
=========================

Bro includes an event-queue-driven scripting language that provides the primary means for an organization to extend and customize Bro's functionality.  An overwhelming amount of the output generated by Bro is, in fact, generated by Bro scripts.  It's almost easier to consider Bro to be an entity working behind the scenes, processing connections and generating events, while Bro's scripting language is the medium through which we mere mortals can communicate with it.  Bro scripts effectively tell Bro that, should there be an event of a type we define, we want the information about the connection so we can perform some function on it.  For example, the ssl.log file is generated by a Bro script that walks the entire certificate chain and issues notifications if any of the steps along the certificate chain are invalid.  This entire process is set up by telling Bro that, should it see a server or client issue an SSL HELLO message, we want the information about that connection.

It's often easier to understand Bro's scripting language by looking at a complete script and breaking it down into its identifiable components.  In this example, we'll take a look at how Bro queries the Team Cymru Malware Hash Registry for detected downloads via HTTP.  Part of the Team Cymru Malware Hash Registry includes the ability to do a host lookup on a domain with the format MALWARE_HASH.malware.hash.cymru.com where MALWARE_HASH is the MD5 or SHA1 hash of a file.  Team Cymru also populates the TXT record of their DNS responses with both a "last seen" timestamp and a numerical "detection rate".  The important aspect to understand is that Bro already generates hashes for files it can parse from HTTP streams, but the script detect-MHR.bro is responsible for generating the appropriate DNS lookup and parsing the response.


.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
   :language: bro
   :linenos:

Visually, there are three distinct sections of the script: a base level with no indentation, followed by an indented and formatted section explaining the custom variables being set (export), and another indented and formatted section describing the instructions for a specific event (event log_http).  Don't get discouraged if you don't understand every section of the script; we'll cover the basics of the script and much more in the following sections.

.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
   :language: bro
   :linenos:
   :lines: 7-11

Lines 7 and 8 of the script process the __load__.bro script in the respective directories being loaded.  In a full production deployment of Bro it's likely that these files would already be loaded and their contents made available to the script, but including them explicitly ensures that Bro can be run in modes such as "bare" and still have scripts that reliably work.  Consider it a "best practice" to include appropriate @load statements when possible.

.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
   :language: bro
   :linenos:
   :lines: 12-24

The export section redefines an enumerable constant that describes the type of notice we will generate with the notice framework.  The notice type listed allows for the use of the NOTICE() function to generate notices of type Malware_Hash_Registry_Match, as done in the next section.

.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
   :language: bro
   :linenos:
   :lines: 26-44

The workhorse of the script is contained in the event handler for log_http.  The log_http event is defined as an event-hook in the base/protocols/http/main.bro script and allows scripts to handle a connection as it is being passed to the logging framework.  The event handler is passed an HTTP::Info data structure which will be referred to as "rec" in the body of the event handler.

An if statement is used to check for the existence of a data structure named "md5" nested within the rec data structure.  Bro uses "$" as a dereference operator, and it is employed in this script to check whether rec$md5 is present by including the "?" operator within the path (rec?$md5).  If the rec data structure includes a nested data structure named "md5", the statement evaluates to true and a local variable named "hash_domain" is provisioned and given a format string based on the contents of rec$md5 to produce a valid DNS lookup.
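
The existence check can be tried in isolation.  The following is a minimal sketch rather than part of detect-MHR.bro: the record type ``Example`` and its fields are hypothetical and stand in for HTTP::Info, and only the ``?$`` check and the fmt() call mirror what the script does.

.. code-block:: bro

    # Hypothetical record type used only for illustration.
    type Example: record {
        name: string;
        md5: string &optional;   # may or may not be set
    };

    event bro_init()
        {
        local rec: Example = [$name="sample"];

        # ?$ tests whether the optional field has a value.
        if ( rec?$md5 )
            print fmt("%s.malware.hash.cymru.com", rec$md5);
        else
            print "md5 is not set";
        }

Because the md5 field is never assigned here, the else branch runs.  Accessing rec$md5 without the check would be a runtime error when the field is unset, which is why the guard comes first.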

The rest of the script is contained within a when() block.  In short, a when block is used when Bro needs to perform asynchronous actions, such as a DNS lookup, to ensure that performance isn't affected.  The when() block performs a DNS TXT lookup and stores the result in the local variable MHR_result.  Effectively, processing for this event continues and, upon receipt of the values returned by lookup_hostname_txt(), the body of the when block is executed.  The when block splits the string returned into two separate values and checks to ensure an expected format.  If the format is invalid, the script assumes that the hash wasn't found in the repository and processing is concluded.  If the format is as expected and the detection rate is above the threshold set by MHR_threshold, two new local variables are created and used in the notice issued by NOTICE().
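
The shape of such a when block is sketched below, under stated assumptions.  The hostname is a placeholder, not a real hash.  The use of split1() and of indices 1 and 2 into its result reflects Bro 2.x behaviour and may differ in other versions.  Whether the lookup finishes before Bro exits depends on how Bro is run, so treat this as an outline of the control flow, not a tested tool.

.. code-block:: bro

    event bro_init()
        {
        # Placeholder name; a real query would use an actual MD5 or SHA1 hash.
        local hash_domain = "MALWARE_HASH.malware.hash.cymru.com";

        # The condition is evaluated asynchronously; the body runs only
        # once lookup_hostname_txt() has returned a value.
        when ( local MHR_result = lookup_hostname_txt(hash_domain) )
            {
            # Assumed response format: "<timestamp> <detection rate>"
            local MHR_answer = split1(MHR_result, / /);

            if ( |MHR_answer| == 2 )
                print fmt("last seen %s, detection rate %s",
                          MHR_answer[1], MHR_answer[2]);
            }

        # Execution continues here without waiting for the DNS response.
        print "lookup issued";
        }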

In approximately 15 lines of actual code, Bro provides an amazing utility that would be incredibly difficult to implement and deploy with other products.  In truth, claiming that Bro does this in 15 lines is a misdirection; there is a truly massive number of things going on behind the scenes in Bro, but it is the inclusion of the scripting language that gives analysts access to those underlying layers in a succinct and well-defined manner.

The Event Queue and Event Handlers
==================================

Bro's scripting language is event driven, which is a gear change from the majority of scripting languages with which most users will have previous experience.  Scripting in Bro depends on handling the events generated by Bro as it processes network traffic, altering the state of data structures through those events, and making decisions on the information provided.  This approach to scripting can often cause confusion for users who come to Bro from a procedural or functional language, but once the initial shock wears off it becomes clearer with each exposure.

Bro's core acts to place events into an ordered "Event Queue", allowing event handlers to process them on a first-come, first-served basis.  In effect, this is Bro's core functionality, as without the scripts written to perform discrete actions on events there would be little to no usable output.  As such, a basic understanding of the Event Queue, the events being generated, and the way in which event handlers process those events is a basis not only for learning to write scripts for Bro but also for understanding Bro itself.
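
As a minimal sketch of what an event handler looks like, the script below handles two events that Bro's core generates on its own, bro_init() at startup and bro_done() at shutdown.  Neither needs any network traffic, which makes them convenient for experimenting.

.. code-block:: bro

    # Handler for the event Bro generates once at startup.
    event bro_init()
        {
        print "Bro has started";
        }

    # Handler for the event Bro generates just before it terminates.
    event bro_done()
        {
        print "Bro is shutting down";
        }

The script never calls either handler itself.  Bro queues the events and runs whichever handlers have been defined for them.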

Gaining familiarity with the specific events generated by Bro is a big step towards building a mindset for working with Bro scripts.  The majority of events generated by Bro are defined in the built-in-function files, or .bif files, which also act as the basis for online event documentation.  Whether starting a script from scratch or reading and maintaining someone else's script, having the built-in event definitions available is an excellent resource to have on hand.  Before release version 2.0 the Bro developers put significant effort into organization and documentation of every event.  This effort resulted in built-in-function files organized such that each entry contains a descriptive event name, the arguments passed to the event, and a concise explanation of the event's use.

.. literalinclude:: ../../../../build/src/base/event.bif.bro
   :language: bro
   :linenos:
   :lines: 4124-4149

Above is a segment of event.bif.bro showing the documentation for the event dns_request().  It's organized such that the documentation, commentary, and list of arguments precede the actual event definition used by Bro.  As Bro detects DNS requests being issued by an originator, it issues this event and any number of scripts then have access to the data Bro passes along with the event.  In this example, Bro passes not only the message, the query, query type and query class for the DNS request, but also the record used for the connection itself.
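
A handler for this event might look like the sketch below.  The parameter names and types follow the dns_request() signature as documented for Bro 2.x, and the message format is purely illustrative.

.. code-block:: bro

    # Runs once for every DNS request Bro observes.
    event dns_request(c: connection, msg: dns_msg, query: string,
                      qtype: count, qclass: count)
        {
        # c is the connection record; query, qtype and qclass describe
        # the DNS question itself.
        print fmt("%s asked for %s (type %d, class %d)",
                  c$id$orig_h, query, qtype, qclass);
        }

Running Bro against a trace that contains DNS traffic should print one line per request.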

The Connection Record Data Type
===============================

Of all the events defined in Bro's event.bif.bro file, an overwhelmingly large number of them are passed the connection record data type, in effect, making it the backbone of many scripting solutions.  The connection record itself, as we will see in a moment, is a mass of nested data types used to track state on a connection through its lifetime.  Let's walk through the process of selecting an appropriate event, generating some output to standard out and dissecting the connection record so as to get an overview of it.  We will cover data types in more detail later.

While Bro is capable of packet-level processing, its strengths lie in the context of a connection between an originator and a responder.  As such, there are events defined for the primary parts of the connection life-cycle, as you'll see from the small selection of connection-related events below.

.. literalinclude:: ../../../../build/src/base/event.bif.bro
   :language: bro
   :linenos:
   :lines: 135-138,154,204-208,218,255-256,266,335-340,351

Of the events listed, the event that will give us the best insight into the connection record data type will be connection_state_remove().  As detailed in the in-line documentation, Bro generates this event just before it decides to remove the connection from memory, effectively forgetting about it.  Let's take a look at a simple script that will output the connection record for a single connection.

.. literalinclude:: ../../../../testing/btest/doc/manual/connection_record_01.bro
   :language: bro
   :linenos:
   :lines: 4-9

Again, we start with @load, this time importing the base/protocols/conn scripts which supply the tracking and logging of general information and state of connections.  We handle the connection_state_remove() event and simply print the contents of the argument passed to it.  For this example we're going to run Bro in "bare mode" which loads only the minimum number of scripts to retain operability and leaves the burden of loading required scripts to the script being run.  This will give us a chance to see the contents of the connection record without it being overly populated.

.. btest:: connection-record-01

    @TEST-EXEC: btest-rst-cmd bro -b -r ${TRACES}/dns-session.trace ${TESTBASE}/doc/manual/connection_record_01.bro

As you can see from the output, the connection record is something of a jumble when printed on its own.  Regularly taking a peek at a populated connection record helps to understand the relationship between its fields as well as allowing an opportunity to build a frame of reference for accessing data in a script.

Bro uses the dollar sign as its field delimiter, and a direct correlation exists between the output of the connection record and the proper format of a dereferenced variable in scripts.  In the output of the script above, groups of information are collected between brackets, which would correspond to the $-delimiter in a Bro script.  For example, the originating host is referenced by c$id$orig_h which breaks down to "orig_h which is a member of id which is a member of the data structure referred to as c that was passed into the event handler."  Given that the responder port (c$id$resp_p) is 53/tcp, it's likely that Bro's base DNS scripts can further populate the connection record.  Let's load the base/protocols/dns scripts and check the output of our script.

.. literalinclude:: ../../../../testing/btest/doc/manual/connection_record_02.bro
   :language: bro
   :linenos:
   :lines: 4-10

.. btest:: connection-record-02

    @TEST-EXEC: btest-rst-cmd bro -b -r ${TRACES}/dns-session.trace ${TESTBASE}/doc/manual/connection_record_02.bro

The addition of the base/protocols/dns scripts populates the dns=[] member of the connection record.  While Bro is doing a massive amount of work in the background, it is in what is commonly called "script land" that details are being refined and decisions being made.  Were we to continue running in "bare mode" we could slowly keep adding infrastructure through @load statements.  For example, were we to load base/frameworks/logging, Bro would generate a conn.log and dns.log for us in the current working directory.  Not only is it good practice to include appropriate load statements in your scripts, but it also helps to illuminate the various functionalities within Bro's scripting language.
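
Rather than printing the whole record, a script can pull out individual fields with the $ notation described earlier.  The sketch below prints only the four members of c$id; the output format is an arbitrary choice.

.. code-block:: bro

    @load base/protocols/conn

    event connection_state_remove(c: connection)
        {
        # c$id is itself a record holding the connection's endpoints.
        print fmt("%s:%s -> %s:%s",
                  c$id$orig_h, c$id$orig_p,
                  c$id$resp_h, c$id$resp_p);
        }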

Data Types and Data Structures
==============================

Scope
-----

Before embarking on an exploration of Bro's native data types and data structures, it's important to have a good grasp of the different levels of scope available in Bro and the appropriate times to use them within a script.  The declarations of variables in Bro come in two forms.  Variables can be declared without or with a definition, in the form "SCOPE name: TYPE" or "SCOPE name = EXPRESSION" respectively; both produce the same result if EXPRESSION evaluates to the same type as TYPE.  The decision as to which type of declaration to use is likely to be dictated by personal preference and readability.

.. literalinclude:: ../../../../testing/btest/doc/manual/data_type_declaration.bro
   :language: bro
   :linenos:
   :lines: 4-14

Global Variables
~~~~~~~~~~~~~~~~

A global variable is used when the state of a variable needs to be tracked, not surprisingly, globally.  While there are some caveats, when a script declares a variable using the global scope, that script is granting access to that variable from other scripts.  The declaration below is taken from the known-hosts.bro script and declares a variable called "known_hosts" as a global set of unique IP addresses (line 32) within the "Known" namespace (line 8) and exports it (line 10) for use outside of the "Known" namespace.  Were we to want to use the "known_hosts" variable, we'd be able to access it through "Known::known_hosts".

.. literalinclude:: ../scripts/policy/protocols/conn/known-hosts.bro
   :language: bro
   :linenos:
   :lines: 8-10, 32, 37

The sample above also makes use of an export{} block.  When the module keyword is used in a script, the variables declared are said to be in that module's "namespace".  Whereas a global variable can be accessed by its name alone when it is not declared within a module, a global variable declared within a module must be exported and then accessed via MODULE_NAME::VARIABLE_NAME.  As in the example above, we would be able to access the "known_hosts" variable in a separate script via "Known::known_hosts" due to the fact that known_hosts was declared as a global variable within an export block under the "Known" namespace.
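
The same pattern can be sketched in a single self-contained file.  The module name ``Example`` and the variable ``seen_hosts`` are hypothetical; ``module GLOBAL;`` is used here only to step back out of the namespace so the qualified name is required.

.. code-block:: bro

    module Example;

    export {
        # Exported, so other namespaces can reach it as Example::seen_hosts.
        global seen_hosts: set[addr];
    }

    event bro_init()
        {
        # Inside the module the short name is enough.
        add seen_hosts[192.168.1.1];
        }

    # Leave the Example namespace.
    module GLOBAL;

    event bro_done()
        {
        # Outside the module the variable needs its namespace prefix.
        print |Example::seen_hosts|;
        }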

Constants
~~~~~~~~~

The next level of scoping available in Bro is the constant, denoted by the "const" keyword.  Unlike globals, constants can only be set or altered at parse time; afterwards (at runtime) they are unalterable.  In most cases, constants are used in Bro scripts as containers for configuration options.  The majority of constants are defined with the "&redef" attribute, making it such that the constant can be redefined.  While the idea of a redefinable constant might seem odd, the constraint that constants can only be altered at parse time remains even with the "&redef" attribute.  In the code snippet below, a table of strings indexed by ports is declared as a constant before two values are added to the table through redef statements.  The table is then printed in a bro_init() event.  Were we to try to alter the table in an event handler, Bro would notify the user of an error and the script would fail.

.. literalinclude:: ../../../../testing/btest/doc/manual/data_type_const.bro
   :language: bro
   :linenos:
   :lines: 4-12

.. btest:: data_type_const.bro

    @TEST-EXEC: btest-rst-cmd bro -b ${TESTBASE}/doc/manual/data_type_const.bro

Local Variables
~~~~~~~~~~~~~~~

Whereas globals and constants are widely available in scriptland through various means, when a variable is defined with a local scope, its availability is restricted to the body of the event or function in which it was declared.  Local variables tend to be used for values that are only needed within a specific scope and once the processing of a script passes beyond that scope, the variable is deleted.  It is possible for a function to return a locally scoped variable but in doing so, it is the value that is returned.  Bro maintains a difference between variables and values.

.. literalinclude:: ../../../../testing/btest/doc/manual/data_type_local.bro
   :language: bro
   :linenos:
   :lines: 4-14


Data Structures
---------------

It's difficult to talk about Bro's data types in a practical manner without first covering the data structures available in Bro.  Some of the more interesting characteristics of data types are revealed when used inside of a data structure, but given that data structures are made up of data types, it devolves rather quickly into a "chicken-and-egg" problem.  As such, we'll introduce data types from a bird's-eye view before diving into data structures and, from there, a more complete exploration of data types.

The table below shows the common data types used in Bro, of which, the first four should seem familiar if you have some scripting experience, while the remaining six are less common in other languages. It should come as no surprise that a scripting language for a Network Security Monitoring platform has a fairly robust set of network centric data types and taking note of them here may well save you a late night of reinventing the wheel.

+-----------+-------------------------------------+
| Data Type | Description                         |
+===========+=====================================+
| int       | 64 bit signed integer               |
+-----------+-------------------------------------+
| count     | 64 bit unsigned integer             |
+-----------+-------------------------------------+
| double    | double precision floating point     |
+-----------+-------------------------------------+
| bool      | boolean (T/F)                       |
+-----------+-------------------------------------+
| addr      | ip address, ipv4 and ipv6           |
+-----------+-------------------------------------+
| port      | transport layer port                |
+-----------+-------------------------------------+
| subnet    | CIDR subnet mask                    |
+-----------+-------------------------------------+
| time      | absolute epoch time                 |
+-----------+-------------------------------------+
| interval  | a time interval                     |
+-----------+-------------------------------------+
| pattern   | regular expression                  |
+-----------+-------------------------------------+
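
To make the table concrete, the sketch below declares one local variable of each type using Bro's literal syntax.  The values are arbitrary examples chosen for illustration.

.. code-block:: bro

    event bro_init()
        {
        local i: int = -5;
        local c: count = 10;
        local d: double = 3.14;
        local b: bool = T;
        local a: addr = 192.168.1.100;
        local p: port = 53/udp;
        local s: subnet = 192.168.0.0/16;
        local t: time = current_time();
        local iv: interval = 5 min;
        local pat: pattern = /foo|bar/;

        # Network types work together directly: is the address in the subnet?
        print a in s;
        }

Note that addresses, ports, subnets and intervals are written as bare literals rather than as strings that need parsing.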

Data Structures
---------------

Vectors
~~~~~~~

Similar to arrays, vectors in Bro use contiguous storage for their contents, and as such the contents can be accessed using a zero-indexed numerical offset.  Each element in a vector is indexed by a count, starting at zero and progressing up to one less than the current length of the vector.  While members of a vector must all be of the same type, the vector itself can grow dynamically.  The format for the declaration of a vector follows the pattern of other declarations, namely, SCOPE v: vector of T where v is the name of your vector and T is the data type of its members.  For example, the following snippet shows an explicit and implicit declaration of a vector.

.. literalinclude:: ../../../../testing/btest/doc/manual/data_struct_vector.bro
   :language: bro
   :linenos:
   :lines: 6,7
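
A short sketch of working with vectors follows.  The variable names are arbitrary, and appending by assigning to index ``|v1|`` is one idiom available in Bro 2.x rather than the only way to grow a vector.

.. code-block:: bro

    event bro_init()
        {
        # Explicit declaration: an empty vector of counts.
        local v1: vector of count;

        # Implicit declaration: the element type is inferred.
        local v2 = vector(1, 2, 3, 4);

        # |v1| is the current length, so this appends to the end.
        v1[|v1|] = 10;
        v1[|v1|] = 20;

        print |v1|;    # number of elements in v1
        print v2[0];   # first element of v2
        }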


Sets
~~~~

Tables
~~~~~~

While each of the network centric data types should be familiar in the type of data it represents, seeing the data types in action can be a better starting point.  To do so, a basic understanding of Bro's data structures will be helpful.


subnet
------

Bro has full support for CIDR notation subnets as a base data type.  There is no need to manage the IP and the subnet mask as two separate entities when you can provide the same information in CIDR notation in your scripts.  The example below uses a Bro script to determine if a series of IP addresses are within a set of subnets using a 20 bit subnet mask.  We'll be using some structures we have not yet introduced, as well as some operations, but we'll touch on them here before going into more detail with them later on.

.. literalinclude:: ../../../../testing/btest/doc/manual/data_type_subnets.bro
   :language: bro
   :linenos:
   :lines: 4-19

Because this is a script that doesn't use any kind of network analysis, we can handle the event bro_init() which is always generated by Bro's core upon startup.  On lines six and seven, two locally scoped vectors are created to hold our lists of subnets and IP addresses respectively.  Then, using a set of nested for loops, we iterate over every subnet and every IP address and use an if statement to compare an IP address against a subnet using the in operator.  The in operator returns true if the IP address falls within a given subnet based on the longest prefix match calculation.  For example, 10.0.0.1 in 10.0.0.0/8 would return true while 192.168.2.1 in 192.168.1.0/24 would return false.  When we run the script, we get the output listing the IP address and the subnet in which it belongs.


.. btest:: data_type_subnets

    @TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/data_type_subnets.bro
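
For readers without the test file at hand, the approach described above can be sketched as follows.  The subnets and addresses are arbitrary examples and are not necessarily those used in data_type_subnets.bro.

.. code-block:: bro

    event bro_init()
        {
        local subnets = vector(172.16.0.0/20, 172.16.16.0/20);
        local addresses = vector(172.16.4.56, 172.16.47.254, 172.16.22.45);

        # Iterating over a vector yields its indices, not its elements.
        for ( a in addresses )
            for ( s in subnets )
                # in is true when the address falls inside the subnet.
                if ( addresses[a] in subnets[s] )
                    print fmt("%s belongs to subnet %s",
                              addresses[a], subnets[s]);
        }

With these values, 172.16.4.56 matches the first subnet and 172.16.22.45 the second, while 172.16.47.254 falls outside both and produces no output.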