Restructuring the main documentation index.

I'm merging in the remaining pieces from the former doc directory and
restructuring things into sub-directories.
Robin Sommer 2013-04-01 17:30:12 -07:00
parent 12e4dd8066
commit 25bf563e1c
41 changed files with 7679 additions and 100 deletions

doc/scripting/index.rst Normal file

@ -0,0 +1,503 @@
===================
Writing Bro Scripts
===================
.. toctree::
:maxdepth: 2
:numbered:
Understanding Bro Scripts
The Event Queue and Event Handlers
The Connection Record Data Type
Data Types and Data Structures
Scope
Global Variables
Constants
Local Variables
Data Structures
Understanding Bro Scripts
=========================
Bro includes an event-queue-driven scripting language that provides the primary means for an organization to extend and customize Bro's functionality. The overwhelming majority of the output generated by Bro is, in fact, generated by Bro scripts. It's almost easier to consider Bro to be an entity processing connections and generating events behind the scenes, while Bro's scripting language is the medium through which we mere mortals can communicate with it. Bro scripts effectively tell Bro that, should there be an event of a type we define, we want the information about the connection so we can perform some function on it. For example, the ssl.log file is generated by a Bro script that walks the entire certificate chain and issues notifications if any of the steps along the certificate chain are invalid. This entire process is set up by telling Bro that, should it see a server or client issue an SSL HELLO message, we want to know about that connection.
It's often easier to understand Bro's scripting language by looking at a complete script and breaking it down into its identifiable components. In this example, we'll take a look at how Bro queries the Team Cymru Malware Hash Registry for detected downloads via HTTP. Part of the Team Cymru Malware Hash Registry includes the ability to do a host lookup on a domain with the format MALWARE_HASH.malware.hash.cymru.com where MALWARE_HASH is the MD5 or SHA1 hash of a file. Team Cymru also populates the TXT record of their DNS responses with both a "last seen" timestamp and a numerical "detection rate". The important aspect to understand is that Bro already generates hashes for files it can parse from HTTP streams, but the script ``detect-MHR.bro`` is responsible for generating the appropriate DNS lookup and parsing the response.
.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
:language: bro
:linenos:
Visually, there are three distinct sections of the script. A base level with no indentation followed by an indented and formatted section explaining the custom variables being set (export) and another indented and formatted section describing the instructions for a specific event (event log_http). Don't get discouraged if you don't understand every section of the script; we'll cover the basics of the script and much more in following sections.
.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
:language: bro
:linenos:
:lines: 7-11
Lines 7 and 8 of the script process the ``__load__.bro`` script in the respective directories being loaded. The ``@load`` directives are often considered good practice or even just good manners when writing Bro scripts that might be distributed. While it's unlikely that in a full production deployment of Bro these additional resources wouldn't already be loaded, it's not a bad habit to try to get into as you get more experienced with Bro scripting. If you're just starting out, this level of granularity might not be entirely necessary though.
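As a minimal sketch of what such directives look like, the snippet below loads two directories. The specific paths are an assumption inferred from the frameworks the walkthrough mentions (notices and HTTP), not a quotation of the script itself.

.. code-block:: bro

    # Each @load on a directory processes that directory's __load__.bro.
    # Paths are assumed for illustration.
    @load base/frameworks/notice
    @load base/protocols/http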
.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
:language: bro
:linenos:
:lines: 12-24
The export section redefines an enumerable constant that describes the type of notice we will generate with the notice framework. Bro allows for redefinable constants, which at first might seem counter-intuitive. We'll go into more depth on constants in a later chapter; for now, think of them as variables that can only be altered before Bro starts running. The notice type listed allows for the use of the NOTICE() function to generate notices of type Malware_Hash_Registry_Match, as done in the next section. Notices allow Bro to generate some kind of extra notification beyond its default log types. Often, this extra notification comes in the form of an email generated and sent to a pre-configured address.
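The pattern described above can be sketched in isolation. The module name ``Demo`` and the notice type ``Something_Interesting`` are hypothetical names invented for this illustration; only ``Notice::Type`` and ``NOTICE()`` come from Bro's notice framework.

.. code-block:: bro

    @load base/frameworks/notice

    module Demo;

    export {
        # Extend the existing Notice::Type enum with a new value.
        redef enum Notice::Type += {
            ## Hypothetical notice type used only in this sketch.
            Something_Interesting
        };
    }

    event bro_init()
        {
        # Raise a notice of the new type; $note and $msg are fields
        # of the record passed to NOTICE().
        NOTICE([$note=Demo::Something_Interesting,
                $msg="an illustrative notice raised at startup"]);
        }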
.. literalinclude:: ../scripts/policy/protocols/http/detect-MHR.bro
:language: bro
:linenos:
:lines: 26-44
The workhorse of the script is contained in the event handler for ``log_http``. The ``log_http`` event is defined as an event hook in the base/protocols/http/main.bro script and allows scripts to handle a connection as it is being passed to the logging framework. The event handler is passed an HTTP::Info data structure, which will be referred to as "rec" in the body of the event handler.
An if statement is used to check for the existence of a field named "md5" within the rec data structure. Bro uses the "$" as a dereference operator, and the script checks whether rec$md5 is present by combining it with "?" to form the ``?$`` operator, as in ``rec?$md5``. If the rec data structure includes a field named "md5", the condition evaluates to true and a local variable named "hash_domain" is provisioned and given a format string based on the contents of rec$md5 to produce a valid DNS lookup.
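A small, self-contained sketch of the field-presence check follows. The ``Download`` record type and its values are hypothetical stand-ins for ``HTTP::Info``; the ``?$`` and ``$`` operators and ``fmt()`` are used as they are in the script.

.. code-block:: bro

    # Hypothetical record with one optional field, standing in for HTTP::Info.
    type Download: record {
        url: string;
        md5: string &optional;
    };

    event bro_init()
        {
        local d: Download = [$url="http://example.com/file.bin"];

        # ?$ is true only if the optional field currently holds a value.
        if ( d?$md5 )
            print fmt("hash: %s", d$md5);
        else
            print "no md5 recorded yet";

        d$md5 = "d41d8cd98f00b204e9800998ecf8427e";

        if ( d?$md5 )
            {
            # Build the lookup name the same way the script does.
            local hash_domain = fmt("%s.malware.hash.cymru.com", d$md5);
            print hash_domain;
            }
        }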
The rest of the script is contained within a when() block. In short, a when block is used when Bro needs to perform asynchronous actions, such as a DNS lookup, to ensure that performance isn't affected. The when() block performs a DNS TXT lookup and stores the result in the local variable MHR_result. Effectively, processing for this event continues and, upon receipt of the values returned by lookup_hostname_txt(), the body of the when block is executed. The when block splits the returned string into two separate values and checks to ensure an expected format. If the format is invalid, the script assumes that the hash wasn't found in the repository and processing is concluded. If the format is as expected and the detection rate is above the threshold set by MHR_threshold, two new local variables are created and used in the notice issued by NOTICE().
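The shape of such an asynchronous block can be sketched as follows. The host name ``example.com`` and the ten-second timeout are placeholders chosen for illustration; the sketch only prints the answer rather than parsing it as the script does.

.. code-block:: bro

    event bro_init()
        {
        # The handler returns immediately; the body below runs later,
        # once lookup_hostname_txt() has produced a value.
        when ( local txt = lookup_hostname_txt("example.com") )
            {
            print fmt("TXT answer: %s", txt);
            }
        timeout 10 sec
            {
            # Optional branch taken if no answer arrives in time.
            print "lookup did not finish in time";
            }
        }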
In approximately 15 lines of actual code, Bro provides an amazing utility that would be incredibly difficult to implement and deploy with other products. In truth, claiming that Bro does this in 15 lines is a misdirection; there is a truly massive number of things going on behind the scenes in Bro, but it is the inclusion of the scripting language that gives analysts access to those underlying layers in a succinct and well-defined manner.
The Event Queue and Event Handlers
==================================
Bro's scripting language is event driven, which is a gear change from the majority of scripting languages with which most users will have previous experience. Scripting in Bro depends on handling the events generated by Bro as it processes network traffic, altering the state of data structures through those events, and making decisions on the information provided. This approach to scripting can often cause confusion to users who come to Bro from a procedural or functional language, but once the initial shock wears off it becomes clearer with each exposure.
Bro's core acts to place events into an ordered "event queue", allowing event handlers to process them on a first-come-first-served basis. In effect, this is Bro's core functionality, as without the scripts written to perform discrete actions on events, there would be little to no usable output. As such, a basic understanding of the event queue, the events being generated, and the way in which event handlers process those events is a basis for not only learning to write scripts for Bro but for understanding Bro itself.
Gaining familiarity with the specific events generated by Bro is a big step towards building a mindset for working with Bro scripts. The majority of events generated by Bro are defined in the built-in-function files, or .bif files, which also act as the basis for online event documentation. The in-line comments in these files are compiled into an online documentation system using Broxygen. Whether starting a script from scratch or reading and maintaining someone else's script, the built-in event definitions are an excellent resource to have on hand. For the 2.0 release the Bro developers put significant effort into the organization and documentation of every event. This effort resulted in built-in-function files organized such that each entry contains a descriptive event name, the arguments passed to the event, and a concise explanation of the event's use.
.. literalinclude:: ../../../../build/src/base/event.bif.bro
:language: bro
:linenos:
:lines: 4124-4149
Above is a segment of the documentation for the event dns_request(). It's organized such that the documentation, commentary, and list of arguments precede the actual event definition used by Bro. As Bro detects DNS requests being issued by an originator, it issues this event and any number of scripts then have access to the data Bro passes along with the event. In this example, Bro passes not only the message, the query, query type and query class for the DNS request, but also the record used for the connection itself.
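A handler for this event might look like the minimal sketch below. The parameter list follows the arguments described above (connection record, message, query, query type, query class); the printed message is purely illustrative.

.. code-block:: bro

    # Runs once for every DNS request Bro observes.
    event dns_request(c: connection, msg: dns_msg, query: string,
                      qtype: count, qclass: count)
        {
        print fmt("%s asked for %s (qtype %d, qclass %d)",
                  c$id$orig_h, query, qtype, qclass);
        }

Any number of scripts can define their own handler for the same event, and each handler receives the same arguments.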
The Connection Record Data Type
===============================
Of all the events defined by Bro, an overwhelmingly large number of them are passed the connection record data type, in effect making it the backbone of many scripting solutions. The connection record itself, as we will see in a moment, is a mass of nested data types used to track state on a connection through its lifetime. Let's walk through the process of selecting an appropriate event, generating some output to standard output and dissecting the connection record so as to get an overview of it. We will cover data types in more detail later.
While Bro is capable of packet-level processing, its strengths lie in the context of a connection between an originator and a responder. As such, there are events defined for the primary parts of the connection life-cycle, as you'll see from the small selection of connection-related events below.
.. literalinclude:: ../../../../build/src/base/event.bif.bro
:language: bro
:linenos:
:lines: 135-138,154,204-208,218,255-256,266,335-340,351
Of the events listed, the event that will give us the best insight into the connection record data type will be connection_state_remove(). As detailed in the in-line documentation, Bro generates this event just before it decides to remove the connection from memory, effectively forgetting about it. Let's take a look at a simple script that will output the connection record for a single connection.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/connection_record_01.bro
:language: bro
:linenos:
:lines: 4-9
Again, we start with @load, this time importing the base/protocols/conn scripts which supply the tracking and logging of general information and state of connections. We handle the connection_state_remove() event and simply print the contents of the argument passed to it. For this example we're going to run Bro in "bare mode" which loads only the minimum number of scripts to retain operability and leaves the burden of loading required scripts to the script being run. While bare mode is a low level functionality incorporated into Bro, in this case, we're going to use it to demonstrate how different features of Bro add more and more layers of information about a connection. This will give us a chance to see the contents of the connection record without it being overly populated.
.. btest:: connection-record-01
@TEST-EXEC: btest-rst-cmd bro -b -r ${TRACES}/dns-session.trace ${TESTBASE}/doc/manual/connection_record_01.bro
As you can see from the output, the connection record is something of a jumble when printed on its own. Regularly taking a peek at a populated connection record helps to understand the relationship between its fields as well as allowing an opportunity to build a frame of reference for accessing data in a script.
Bro makes extensive use of nested data structures to store state and information gleaned from the analysis of a connection as a complete unit. To break down this collection of information, you will have to make use of Bro's field delimiter, the "$". For example, the originating host is referenced by c$id$orig_h which, if given a narrative, relates to "``orig_h`` which is a member of ``id`` which is a member of the data structure referred to as ``c`` that was passed into the event handler." Given that the responder port (``c$id$resp_p``) is 53/tcp, it's likely that Bro's base DNS scripts can further populate the connection record. Let's load the ``base/protocols/dns`` scripts and check the output of our script.
Bro uses the dollar sign as its field delimiter and a direct correlation exists between the output of the connection record and the proper format of a dereferenced variable in scripts. In the output of the script above, groups of information are collected between brackets, which would correspond to the $-delimiter in a Bro script.
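To make the dereferencing concrete, here is a minimal sketch that prints only the four endpoint fields of the ``id`` member instead of the whole record. The output format is an arbitrary choice for illustration.

.. code-block:: bro

    event connection_state_remove(c: connection)
        {
        # c$id is itself a record; each further $ steps one level deeper.
        print fmt("originator %s %s -> responder %s %s",
                  c$id$orig_h, c$id$orig_p,
                  c$id$resp_h, c$id$resp_p);
        }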
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/connection_record_02.bro
:language: bro
:linenos:
:lines: 4-10
.. btest:: connection-record-02
@TEST-EXEC: btest-rst-cmd bro -b -r ${TRACES}/dns-session.trace ${TESTBASE}/doc/manual/connection_record_02.bro
The addition of the ``base/protocols/dns`` scripts populates the dns=[] member of the connection record. While Bro is doing a massive amount of work in the background, it is in what is commonly called "script land" that details are being refined and decisions being made. Were we to continue running in "bare mode" we could slowly keep adding infrastructure through ``@load`` statements. For example, were we to ``@load base/frameworks/logging``, Bro would generate a conn.log and dns.log for us in the current working directory. As mentioned above, including the appropriate ``@load`` statements is not only good practice, but can also help to indicate which functionalities are being used in a script. Take a second to run the script without the ``-b`` flag and check the output when all of Bro's functionality is applied to the tracefile.
Data Types and Data Structures
==============================
Scope
-----
Before embarking on an exploration of Bro's native Data Types and Data Structures, it's important to have a good grasp of the different levels of scope available in Bro and the appropriate times to use them within a script. The declarations of variables in Bro come in two forms. Variables can be declared without or with a definition, in the form "SCOPE name: TYPE" or "SCOPE name = EXPRESSION" respectively; each produces a variable of the same type if EXPRESSION evaluates to the same type as TYPE. The decision as to which type of declaration to use is likely to be dictated by personal preference and readability.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_declaration.bro
:language: bro
:linenos:
:lines: 4-14
Global Variables
~~~~~~~~~~~~~~~~
A global variable is used when the state of a variable needs to be tracked, not surprisingly, globally. While there are some caveats, when a script declares a variable using the global scope, that script is granting access to that variable from other scripts. However, when a script uses the ``module`` keyword to give the script a namespace, more care must be given to the declaration of globals to ensure the intended result. When a global is declared in a script with a namespace there are two possible outcomes. First, the variable is available only within the context of the namespace. In this scenario, other scripts within the same namespace will have access to the variable declared while scripts using a different namespace or no namespace altogether will not have access to the variable. Alternatively, if a global variable is declared within an ``export{}`` block, that variable is available to any other script through the naming convention of ``MODULE::variable_name``.
The declaration below is taken from the ``known-hosts.bro`` script and declares a variable called "known_hosts" as a global set of unique IP addresses (line 32) within the "Known" namespace (line 8) and exports it (line 10) for use outside of the "Known" namespace. Were we to want to use the "known_hosts" variable, we'd be able to access it through ``Known::known_hosts``.
.. literalinclude:: ../scripts/policy/protocols/conn/known-hosts.bro
:language: bro
:linenos:
:lines: 8-10, 32, 37
The sample above also makes use of an export{} block. When the module keyword is used in a script, the variables declared are said to be in that module's "namespace". Whereas a global variable can be accessed by its name alone when it is not declared within a module, a global variable declared within a module must be exported and then accessed via MODULE_NAME::VARIABLE_NAME. As in the example above, we would be able to access the "known_hosts" variable in a separate script via "Known::known_hosts" due to the fact that known_hosts was declared as a global variable within an export block under the "Known" namespace.
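The two outcomes can be sketched side by side. The module name ``Demo`` and both variable names are hypothetical; the second half switches back to the global namespace to stand in for a separate script.

.. code-block:: bro

    module Demo;

    export {
        # Exported: reachable from any script as Demo::seen_hosts.
        global seen_hosts: set[addr];
    }

    # Not exported: intended for use only by code inside module Demo.
    global internal_counter: count = 0;

    event bro_init()
        {
        # Inside the module, both names can be used unqualified.
        add seen_hosts[192.168.1.1];
        ++internal_counter;
        }

    # Leave the Demo namespace, as a separate script without a module would.
    module GLOBAL;

    event bro_done()
        {
        # Outside the module, the exported variable needs its prefix.
        print Demo::seen_hosts;
        }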
Constants
~~~~~~~~~
Bro also makes use of constants, which are denoted by the "const" keyword. Unlike globals, constants can only be set or altered at parse time, and only if the &redef attribute has been used. Afterwards (at runtime) the constants are unalterable. In most cases, redefinable constants are used in Bro scripts as containers for configuration options. For example, the configuration option to log passwords decrypted from HTTP streams is stored in ``HTTP::default_capture_password``, as shown in the stripped-down excerpt from ``http/main.bro`` below.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/scripts/base/protocols/http/main.bro
:language: bro
:linenos:
:lines: 8-10,19,20,74
Because the constant was declared with the ``&redef`` attribute, if we needed to turn this option on globally, we could do so by adding the following line to our ``site/local.bro`` file before firing up Bro.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_const_simple.bro
:language: bro
:lines: 6
While the idea of a redefinable constant might be odd, the constraint that constants can only be altered at parse-time remains even with the "&redef" attribute. In the code snippet below, a table of strings indexed by ports is declared as a constant before two values are added to the table through redef statements. The table is then printed in a bro_init() event. Were we to try to alter the table in an event handler, Bro would notify the user of an error and the script would fail.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_const.bro
:language: bro
:linenos:
:lines: 4-12
.. btest:: data_type_const.bro
@TEST-EXEC: btest-rst-cmd bro -b ${TESTBASE}/doc/manual/data_type_const.bro
Local Variables
~~~~~~~~~~~~~~~
Whereas globals and constants are widely available in scriptland through various means, when a variable is defined with a local scope, its availability is restricted to the body of the event or function in which it was declared. Local variables tend to be used for values that are only needed within a specific scope; once the processing of a script passes beyond that scope and the variable is no longer used, the variable is deleted. While it is possible for a function to return a locally scoped variable, in doing so it returns the value instead of the variable. Bro maintains a difference between variables and values, an example of which is illustrated below. The script executes the event handler "bro_init()" which in turn calls the function "add_two(i: count)" with an argument of 10. Once Bro enters the add_two function, it provisions a locally scoped variable called "added_two" to hold the value of i+2, in this case, 12. The add_two function then prints the value of the added_two variable and returns its value to the bro_init() event handler. At this point, the variable "added_two" has fallen out of scope and no longer exists while the value 12 is still in use and stored in the locally scoped variable "test". When Bro finishes processing the bro_init() event handler, the variable called "test" is no longer in scope and, since there exist no other references to the value "12", the value is also deleted.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_local.bro
:language: bro
:linenos:
:lines: 4-14
Data Structures
---------------
It's difficult to talk about Bro's data types in a practical manner without first covering the data structures available in Bro. Some of the more interesting characteristics of data types are revealed when used inside of a data structure, but given that data structures are made up of data types, it devolves rather quickly into a "chicken-and-egg" problem. As such, we'll introduce data types from a bird's-eye view before diving into data structures and from there a more complete exploration of data types.
The table below shows the atomic types used in Bro, of which, the first four should seem familiar if you have some scripting experience, while the remaining six are less common in other languages. It should come as no surprise that a scripting language for a Network Security Monitoring platform has a fairly robust set of network centric data types and taking note of them here may well save you a late night of reinventing the wheel.
+-----------+-------------------------------------+
| Data Type | Description                         |
+===========+=====================================+
| int       | 64 bit signed integer               |
+-----------+-------------------------------------+
| count     | 64 bit unsigned integer             |
+-----------+-------------------------------------+
| double    | double precision floating point     |
+-----------+-------------------------------------+
| bool      | boolean (T/F)                       |
+-----------+-------------------------------------+
| addr      | IP address, IPv4 and IPv6           |
+-----------+-------------------------------------+
| port      | transport layer port                |
+-----------+-------------------------------------+
| subnet    | CIDR subnet mask                    |
+-----------+-------------------------------------+
| time      | absolute epoch time                 |
+-----------+-------------------------------------+
| interval  | a time interval                     |
+-----------+-------------------------------------+
| pattern   | regular expression                  |
+-----------+-------------------------------------+
Sets
~~~~
Sets in Bro are used to store unique elements of the same data type. In essence, you can think of them as "a unique set of integers" or "a unique set of IP addresses". While the declaration of a set may differ based on the data type being collected, the set will always contain unique elements and the elements in the set will always be of the same data type. Such requirements make the set data type perfect for information that is already naturally unique such as ports or IP addresses. The code snippet below shows both an explicit and implicit declaration of a locally scoped set.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_set_declaration.bro
:language: bro
:linenos:
:lines: 6,7
As you can see, sets are declared using the format "SCOPE var_name: set[TYPE]". Adding and removing elements in a set is achieved using the add and delete statements. Once you have elements inserted into the set, it's likely that you'll need to either iterate over that set or test for membership within the set, both of which are covered by the in operator. In the case of iterating over a set, combining the for statement and the in operator will allow you to sequentially process each element of the set as seen below.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_set_declaration.bro
:language: bro
:linenos:
:lines: 21-29
Here, the for statement loops over the contents of the set storing each element in the temporary variable "i". With each iteration of the for loop, the next element is chosen. Since sets are not an ordered data type, you cannot guarantee the order of the elements as the for loop processes.
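The add and delete statements mentioned earlier can be sketched as follows. The set name and port values are illustrative and not taken from the included script.

.. code-block:: bro

    event bro_init()
        {
        local ssl_ports: set[port] = set(443/tcp, 993/tcp);

        add ssl_ports[995/tcp];     # insert a new element
        add ssl_ports[443/tcp];     # already present, so nothing changes
        delete ssl_ports[993/tcp];  # remove an element

        # |...| yields the number of elements currently in the set.
        print |ssl_ports|;

        # Iteration order is not guaranteed.
        for ( p in ssl_ports )
            print p;
        }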
To test for membership in a set, the in operator can be combined with an if statement to return a true or false value. If the exact element in the condition is already in the set, the condition returns true and the body executes. The in operator can also be negated by the ! operator to create the inverse of the condition. While line 16 of the code snippet below could be rewritten as "if (!( 587/tcp in ssl_ports ))", try to avoid using this construct; instead, negate the in operator itself. While the functionality is the same, using "!in" is more efficient as well as a more natural construct, which will aid in the readability of your script.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_set_declaration.bro
:language: bro
:linenos:
:lines: 15-19
You can see the full script and its output below.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_set_declaration.bro
:language: bro
:linenos:
:lines: 4-21
.. btest:: data_struct_set_declaration
@TEST-EXEC: btest-rst-cmd bro -b ${TESTBASE}/doc/manual/data_struct_set_declaration.bro
Tables
~~~~~~
A table in Bro is a mapping of a key to a value or yield. While the values don't have to be unique, each key in the table must be unique to preserve a one-to-one mapping of keys to values. In the example below, we've compiled a table of SSL-enabled services and their common ports. The explicit declaration and constructor for the table on lines 3 and 4 lay out the data types of the keys (strings) and the data types of the yields (ports) and then fill in some sample key and yield pairs. Line 5 shows how to use a table accessor to insert one key-yield pair into the table. When using the in operator on a table, you are effectively working with the keys of the table. In the case of an if statement, the in operator will check for membership among the set of keys and return a true or false value. As seen on line 7, we are checking if "SMTPS" is not in the set of keys for the ssl_services table and, if the condition holds true, we add the key-yield pair to the table. Line 12 shows the use of a for statement to iterate over each key currently in the table.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_table_declaration.bro
:language: bro
:linenos:
:lines: 4-21
.. btest:: data_struct_table_declaration
@TEST-EXEC: btest-rst-cmd bro -b ${TESTBASE}/doc/manual/data_struct_table_declaration.bro
Simple examples aside, tables can become extremely complex as the keys and values for the table become more intricate. Tables can have keys comprised of multiple data types and even a series of elements called a 'tuple'. The flexibility gained with the use of complex tables in Bro implies a cost in complexity for the person writing the scripts but pays off in effectiveness given the power of Bro as a network security platform.
The script below shows a sample table of strings indexed by two strings, a count, and a final string. With a tuple acting as an aggregate key, the order is important, as a change in order would result in a new key. Here, we're using the table to track the director, studio, year of release, and lead actor in a series of samurai flicks. It's important to note that in the case of the for statement, it's an all-or-nothing kind of iteration. We cannot iterate over, say, the directors; we have to iterate with the exact format of the keys themselves. In this case, we need square brackets surrounding four temporary variables to act as a collection for our iteration. While this is a contrived example, we could easily have had keys containing IP addresses (addr), ports (port) and even a string calculated as the result of a reverse hostname lookup.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_table_complex.bro
:language: bro
:linenos:
:lines: 4-21
.. btest:: data_struct_table_complex
@TEST-EXEC: btest-rst-cmd bro -b ${TESTBASE}/doc/manual/data_struct_table_complex.bro
Vectors
~~~~~~~
If you're coming to Bro with a programming background, you may or may not be familiar with the Vector data type depending on your language of choice. On the surface, vectors perform much of the same functionality as arrays, but in truth they are more like associative arrays with zero-indexed unsigned integers as the indices. As such, any time you need to sequentially store data of the same type, in Bro you'll have to reach for a vector. Vectors are a collection of objects, all of which are of the same data type, to which elements can be dynamically added or removed. Since Vectors use contiguous storage for their elements, the contents of a vector can be accessed through a zero-indexed numerical offset.
The format for the declaration of a Vector follows the pattern of other declarations, namely, "SCOPE v: vector of T" where v is the name of your vector and T is the data type of its members. For example, the following snippet shows an explicit and implicit declaration of two locally scoped vectors. The script populates the first vector by inserting values at the end; placing the vector name between two vertical pipes yields the Vector's current length, which serves as the next free index. It then prints the contents of both Vectors and their current lengths.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_vector_declaration.bro
:language: bro
:linenos:
:lines: 4-16
.. btest:: data_struct_vector_declaration
@TEST-EXEC: btest-rst-cmd bro -b ${TESTBASE}/doc/manual/data_struct_vector_declaration.bro
In a lot of cases, storing elements in a vector is simply a precursor to then iterating over them. Iterating over a vector is easy with the for keyword. The sample below iterates over a vector of IP addresses and for each IP address, masks that address with 18 bits. The for keyword is used to generate a locally scoped variable called "i" which will hold the index of the current element in the vector. Using i as an index to addr_vector we can access the current item in the vector with addr_vector[i].
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_vector_iter.bro
:language: bro
:linenos:
:lines: 4-12
.. btest:: data_struct_vector_iter
@TEST-EXEC: btest-rst-cmd bro -b ${TESTBASE}/doc/manual/data_struct_vector_iter.bro
Data Types Revisited
--------------------
addr
~~~~
The addr, or address, data type manages to cover a surprisingly large amount of ground while remaining succinct. IPv4, IPv6 and even hostname constants are included in the addr data type. While IPv4 addresses use the default dotted quad formatting, IPv6 addresses use RFC 2373 defined notation with the addition of square brackets wrapping the entire address. When you venture into hostname constants, Bro performs a little sleight of hand for the benefit of the user; a hostname constant is, in fact, a set of addresses. Bro will issue a DNS request when it sees a hostname constant in use and return a set whose elements are the answers to the DNS request. For example, if you were to use ``local google = www.google.com;`` you would end up with a locally scoped set of addr elements that represent the current set of round robin DNS entries for google. At first blush, this seems trivial, but it is yet another example of Bro making the life of the common Bro scripter a little easier through abstraction applied in a practical manner.
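The two literal notations can be sketched as below. The addresses are arbitrary documentation-style values, and ``is_v4_addr()`` and ``is_v6_addr()`` are used on the assumption that the Bro build in use provides these built-in functions.

.. code-block:: bro

    event bro_init()
        {
        local v4 = 192.168.1.100;   # dotted quad
        local v6 = [2001:db8::1];   # IPv6, wrapped in square brackets

        # Both variables have type addr, whatever the address family.
        print is_v4_addr(v4);
        print is_v6_addr(v6);

        # An addr can be tested against a subnet with the in operator.
        print v4 in 192.168.0.0/16;
        }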
port
~~~~
Transport layer port numbers in Bro are represented in the format of ``unsigned integer/protocol name``, e.g. ``22/tcp`` or ``53/udp``. Bro supports TCP (``/tcp``), UDP (``/udp``), ICMP (``/icmp``) and UNKNOWN (``/unknown``) as protocol designations. While ICMP doesn't have an actual port, Bro supports the concept of ICMP "ports" by using the ICMP message type and ICMP message code as the source and destination port respectively. Ports can be compared for equality using the ``==`` or ``!=`` operators and can even be compared for ordering. When comparing across transport-level protocols, Bro orders the protocol designations as unknown < tcp < udp < icmp; for example, ``65535/tcp`` is smaller than ``0/udp``.
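
The comparisons described here can be tried out with a short sketch such as the following. The variable names are hypothetical; the last statement exercises the cross-protocol ordering stated above.

.. code-block:: bro

    event bro_init()
        {
        local ssh = 22/tcp;
        local dns = 53/udp;

        # Equality and inequality between port values.
        print ssh == 22/tcp;
        print ssh != dns;

        # Ordering across protocols follows unknown < tcp < udp < icmp,
        # so any tcp port sorts below any udp port.
        print 65535/tcp < 0/udp;
        }
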
subnet
~~~~~~
Bro has full support for CIDR notation subnets as a base data type. There is no need to manage the IP and the subnet mask as two separate entities when you can provide the same information in CIDR notation in your scripts. The example below uses a Bro script to determine if a series of IP addresses are within a set of subnets using a 20 bit subnet mask.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_subnets.bro
:language: bro
:linenos:
:lines: 4-19
Because this is a script that doesn't use any kind of network analysis, we can handle the event bro_init() which is always generated by Bro's core upon startup. On lines six and seven, two locally scoped vectors are created to hold our lists of subnets and IP addresses respectively. Then, using a set of nested for loops, we iterate over every subnet and every IP address and use an if statement to compare an IP address against a subnet using the in operator. The in operator returns true if the IP address falls within a given subnet based on the longest prefix match calculation. For example, 10.0.0.1 in 10.0.0.0/8 would return true while 192.168.2.1 in 192.168.1.0/24 would return false. When we run the script, we get the output listing the IP address and the subnet in which it belongs.
.. btest:: data_type_subnets
@TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/data_type_subnets.bro
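
For a smaller illustration of the ``in`` operator on its own, the sketch below checks the two address and subnet pairs mentioned in the text. It is not part of the test script above, and the variable name is made up.

.. code-block:: bro

    event bro_init()
        {
        local net: subnet = 10.0.0.0/8;

        # 10.0.0.1 falls inside 10.0.0.0/8.
        if ( 10.0.0.1 in net )
            print fmt("%s is in %s", 10.0.0.1, net);

        # 192.168.2.1 lies outside 192.168.1.0/24, so !in holds.
        if ( 192.168.2.1 !in 192.168.1.0/24 )
            print "192.168.2.1 is not in 192.168.1.0/24";
        }
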
time
~~~~
While there is currently no supported way to add a time constant in Bro, two built-in functions exist to make use of the ``time`` data type. Both ``network_time()`` and ``current_time()`` return a ``time`` data type but they each return a time based on different criteria. The ``current_time()`` function returns what is called the wall-clock time as defined by the operating system. However, ``network_time()`` returns the timestamp of the last packet processed, be it from a live data stream or a saved packet capture. Both functions return the time in epoch seconds, meaning ``strftime`` must be used to turn the output into human readable output. The script below makes use of the ``connection_established`` event handler to generate text every time a SYN/ACK packet is seen responding to a SYN packet as part of a TCP handshake. The text generated is in the format of a timestamp and an indication of who the originator and responder were. The script passes ``strftime`` the format string ``%Y%M%d %H:%m:%S``. Note that in ``strftime`` notation ``%M`` stands for minutes and ``%m`` for the month, so a conventional date and time stamp would use ``%Y%m%d %H:%M:%S`` instead.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_time.bro
:language: bro
:linenos:
:lines: 4-7
When the script is executed we get an output showing the details of established connections.
.. btest:: data_type_time
@TEST-EXEC: btest-rst-cmd bro -r ${TRACES}/wikipedia.trace ${TESTBASE}/doc/manual/data_type_time.bro
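
Independent of any packet trace, a minimal sketch of formatting a ``time`` value could look like the following. It uses ``current_time()`` rather than ``network_time()`` so that it works without input traffic, and the format string is only one possible choice (month as ``%m``, minutes as ``%M``).

.. code-block:: bro

    event bro_init()
        {
        # Wall-clock time as reported by the operating system.
        local wall: time = current_time();

        # Turn the epoch-based value into a readable timestamp.
        print strftime("%Y-%m-%d %H:%M:%S", wall);
        }
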
interval
~~~~~~~~
The interval data type is another area in Bro where rational application of abstraction made perfect sense. As a data type, the interval represents a relative time as denoted by a numeric constant followed by a unit of time. For example, 2.2 seconds would be ``2.2sec`` and thirty-one days would be represented by ``31days``. Bro supports usec, msec, sec, min, hr, or day which represent microseconds, milliseconds, seconds, minutes, hours, and days respectively. In fact, the interval data type allows for a surprising amount of variation in its definitions. There can be a space between the numeric constant and the unit, or they can be crammed together like a temporal portmanteau. The time unit can be either singular or plural. All of this adds up to the fact that both ``42hrs`` and ``42 hr`` are perfectly valid and logically equivalent in Bro. The point, however, is to increase the readability and thus maintainability of a script. Intervals can even be negated, allowing for ``- 10mins`` to represent "ten minutes ago".
Intervals in Bro can have mathematical operations performed against them, allowing the user to perform addition, subtraction, multiplication, division, and comparison operations. As well, Bro returns an interval when subtracting one ``time`` value from another using the ``-`` operator. The script below amends the script started in the section above to include a time delta value printed along with the connection establishment report.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_interval.bro
:language: bro
:linenos:
:lines: 4-20
This time, when we execute the script we see an additional line in the output to display the time delta since the last fully established connection.
.. btest:: data_type_interval
@TEST-EXEC: btest-rst-cmd bro -r ${TRACES}/wikipedia.trace ${TESTBASE}/doc/manual/data_type_interval.bro
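
The points about interval notation and arithmetic can also be sketched without a trace file. The snippet below is an illustration only, with made-up variable names: it compares the two equivalent spellings from the text and derives an interval by subtracting two ``time`` values.

.. code-block:: bro

    event bro_init()
        {
        # Two spellings of the same interval.
        local a = 42hrs;
        local b = 42 hr;
        print a == b;

        # Adding an interval to a time yields a time.
        local t1 = current_time();
        local t2 = t1 + 90sec;

        # Subtracting two time values yields an interval.
        local delta: interval = t2 - t1;
        print delta;

        # Intervals can be compared with each other.
        print delta > 1min;
        }
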
Pattern
~~~~~~~
Bro has support for fast text searching operations using regular expressions and even goes so far as to declare a native data type for the patterns used in regular expressions. A pattern constant is created by enclosing text within the forward slash characters. Bro supports syntax very similar to the flex lexical analyzer syntax. The most common use of patterns in Bro you are likely to come across is embedded matching using the ``in`` operator. Embedded matching adheres to a strict format, requiring the regular expression or pattern constant to be on the left side of the ``in`` operator and the string against which it will be tested to be on the right.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_pattern_01.bro
:language: bro
:linenos:
:lines: 4-13
In the sample above, two local variables are declared to hold our sample sentence and regular expression. Our regular expression in this case will return true if the string contains either the word "quick" or the word "fox". The ``if`` statement on line six uses embedded matching and the ``in`` operator to check for the existence of the pattern within the string. If the statement resolves to true, ``split`` is called to break the string into separate pieces. ``split`` takes a string and a pattern as its arguments and returns a table of strings indexed by a count. The elements of the table are the segments before and after any matches against the pattern, excluding the actual matches. In this case, our pattern matches twice, and results in a table with three entries. Lines 11 through 13 print the contents of the table in order.
.. btest:: data_type_pattern
@TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/data_type_pattern_01.bro
Patterns can also be used to compare strings for equality and inequality through the ``==`` and ``!=`` operators respectively. When used in this manner, however, the string must match the pattern entirely to resolve to true. For example, the script below uses two ternary conditional statements to illustrate the use of the ``==`` operator with patterns. On lines 5 and 8 the output is altered based on the result of the comparison between the pattern and the string.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_type_pattern_02.bro
:language: bro
:linenos:
:lines: 4-13
.. btest:: data_type_pattern_02
@TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/data_type_pattern_02.bro
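
To contrast the two matching modes side by side, a small sketch (not taken from the test scripts; the string and patterns are chosen only for illustration) might look like this:

.. code-block:: bro

    event bro_init()
        {
        local s = "equality";
        local p = /equal/;

        # Embedded matching: true if the pattern occurs anywhere in the string.
        print p in s;

        # Exact matching: true only if the entire string matches the pattern.
        print p == s;
        print /equal.*/ == s;
        }

Here the first pattern only covers part of the string, so it should satisfy ``in`` but not ``==``, while the second pattern is written to cover the whole string.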
Record Data Type
================
With Bro's support for a wide array of data types and data structures, an obvious extension is to include the ability to create custom data types consisting of the atomic types and data structures. To accomplish this, Bro introduces the ``record`` type and the ``type`` keyword. Similar to how you would define a new data structure in C with the ``typedef`` and ``struct`` keywords, Bro allows you to cobble together new data types to suit the needs of your situation.
When combined with the ``type`` keyword, ``record`` can generate a composite type. We have, in fact, already encountered a complex example of the ``record`` data type in the earlier sections. The excerpt below shows the definition of the connection record data type as defined in the ``Conn`` module.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/scripts/base/protocols/conn/main.bro
:language: bro
:linenos:
:lines: 10-12,16,17,19,21,23,25,28,31,35,37,56,62,68,90,93,97,100,104,108,109,114
While it might be surprising that Bro is tracking and passing around this much data for each connection, it shouldn't be too surprising, given our exploration of it earlier, that the connection record consists of a collection of atomic data types, simple data types and even another ``record``. Looking at the structure of the definition, a new collection of data types is being defined as a type called ``Info``. Since this type definition is within the confines of an export block, what is defined is, in fact, ``Conn::Info``.
The formatting for a declaration of a record type in Bro includes the descriptive name of the type being defined and the separate fields that make up the record. The individual fields that make up the new record are not limited in type or number as long as the name for each field is unique.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_record_01.bro
:language: bro
:linenos:
:lines: 4-25
.. btest:: data_struct_record_01
@TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/data_struct_record_01.bro
The sample above shows a simple type definition that includes a string, a set of ports, and a count to define a service type. Also included is a function to print each field of a record in a formatted fashion and a ``bro_init`` event handler to show some functionality of working with records. The definitions of the dns and http services are both done inline using square brackets before being passed to the ``print_service`` function. The ``print_service`` function makes use of the ``$`` dereference operator to access the fields within the newly defined Service record type.
As you saw in the definition for the connection record, other records are even valid as fields within another record. We can extend the example above to include another record that contains a Service record.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/data_struct_record_02.bro
:language: bro
:linenos:
:lines: 4-25
.. btest:: data_struct_record_02
@TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/data_struct_record_02.bro
The example above includes a second record type, one of whose fields is a set that uses the first record type as its element type. Records can be repeatedly nested within other records, their fields reachable through repeated chains of the ``$`` dereference operator.
It's also common to see a ``type`` used to simply alias a data structure to a more descriptive name. The excerpt below shows an example of this from Bro's own type definitions file.
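
Chaining the ``$`` operator through a nested record can be sketched as follows. The ``Location`` and ``Sensor`` types and their fields are hypothetical and exist only for this illustration.

.. code-block:: bro

    # Hypothetical record types used only for this sketch.
    type Location: record {
        building: string;
        room: count;
    };

    type Sensor: record {
        name: string;
        ip: addr;
        loc: Location;
    };

    event bro_init()
        {
        local l: Location = [$building="HQ", $room=101];
        local s: Sensor = [$name="tap-1", $ip=10.0.0.5, $loc=l];

        # Each $ steps one level further into the nested record.
        print fmt("%s (%s) is in %s room %d", s$name, s$ip, s$loc$building, s$loc$room);
        }
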
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/scripts/base/init-bare.bro
:language: bro
:linenos:
:lines: 12,19,26
The three lines above alias a type of data structure to a descriptive name. Functionally, the operations are the same; however, each of the types above is named such that its function is instantly identifiable. This is another place in Bro scripting where consideration can lead to better readability of your code and thus easier maintainability in the future.
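
Defining an alias of your own works the same way. In the sketch below, ``web_port_set`` is a made-up name and not one of Bro's built-in aliases.

.. code-block:: bro

    # Hypothetical alias: a set of ports under a descriptive name.
    type web_port_set: set[port];

    event bro_init()
        {
        local web_ports: web_port_set = set(80/tcp, 443/tcp, 8080/tcp);

        # The alias behaves exactly like set[port].
        if ( 443/tcp in web_ports )
            print "443/tcp is a web port";
        }
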
Logging Framework
=================
Armed with a decent understanding of the data types and data structures in Bro, exploring the various frameworks available is a much more rewarding effort. The framework with which most users are likely to have the most interaction is the Logging Framework. Designed to abstract much of the process of creating a file and appending ordered and organized data to it, the Logging Framework makes use of some potentially unfamiliar nomenclature. Specifically, Log streams, Filters and Writers are simply abstractions of the processes required to manage a high rate of incoming logs while maintaining full operability. If you've seen Bro employed in an environment with a large connection, you know that logs are produced incredibly quickly; the ability to process a large set of data and write it to disk is due to the design of the Logging Framework.
Data is written to a Log stream based on decision making processes in Bro's scriptland. Log streams correspond to a single log as defined by the set of name value pairs that make up its fields. That data can then be filtered, modified, or redirected with Logging Filters which, by default, are set to log everything. Filters can be used to break log files into subsets or duplicate that information to another output. The final output of the data is defined by the writer. Bro's default writer produces simple tab-separated ASCII files, but Bro also includes support for DataSeries and Elasticsearch outputs as well as additional writers currently in development. While these new terms and ideas may give the impression that the Logging Framework is difficult to work with, the actual learning curve is not very steep at all. The abstraction built into the Logging Framework makes it such that a vast majority of scripts need not go past the basics. In effect, writing to a log file is as simple as defining the format of your data, letting Bro know that you wish to create a new log, and then calling the ``Log::write()`` method to output log records.
The Logging Framework is an area in Bro where, the more you see it used and the more you use it yourself, the more second nature the boilerplate parts of the code will become. As such, let's work through a contrived example of simply logging the digits 1 through 10 and their corresponding factorial to the default ASCII log writer. It's always best to work through the problem once, simulating the desired output with ``print`` and ``fmt`` before attempting to dive into the Logging Framework. Below is a script that defines a factorial function to recursively calculate the factorial of an unsigned integer passed as an argument to the function. Using ``print`` and ``fmt`` we can ensure that Bro can perform these calculations correctly as well as get an idea of the answers ourselves.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/framework_logging_factorial_01.bro
:language: bro
:linenos:
:lines: 4-25
.. btest:: framework_logging_factorial_01
@TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/framework_logging_factorial_01.bro
The output of the script aligns with what we expect, so now it's time to integrate the Logging Framework. As mentioned above, we have to perform a few steps before we can call ``Log::write()`` and produce a log file. As we are working within a namespace and informing an outside entity of workings and data internal to the namespace, we use an ``export{}`` block. First we need to inform Bro that we are going to be adding another Log stream by adding a value to the ``Log::ID`` enumerable. In line 3 of the script, we append the value ``LOG`` to the ``Log::ID`` enumerable; however, because it is exported from the ``Factor`` namespace, the value appended to ``Log::ID`` is actually ``Factor::LOG``. Next, we need to define the name and value pairs that make up the data of our logs and dictate its format. Lines 5 through 9 define a new data type called an Info record (actually, ``Factor::Info``) with two fields, both unsigned integers. Each of the fields in the ``Factor::Info`` record type includes the ``&log`` attribute, indicating that these fields should be passed to the Logging Framework when ``Log::write()`` is called. Were there to be any name value pairs without the ``&log`` attribute, those fields would simply be ignored during logging but remain available for the lifespan of the variable. The next step is to create the logging stream with ``Log::create_stream()``, which takes a ``Log::ID`` and a record as its arguments. In this example, on line 28, we call the ``Log::create_stream()`` method and pass ``Factor::LOG`` and the ``Factor::Info`` record as arguments. From here on out, if we issue the ``Log::write()`` command with the correct ``Log::ID`` and a properly formatted ``Factor::Info`` record, a log entry will be generated.
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/framework_logging_factorial_02.bro
:language: bro
:linenos:
:lines: 4-40
Now, if we run the new version of the script, instead of generating logging information to stdout, no output is created. Instead the output is all in factor.log, properly formatted and organized.
.. btest:: framework_logging_factorial_02
@TEST-EXEC: btest-rst-cmd bro ${TESTBASE}/doc/manual/framework_logging_factorial_02.bro
While the previous example is a simplistic one, it serves to demonstrate the small pieces of script that need to be in place in order to generate logs. For example, it's common to call ``Log::create_stream()`` in ``bro_init()``. In a live example, the decision of when to call ``Log::write()`` would likely be made in an event handler tied to network activity; in this case we simply use ``bro_done()``.
If you've already spent time with a deployment of Bro, you've likely had the opportunity to view, search through, or manipulate the logs produced by the Logging Framework. The log output from a default installation of Bro is substantial to say the least; however, there are times when the way the Logging Framework behaves by default isn't ideal for the situation. This can range from needing to log more or less data with each call to ``Log::write()`` to needing to split log files based on arbitrary logic. In the latter case, Filters come into play. Filters grant a level of customization to Bro's scriptland, allowing the script writer to include or exclude fields in the log and even make alterations to the path of the file in which the logs are being placed. Each stream, when created, is given a default filter called, not surprisingly, ``default``. When using the ``default`` filter, every key value pair with the ``&log`` attribute is written to a single file. For the example we've been using, let's extend it so as to write any factorial that is divisible by 5 to an alternate file, while writing the remaining logs to a separate file.
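
Stripped of the factorial logic, the boilerplate described above can be reduced to a sketch like the one below. The ``Demo`` module and its ``ts`` and ``msg`` fields are hypothetical and are not part of Bro or of the example script.

.. code-block:: bro

    module Demo;

    export {
        # Register a new stream ID; outside this module it is Demo::LOG.
        redef enum Log::ID += { LOG };

        # Only fields carrying &log end up in the log file.
        type Info: record {
            ts: time &log;
            msg: string &log;
            internal_note: string &optional;
        };
    }

    event bro_init()
        {
        # Tie the stream ID to the record type describing its columns.
        Log::create_stream(Demo::LOG, [$columns=Info]);

        # Write a single entry; internal_note is left unset and is not logged.
        Log::write(Demo::LOG, [$ts=current_time(), $msg="hello from Demo"]);
        }
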
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/framework_logging_factorial_03.bro
:language: bro
:linenos:
:lines: 43-60
To dynamically alter the file to which a stream writes its logs, a filter can specify a function that returns a string to be used as the filename for the current call to ``Log::write()``. The definition for this function has to take as its parameters a ``Log::ID`` called ``id``, a string called ``path`` and the appropriate record type for the logs called ``rec``. You can see that the definition of ``mod5`` used in this example on line one conforms to that requirement. The function simply returns "factor-mod5" if the factorial is evenly divisible by 5; otherwise, it returns "factor-non5". In the additional ``bro_init()`` event handler, we define a locally scoped ``Log::Filter`` and assign it a record that defines the ``name`` and ``path_func`` fields. We then call ``Log::add_filter()`` to add the filter to the ``Factor::LOG`` ``Log::ID`` and call ``Log::remove_filter()`` to remove the ``default`` filter for ``Factor::LOG``. Had we not removed the ``default`` filter, we'd have ended up with three log files: factor-mod5.log with all the factorials that are divisible by 5, factor-non5.log with the factorials that are not divisible by 5, and factor.log which would have included all factorials.
The ability of Bro to generate easily customizable and extensible logs which remain easily parsable is a big part of the reason Bro has gained a large measure of respect. In fact, it's difficult at times to think of something that Bro doesn't log and as such, it is often advantageous for analysts and systems architects to instead hook into the Logging Framework to be able to perform custom actions based upon the data being sent to it. To that end, every default log stream in Bro generates a custom event that can be handled by anyone wishing to act upon the data being sent to the stream. By convention these events are usually in the format ``log_x`` where x is the name of the logging stream; as such the event raised for every log sent to the Logging Framework by the HTTP parser would be ``log_http``. In fact, we've already seen a script handle the ``log_http`` event when we broke down how the ``detect-MHR.bro`` script worked. In that example, as each log entry was sent to the Logging Framework, post-processing was taking place in the ``log_http`` event. Instead of using an external script to parse the ``http.log`` file and do post-processing for the entry, post-processing can be done in real time in Bro.
Telling Bro to raise an event in your own Logging stream is as simple as exporting that event name and then adding that event in the call to ``Log::create_stream``. Going back to our simple example of logging the factorial of an integer, we add ``log_factor`` to the ``export`` block and define the value to be passed to it, in this case the ``Factor::Info`` record. We then list the ``log_factor`` event as the ``$ev`` field in the call to ``Log::create_stream``.
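
Besides ``path_func``, a filter can also narrow down which fields are written. The sketch below is an illustration under assumptions rather than part of the factorial example: the ``Demo`` module, its fields, and the filter name and path are all made up, and it relies on the ``path`` and ``include`` fields of ``Log::Filter``.

.. code-block:: bro

    module Demo;

    export {
        redef enum Log::ID += { LOG };

        type Info: record {
            num: count &log;
            msg: string &log;
        };
    }

    event bro_init()
        {
        Log::create_stream(Demo::LOG, [$columns=Info]);

        # An additional filter that writes only the "num" column
        # to its own file, alongside the default filter's output.
        local f: Log::Filter = [$name="num-only", $path="demo-num",
                                $include=set("num")];
        Log::add_filter(Demo::LOG, f);

        Log::write(Demo::LOG, [$num=1, $msg="first"]);
        }

Because the ``default`` filter is left in place here, each call to ``Log::write()`` should reach both filters.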
.. rootedliteralinclude:: ${BRO_SRC_ROOT}/testing/btest/doc/manual/framework_logging_factorial_04.bro
:language: bro
:linenos:
:lines: 4-62
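
Reduced to its essentials, the pattern of exporting an event and handing it to ``Log::create_stream`` might look like the sketch below. The ``Demo`` module, its ``num`` field and the ``log_demo`` event are hypothetical names chosen for illustration.

.. code-block:: bro

    module Demo;

    export {
        redef enum Log::ID += { LOG };

        type Info: record {
            num: count &log;
        };

        # Event raised for every record written to the stream.
        global log_demo: event(rec: Info);
    }

    event bro_init()
        {
        # $ev names the event to raise for each Log::write() call.
        Log::create_stream(Demo::LOG, [$columns=Info, $ev=log_demo]);
        Log::write(Demo::LOG, [$num=42]);
        }

    # Any script can handle the event to act on the record.
    event Demo::log_demo(rec: Demo::Info)
        {
        print fmt("Demo stream received num=%d", rec$num);
        }
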