Merge remote-tracking branch 'origin/topic/srunnels/documentation'

* origin/topic/srunnels/documentation:
  Spelling corrections.
  Include a better description for detect-MHR.bro
  Rewrite the MHR detection description.
  Spelling corrections.
  Update the lines included from events.bif.bro.
Robin Sommer 2013-09-20 14:18:30 -07:00
commit 589a0239be
8 changed files with 18881 additions and 114 deletions


@ -10,13 +10,6 @@ Writing Bro Scripts
Understanding Bro Scripts
=========================
Bro includes an event-driven scripting language that provides
the primary means for an organization to extend and customize Bro's
functionality. Virtually all of the output generated by Bro
@ -33,100 +26,113 @@ are invalid. This entire process is setup by telling Bro that should
it see a server or client issue an SSL ``HELLO`` message, we want to know
about that connection.
It's often easiest to understand Bro's scripting language by
looking at a complete script and breaking it down into its
identifiable components. In this example, we'll take a look at how
Bro checks the SHA1 hash of various files extracted from network traffic
against the `Team Cymru Malware hash registry
<http://www.team-cymru.org/Services/MHR/>`_. The registry can be queried with a DNS
host lookup on a domain of the form
``<MALWARE_HASH>.malware.hash.cymru.com``, where ``<MALWARE_HASH>`` is the SHA1 hash of a file.
Team Cymru also populates the TXT record of their DNS responses with both a "first seen"
timestamp and a numerical "detection rate". The important point to understand is that Bro already
generates hashes for files via the Files framework, but it is the
script ``detect-MHR.bro`` that is responsible for generating the
appropriate DNS lookup, parsing the response, and generating a notice if appropriate.
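To make the lookup format concrete, the following sketch builds the name that would be queried for a given SHA1 hash. It is not an excerpt of ``detect-MHR.bro``. The helper ``mhr_query_name`` is hypothetical, and the sample value is simply the SHA1 hash of the empty string.

.. code:: bro

    # Hypothetical helper (not part of detect-MHR.bro): build the domain
    # that is looked up in the Malware Hash Registry for a SHA1 hash.
    function mhr_query_name(sha1: string): string
        {
        return fmt("%s.malware.hash.cymru.com", sha1);
        }

    event bro_init()
        {
        # SHA1 of the empty string, used only as sample input.
        print mhr_query_name("da39a3ee5e6b4b0d3255bfef95601890afd80709");
        }

Run with ``bro`` on the command line, this should print the sample hash followed by ``.malware.hash.cymru.com``. The real script does not print the name. Instead, it hands the name to an asynchronous TXT lookup.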
.. btest-include:: ${BRO_SRC_ROOT}/scripts/policy/frameworks/files/detect-MHR.bro
Visually, there are three distinct sections of the script. First, there is a base
level with no indentation where libraries are included in the script through ``@load``
and a namespace is defined with ``module``. This is followed by an indented and formatted
section explaining the custom variables being provided (``export``) as part of the script's namespace.
Finally, there is a second indented and formatted section describing the instructions to take for a
specific event (``event file_hash``). Don't get discouraged if you don't
understand every section of the script; we'll cover the basics of the
script and much more in following sections.
.. btest-include:: ${BRO_SRC_ROOT}/scripts/policy/frameworks/files/detect-MHR.bro
    :lines: 4-6
The ``@load`` directives on lines 4 through 6 of the script process the ``__load__.bro`` script in the
respective directories being loaded. Using ``@load`` directives is
often considered good practice, or even just good manners, when writing
Bro scripts to make sure they can be used on their own. While it's unlikely that in a
full production deployment of Bro these additional resources wouldn't
already be loaded, it's not a bad habit to try to get into as you get
more experienced with Bro scripting. If you're just starting out,
this level of granularity might not be entirely necessary. The ``@load`` directives
ensure that the Files framework, the Notice framework, and the script to hash all files have
been loaded by Bro.
.. btest-include:: ${BRO_SRC_ROOT}/scripts/policy/frameworks/files/detect-MHR.bro
    :lines: 10-31
The export section redefines an enumerable constant that describes the
type of notice we will generate with the Notice framework. Bro
allows for re-definable constants, which, at first, might seem
counter-intuitive. We'll get more in-depth with constants in a later
section; for now, think of them as variables that can only be altered
before Bro starts running. The notice type listed allows for the use
of the :bro:id:`NOTICE` function to generate notices of type
``TeamCymruMalwareHashRegistry::Match`` as done in the next section. Notices
allow Bro to generate some kind of extra notification beyond its
default log types. Often times, this extra notification comes in the
form of an email generated and sent to a preconfigured address, but this can be altered
depending on the needs of the deployment. The export section is finished off with
the definition of two constants that list the kinds of files we want to match against and
the minimum detection rate, as a percentage, that we are interested in.
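Because the notice type and the constants are exported, a site can adjust them without editing the script. The fragment below is a sketch for ``site/local.bro``. It assumes that ``notice_threshold`` is declared with ``&redef``, since a constant can only be changed this way if it is. The threshold of 50 is an arbitrary example.

.. code:: bro

    # Load the MHR detection script from the policy scripts directory.
    @load frameworks/files/detect-MHR

    # Only raise a notice when at least 50 percent of the A/V engines
    # flagged the file (example value; assumes the constant is &redef).
    redef TeamCymruMalwareHashRegistry::notice_threshold = 50;

    # Ask the Notice framework to email matches as well.
    redef Notice::emailed_types += { TeamCymruMalwareHashRegistry::Match };

Adding ``TeamCymruMalwareHashRegistry::Match`` to ``Notice::emailed_types`` only marks those notices for email. Actual delivery still depends on how the Notice framework's email settings are configured for the deployment.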
Up until this point, the script has merely done some basic setup. With the next section,
the script starts to define instructions to take in a given event.
.. btest-include:: ${BRO_SRC_ROOT}/scripts/policy/frameworks/files/detect-MHR.bro
    :lines: 33-57
The workhorse of the script is contained in the event handler for
``file_hash``. The ``file_hash`` event is defined in the
:doc:`/scripts/base/bif/plugins/Bro_FileHash.events.bif.bro` script and allows scripts to access
the information associated with a file for which Bro's file analysis framework has
generated a hash. The event handler is passed the file itself as ``f``, the type of digest
algorithm used as ``kind`` and the hash generated as ``hash``.
On line 35, an ``if`` statement is used to check for the correct type of hash, in this case
a SHA1 hash. It also checks for a MIME type that we have marked as being of interest in the
constant ``match_file_types``. The comparison is made against the expression ``f$mime_type``, which uses
the ``$`` dereference operator to access the value ``mime_type`` inside the variable ``f``. If both
conditions are true, a local variable is defined to hold a string composed of the SHA1 hash concatenated
with ``.malware.hash.cymru.com``; this value will be the domain queried in the malware hash registry.
The rest of the script is contained within a ``when`` block. In
short, a ``when`` block is used when Bro needs to perform asynchronous
actions, such as a DNS lookup, to ensure that performance isn't affected.
The ``when`` block performs a DNS TXT lookup and stores the result
in the local variable ``MHR_result``. Effectively, processing for
this event continues and upon receipt of the values returned by
:bro:id:`lookup_hostname_txt`, the ``when`` block is executed. The
``when`` block splits the returned string on a space into two parts, the date on which
the malware was first detected and the detection rate, and stores them in a local table
variable. On line 42, if the table returned by ``split1`` has two entries, indicating a successful
split, we store the detection date in ``mhr_first_detect`` and the rate in ``mhr_detect_rate`` on
lines 44 and 45 respectively, using the appropriate conversion functions. At this point, Bro knows
it has seen a file transmitted whose hash is known to the Team Cymru Malware Hash Registry; the rest
of the script is dedicated to producing a notice.
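The splitting and conversion steps can be tried on their own. The sketch below assumes an answer in the "<first seen timestamp> <detection rate>" layout the text describes. The helpers ``mhr_detect_rate`` and ``mhr_first_seen`` and the sample answer are made up for illustration.

.. code:: bro

    # Hypothetical helper: return the detection rate from a TXT answer,
    # or 0 if the answer does not split into exactly two fields.
    function mhr_detect_rate(answer: string): count
        {
        local parts = split1(answer, / /);
        if ( |parts| != 2 )
            return 0;
        return to_count(parts[2]);
        }

    # Hypothetical helper: convert the first field (seconds since the
    # epoch) into a time value.
    function mhr_first_seen(answer: string): time
        {
        local parts = split1(answer, / /);
        return double_to_time(to_double(parts[1]));
        }

    event bro_init()
        {
        local sample = "1379706000 42";   # made-up sample answer
        print mhr_detect_rate(sample);
        print mhr_first_seen(sample);
        }

With the sample answer, the first ``print`` should output ``42``. An answer without a space yields a one-entry table from ``split1``, so ``mhr_detect_rate`` returns 0 in that case.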
On line 47, the detection time is converted into a string representation and stored in
``readable_first_detected``. The script then compares the detection rate against the
``notice_threshold`` that was defined on line 30. If the detection rate is high enough, the script
creates a concise description of the notice on line 50 and a possible URL to check the sample against
virustotal.com's database, then calls :bro:id:`NOTICE` to hand the relevant information
off to the Notice framework.
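The final step combines a threshold comparison with some string formatting. This sketch mirrors that flow under a few assumptions: ``example_threshold`` stands in for ``notice_threshold``, ``mhr_message`` is a hypothetical helper, and the message wording is not the script's own.

.. code:: bro

    # Hypothetical stand-in for the script's notice_threshold constant.
    const example_threshold = 10 &redef;

    # Return a notice-style message if the rate meets the threshold,
    # or an empty string otherwise.
    function mhr_message(rate: count, first_seen: time): string
        {
        if ( rate < example_threshold )
            return "";

        local readable = strftime("%Y-%m-%d %H:%M:%S", first_seen);
        return fmt("MHR detection rate: %d percent, first seen: %s", rate, readable);
        }

    event bro_init()
        {
        print mhr_message(42, double_to_time(1379706000.0));
        print mhr_message(5, double_to_time(1379706000.0));
        }

The first call returns a message containing ``42 percent``. The second returns an empty string because 5 is below the threshold. The date portion is whatever ``strftime`` produces for the given time.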
In approximately 25 lines of code, Bro provides an amazing
utility that would be incredibly difficult to implement and deploy
with other products. In truth, claiming that Bro does this in 25
lines is a misdirection; there is a truly massive number of things
going on behind the scenes in Bro, but it is the inclusion of the
scripting language that gives analysts access to those underlying
layers in a succinct and well-defined manner.
The Event Queue and Event Handlers
==================================
@ -168,7 +174,7 @@ the event, and a concise explanation of the functions use.
:lines: 29-54
Above is a segment of the documentation for the event
:bro:id:`dns_request` (and the preceding link points to the
documentation generated out of that). It's organized such that the
documentation, commentary, and list of arguments precede the actual
event definition used by Bro. As Bro detects DNS requests being
@ -197,13 +203,8 @@ such, there are events defined for the primary parts of the connection
life-cycle as you'll see from the small selection of
connection-related events below.
.. btest-include:: ${BRO_SRC_ROOT}/build/scripts/base/bif/event.bif.bro
    :lines: 69-72,88,106-109,129,132-137,148
Of the events listed, the event that will give us the best insight
into the connection record data type will be
@ -245,7 +246,7 @@ information gleaned from the analysis of a connection as a complete
unit. To break down this collection of information, you will have to
make use of Bro's field delimiter ``$``. For example, the
originating host is referenced by ``c$id$orig_h``, which, read as a
narrative, means "``orig_h``, which is a member of ``id``, which is
a member of the data structure referred to as ``c`` that was passed
into the event handler." Given that the responder port
(``c$id$resp_p``) is ``53/tcp``, it's likely that Bro's base DNS scripts
@ -343,7 +344,7 @@ Constants
Bro also makes use of constants, which are denoted by the ``const``
keyword. Unlike globals, constants can only be set or altered at
parse time if the ``&redef`` attribute has been used. Afterwards (in
runtime) the constants are unalterable. In most cases, re-definable
constants are used in Bro scripts as containers for configuration
options. For example, the configuration option to log passwords
decrypted from HTTP streams is stored in
@ -359,7 +360,7 @@ following line to our ``site/local.bro`` file before firing up Bro.
.. btest-include:: ${DOC_ROOT}/scripting/data_type_const_simple.bro
While the idea of a re-definable constant might be odd, the constraint
that constants can only be altered at parse-time remains even with the
``&redef`` attribute. In the code snippet below, a table of strings
indexed by ports is declared as a constant before two values are added
@ -417,7 +418,7 @@ The table below shows the atomic types used in Bro, of which the
first four should seem familiar if you have some scripting experience,
while the remaining six are less common in other languages. It should
come as no surprise that a scripting language for a Network Security
Monitoring platform has a fairly robust set of network-centric data
types and taking note of them here may well save you a late night of
reinventing the wheel.
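Several of these types have literal forms that can be written straight into a script. The snippet below is a small sketch. The subnet, port and interval values are arbitrary, and ``on_lan`` is a hypothetical helper.

.. code:: bro

    # Arbitrary example values using Bro's network-centric literals.
    const example_lan: subnet = 192.168.1.0/24;
    const example_dns_port: port = 53/udp;
    const example_timeout: interval = 5 min;

    # An addr can be tested for membership in a subnet with "in".
    function on_lan(a: addr): bool
        {
        return a in example_lan;
        }

    event bro_init()
        {
        print on_lan(192.168.1.53);
        print on_lan(10.0.0.1);
        print get_port_transport_proto(example_dns_port);
        print example_timeout;
        }

The first two lines printed should be ``T`` and ``F``, since only the first address falls inside ``192.168.1.0/24``. The third line should be ``udp``.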
@ -479,7 +480,7 @@ the ``for`` loop, the next element is chosen. Since sets are not an
ordered data type, you cannot guarantee the order of the elements as
the ``for`` loop processes them.
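A minimal sketch of such a loop, using an arbitrary set of ports, looks like this:

.. code:: bro

    # Arbitrary example set of ports.
    global example_ssl_ports: set[port] = { 443/tcp, 563/tcp, 993/tcp };

    event bro_init()
        {
        # Each iteration binds one element of the set to p; the order
        # in which elements are visited is unspecified.
        for ( p in example_ssl_ports )
            print p;
        }

The ports are printed in whatever order the set's internal storage yields, which need not match the order in the declaration.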
To test for membership in a set, the ``in`` statement can be combined
with an ``if`` statement to return a true or false value. If the
exact element in the condition is already in the set, the condition
returns true and the body executes. The ``in`` statement can also be
@ -546,7 +547,7 @@ iterate over, say, the directors; we have to iterate with the exact
format as the keys themselves. In this case, we need square brackets
surrounding four temporary variables to act as a collection for our
iteration. While this is a contrived example, we could easily have
had keys containing IP addresses (``addr``), ports (``port``) and even a ``string``
calculated as the result of a reverse hostname lookup.
.. btest-include:: ${DOC_ROOT}/scripting/data_struct_table_complex.bro
@ -647,7 +648,7 @@ subnet
~~~~~~
Bro has full support for CIDR notation subnets as a base data type.
There is no need to manage the IP and the subnet mask as two separate
entities when you can provide the same information in CIDR notation in
your scripts. The following example uses a Bro script to
determine if a series of IP addresses are within a set of subnets
@ -807,7 +808,7 @@ composite type. We have, in fact, already encountered a a complex
example of the ``record`` data type in the earlier sections, the
:bro:type:`connection` record passed to many events. Another one,
:bro:type:`Conn::Info`, which corresponds to the fields logged into
``conn.log``, is shown by the excerpt below.
.. btest-include:: ${BRO_SRC_ROOT}/scripts/base/protocols/conn/main.bro
    :lines: 10-12,16,17,19,21,23,25,28,31,35,37,56,62,68,90,93,97,100,104,108,109,114
@ -818,7 +819,7 @@ definition is within the confines of an export block, what is defined
is, in fact, ``Conn::Info``.
The formatting for a declaration of a record type in Bro includes the
descriptive name of the type being defined and the separate fields
that make up the record. The individual fields that make up the new
record are not limited in type or number as long as the name for each
field is unique.
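As a small illustration, the sketch below declares a hypothetical ``ExampleHost`` record with three fields of different types and reads them back with the ``$`` operator. The type, field names and values are invented for this example.

.. code:: bro

    # Hypothetical record type: each field has a unique name and its
    # own type.
    type ExampleHost: record {
        ip: addr;
        name: string;
        service_port: port;
    };

    function describe_host(h: ExampleHost): string
        {
        # Fields are read with the $ dereference operator.
        return fmt("%s (%s) offers %s", h$name, h$ip, h$service_port);
        }

    event bro_init()
        {
        local web: ExampleHost = [$ip=10.0.0.80, $name="web", $service_port=80/tcp];
        print describe_host(web);
        }

It should print ``web (10.0.0.80) offers 80/tcp``.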
@ -834,7 +835,7 @@ string, a set of ports, and a count to define a service type. Also
included is a function to print each field of a record in a formatted
fashion and a :bro:id:`bro_init` event handler to show some
functionality of working with records. The definitions of the DNS and
HTTP services are both done in-line using square brackets before being
passed to the ``print_service`` function. The ``print_service``
function makes use of the ``$`` dereference operator to access the
fields within the newly defined Service record type.
@ -851,7 +852,7 @@ record.
@TEST-EXEC: btest-rst-cmd bro ${DOC_ROOT}/scripting/data_struct_record_02.bro
The example above includes a second record type in which a field is
used as the data type for a set. Records can be repeatedly nested
within other records, their fields reachable through repeated chains
of the ``$`` dereference operator.
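Nesting works the same way with record-typed fields. In this sketch, the hypothetical ``ExampleFlow`` record holds two ``ExampleEndpoint`` records. The responder's port is reached through a chain of two dereferences, much like ``c$id$resp_p``.

.. code:: bro

    # Hypothetical record types: a flow holds two endpoint records.
    type ExampleEndpoint: record {
        h: addr;
        p: port;
    };

    type ExampleFlow: record {
        orig: ExampleEndpoint;
        resp: ExampleEndpoint;
    };

    # Two chained dereferences, in the same style as c$id$resp_p.
    function resp_port(f: ExampleFlow): port
        {
        return f$resp$p;
        }

    event bro_init()
        {
        local o: ExampleEndpoint = [$h=10.0.0.1, $p=51234/tcp];
        local r: ExampleEndpoint = [$h=10.0.0.2, $p=53/udp];
        local f: ExampleFlow = [$orig=o, $resp=r];

        print resp_port(f);
        print f$orig$h;
        }

Running it should print ``53/udp`` and then ``10.0.0.1``.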
@ -1128,7 +1129,7 @@ which we will cover shortly.
+---------------------+------------------------------------------------------------------+----------------+----------------------------------------+
| policy_items | set[count] | &log &optional | Policy items that have been applied |
+---------------------+------------------------------------------------------------------+----------------+----------------------------------------+
| email_body_sections | vector | &optional | Body of the email for email notices. |
+---------------------+------------------------------------------------------------------+----------------+----------------------------------------+
| email_delay_tokens | set[string] | &optional | Delay functionality for email notices. |
+---------------------+------------------------------------------------------------------+----------------+----------------------------------------+
@ -1142,7 +1143,7 @@ has been heuristically detected and the originating hostname is one
that would raise suspicion. Effectively, the script attempts to
define a list of hosts from which you would never want to see SSH
traffic originating, like DNS servers, mail servers, etc. To
accomplish this, the script adheres to the separation of detection
and reporting by detecting a behavior and raising a notice. Whether
or not that notice is acted upon is decided by the local Notice
Policy, but the script attempts to supply as much information as
@ -1226,7 +1227,7 @@ Bro.
In the :doc:`/scripts/policy/protocols/ssl/expiring-certs` script
which identifies when SSL certificates are set to expire and raises
notices when it crosses a predefined threshold, the call to
``NOTICE`` above also sets the ``$identifier`` entry by concatenating
the responder IP, port, and the hash of the certificate. The
selection of responder IP, port and certificate hash fits perfectly
@ -1262,7 +1263,7 @@ In short, there will be notice policy considerations where a broad
decision can be made based on the ``Notice::Type`` alone. To
facilitate these types of decisions, the Notice Framework supports
Notice Policy shortcuts. These shortcuts are implemented through the
means of a group of data structures that map specific, predefined
details and actions to the effective name of a notice. Primarily
implemented as a set or table of enumerables of :bro:type:`Notice::Type`,
Notice Policy shortcuts can be placed as a single directive in your
@ -1308,5 +1309,3 @@ Notice::emailed_types set while the shortcut below alters the length
of time for which those notices will be suppressed.
.. btest-include:: ${DOC_ROOT}/scripting/framework_notice_shortcuts_02.bro