
.. _writing-scripts:

==========
The Basics
==========

Understanding Scripts
=====================

Zeek includes an event-driven scripting language that provides
the primary means for an organization to extend and customize Zeek's
functionality. Virtually all of the output generated by Zeek
is, in fact, generated by Zeek scripts.  It can help to think of
Zeek as an entity working behind the scenes, processing connections and
generating events, while Zeek's scripting language is the medium through
which we mere mortals communicate with it.  In effect, Zeek scripts tell
Zeek: whenever an event of a type we care about occurs, hand us the
information about the connection so we can perform some function on it.
For example, the :file:`ssl.log` file is generated by a Zeek script that
walks the entire certificate chain and issues notifications if any of the
steps along the certificate chain are invalid.  This entire process is set
up by telling Zeek that, should it see a server or client issue an SSL
``HELLO`` message, we want to know about that connection.

It's often easiest to understand Zeek's scripting language by
looking at a complete script and breaking it down into its
identifiable components.  In this example, we'll take a look at how
Zeek checks the SHA1 hash of various files extracted from network traffic
against the `Team Cymru Malware hash registry
<http://www.team-cymru.org/Services/MHR/>`_.  Part of the Team Cymru Malware
Hash registry includes the ability to do a host lookup on a domain with the format
``<MALWARE_HASH>.malware.hash.cymru.com`` where ``<MALWARE_HASH>`` is the SHA1 hash of a file.
Team Cymru also populates the TXT record of their DNS responses with both a "first seen"
timestamp and a numerical "detection rate".  The important point to understand is that Zeek is already
generating hashes for files via the Files framework, but it is the
script :doc:`/scripts/policy/frameworks/files/detect-MHR.zeek`
that is responsible for generating the
appropriate DNS lookup, parsing the response, and generating a notice if appropriate.

.. code-block:: zeek
   :caption: detect-MHR.zeek

   ##! Detect file downloads that have hash values matching files in Team
   ##! Cymru's Malware Hash Registry (http://www.team-cymru.org/Services/MHR/).

   @load base/frameworks/files
   @load base/frameworks/notice
   @load frameworks/files/hash-all-files

   module TeamCymruMalwareHashRegistry;

   export {
       redef enum Notice::Type += {
           ## The hash value of a file transferred over HTTP matched in the
           ## malware hash registry.
           Match
       };

       ## File types to attempt matching against the Malware Hash Registry.
       option match_file_types = /application\/x-dosexec/ |
                                /application\/vnd.ms-cab-compressed/ |
                                /application\/pdf/ |
                                /application\/x-shockwave-flash/ |
                                /application\/x-java-applet/ |
                                /application\/jar/ |
                                /video\/mp4/;

       ## The Match notice has a sub message with a URL where you can get more
       ## information about the file. The %s will be replaced with the SHA-1
       ## hash of the file.
       option match_sub_url = "https://www.virustotal.com/en/search/?query=%s";

       ## The malware hash registry runs each malware sample through several
       ## A/V engines.  Team Cymru returns a percentage to indicate how
       ## many A/V engines flagged the sample as malicious. This threshold
       ## allows you to require a minimum detection rate.
       option notice_threshold = 10;
   }

   function do_mhr_lookup(hash: string, fi: Notice::FileInfo)
       {
       local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

       when ( local MHR_result = lookup_hostname_txt(hash_domain) )
           {
           # Data is returned as "<dateFirstDetected> <detectionRate>"
           local MHR_answer = split_string1(MHR_result, / /);

           if ( |MHR_answer| == 2 )
               {
               local mhr_detect_rate = to_count(MHR_answer[1]);

               if ( mhr_detect_rate >= notice_threshold )
                   {
                   local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                   local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                   local message = fmt("Malware Hash Registry Detection rate: %d%%  Last seen: %s", mhr_detect_rate, readable_first_detected);
                   local virustotal_url = fmt(match_sub_url, hash);
                   # We don't have the full fa_file record here in order to
                   # avoid the "when" statement cloning it (expensive!).
                   local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                   Notice::populate_file_info2(fi, n);
                   NOTICE(n);
                   }
               }
           }
       }

   event file_hash(f: fa_file, kind: string, hash: string)
       {
       if ( kind == "sha1" && f?$info && f$info?$mime_type &&
            match_file_types in f$info$mime_type )
           do_mhr_lookup(hash, Notice::create_file_info(f));
       }

Visually, there are three distinct sections of the script.  First, there is a base
level with no indentation where libraries are included in the script through ``@load``
and a namespace is defined with ``module``.  This is followed by an indented and formatted
section explaining the custom variables being provided (``export``) as part of the script's namespace.
Finally, there is a section defining a helper function (``do_mhr_lookup``) and the instructions to take for a
specific event (``event file_hash``).  Don't get discouraged if you don't
understand every section of the script; we'll cover the basics of the
script and much more in following sections.

.. code-block:: zeek
   :caption: detect-MHR.zeek

   @load base/frameworks/files
   @load base/frameworks/notice
   @load frameworks/files/hash-all-files

The first part of the script consists of ``@load`` directives, which
load the named scripts; when a directive names a directory, Zeek
processes the ``__load__.zeek`` script in that directory.  Including the
``@load`` directives a script depends on is often considered good
practice, or even just good manners, because it ensures the script can
be used on its own.  In a full production deployment of Zeek these
resources are likely to be loaded already, but it's a good habit to get
into as you gain experience with Zeek scripting.  If you're just starting
out, this level of granularity might not be entirely necessary.  Here, the
``@load`` directives ensure that the Files framework, the Notice framework,
and the script that hashes all files have been loaded by Zeek.

.. code-block:: zeek
   :caption: detect-MHR.zeek

   export {
       redef enum Notice::Type += {
           ## The hash value of a file transferred over HTTP matched in the
           ## malware hash registry.
           Match
       };

       ## File types to attempt matching against the Malware Hash Registry.
       option match_file_types = /application\/x-dosexec/ |
                                /application\/vnd.ms-cab-compressed/ |
                                /application\/pdf/ |
                                /application\/x-shockwave-flash/ |
                                /application\/x-java-applet/ |
                                /application\/jar/ |
                                /video\/mp4/;

       ## The Match notice has a sub message with a URL where you can get more
       ## information about the file. The %s will be replaced with the SHA-1
       ## hash of the file.
       option match_sub_url = "https://www.virustotal.com/en/search/?query=%s";

       ## The malware hash registry runs each malware sample through several
       ## A/V engines.  Team Cymru returns a percentage to indicate how
       ## many A/V engines flagged the sample as malicious. This threshold
       ## allows you to require a minimum detection rate.
       option notice_threshold = 10;
   }

The export section uses ``redef`` to extend the enumerated type
:zeek:see:`Notice::Type`, which describes the kinds of notice the Notice
framework can generate.  Zeek allows for re-definable constants and types,
which, at first, might seem counter-intuitive.  We'll get more in-depth with
constants in a later chapter; for now, think of them as variables that can
only be altered before Zeek starts running.  By extending
:zeek:see:`Notice::Type` as shown, the script allows the :zeek:see:`NOTICE`
function to generate notices with a ``$note`` field set to
``TeamCymruMalwareHashRegistry::Match``.  Notices allow Zeek to generate some
kind of extra notification beyond its default log types.  Often, this
extra notification comes in the form of an email generated and sent to a
preconfigured address, but this can be altered depending on the needs of the
deployment.  The export section finishes with a few settings declared with
the ``option`` keyword: the kinds of files we want to match against, a URL
template for looking up a matched hash, and the minimum detection rate in
which we are interested.  Options behave much like re-definable constants,
but can additionally be changed at runtime through Zeek's configuration
framework.
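
As a minimal sketch of the same pattern outside of the MHR script, the
hypothetical module below extends :zeek:see:`Notice::Type` with its own
notice type and raises it with :zeek:see:`NOTICE`.  The module name
``LargeTransfer``, the notice type ``Large_Response``, the
``size_threshold`` option, and the use of a large responder payload as the
trigger are all made up for illustration.

.. code-block:: zeek

   @load base/frameworks/notice

   module LargeTransfer;

   export {
       redef enum Notice::Type += {
           ## Hypothetical notice type: a responder sent a lot of data.
           Large_Response
       };

       ## Hypothetical threshold, in bytes sent by the responder.
       option size_threshold = 1000000;
   }

   event connection_state_remove(c: connection)
       {
       # c$resp$size is the (logical) amount of data the responder sent, in bytes.
       if ( c$resp$size > size_threshold )
           NOTICE([$note=Large_Response,
                   $msg=fmt("%s sent %d bytes", c$id$resp_h, c$resp$size),
                   $conn=c]);
       }

Inside the module, the new type is referred to simply as
``Large_Response``; other scripts would use
``LargeTransfer::Large_Response``.  Because ``size_threshold`` is declared
with ``option``, a site could adjust it with a line such as
``redef LargeTransfer::size_threshold = 5000000;``.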

Up until this point, the script has merely done some basic setup.  With
the next section, the script starts to define the instructions to execute
when a given event occurs.

.. code-block:: zeek
   :caption: detect-MHR.zeek

   function do_mhr_lookup(hash: string, fi: Notice::FileInfo)
       {
       local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

       when ( local MHR_result = lookup_hostname_txt(hash_domain) )
           {
           # Data is returned as "<dateFirstDetected> <detectionRate>"
           local MHR_answer = split_string1(MHR_result, / /);

           if ( |MHR_answer| == 2 )
               {
               local mhr_detect_rate = to_count(MHR_answer[1]);

               if ( mhr_detect_rate >= notice_threshold )
                   {
                   local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                   local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                   local message = fmt("Malware Hash Registry Detection rate: %d%%  Last seen: %s", mhr_detect_rate, readable_first_detected);
                   local virustotal_url = fmt(match_sub_url, hash);
                   # We don't have the full fa_file record here in order to
                   # avoid the "when" statement cloning it (expensive!).
                   local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                   Notice::populate_file_info2(fi, n);
                   NOTICE(n);
                   }
               }
           }
       }

   event file_hash(f: fa_file, kind: string, hash: string)
       {
       if ( kind == "sha1" && f?$info && f$info?$mime_type &&
            match_file_types in f$info$mime_type )
           do_mhr_lookup(hash, Notice::create_file_info(f));
       }

The workhorse of the script is contained in the event handler for
``file_hash``.  The :zeek:see:`file_hash` event allows scripts to access
the information associated with a file for which Zeek's file analysis
framework has generated a hash.  The event handler is passed the
file's :zeek:see:`fa_file` record as ``f``, the type of digest algorithm
used as ``kind``, and the hash generated as ``hash``.

In the ``file_hash`` event handler, an ``if`` statement checks for the
correct type of hash, in this case a SHA1 hash.  It also checks that the
file has a MIME type matching the ``match_file_types`` option defined
earlier.  Because the file record's ``info`` field and that record's
``mime_type`` field may not be set, the handler first tests for their
presence with the ``?$`` operator (``f?$info`` and ``f$info?$mime_type``).
The pattern is then matched against the expression ``f$info$mime_type``,
which uses the ``$`` dereference operator to access the field
``mime_type`` inside the record ``f$info``.  If the entire expression
evaluates to true, then a helper function, ``do_mhr_lookup``, is called with
the hash and a summary of the file created by
``Notice::create_file_info`` to do the rest of the work.  In that function,
a local variable is defined to hold a string comprised of the SHA1 hash
concatenated with ``.malware.hash.cymru.com``; this value will be the
domain queried in the malware hash registry.
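
The ``?$`` and ``pattern in string`` checks used in that ``if`` statement
can be tried on their own.  This is a minimal sketch; the ``FileSummary``
record type and its values are hypothetical stand-ins for the file record's
``info`` field.

.. code-block:: zeek

   # Hypothetical record type with one optional field.
   type FileSummary: record {
       name:      string;
       mime_type: string &optional;
   };

   event zeek_init()
       {
       local fs = FileSummary($name="sample.exe");

       # ?$ tests whether an optional field has been set.  Reading an
       # unset field with $ would instead cause a runtime error.
       print fs?$mime_type;

       fs$mime_type = "application/x-dosexec";
       print fs?$mime_type;

       # "pattern in string" is true if the pattern matches anywhere in the string.
       if ( fs?$mime_type && /application\/x-dosexec/ in fs$mime_type )
           print fmt("%s looks like a Windows executable", fs$name);
       }

Run with ``zeek -b``, this should print ``F``, then ``T``, and then the
message.  Testing with ``?$`` before dereferencing is the same precaution
the ``file_hash`` handler takes with ``f?$info`` and ``f$info?$mime_type``.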

The rest of ``do_mhr_lookup`` is contained within a ``when`` block.  In
short, a ``when`` block is used when Zeek needs to perform asynchronous
actions, such as a DNS lookup, so that Zeek does not stall while waiting
for the result.  The condition of the ``when`` block performs a DNS TXT
lookup and stores the result in the local variable ``MHR_result``.
Processing continues without waiting, and once
:zeek:id:`lookup_hostname_txt` returns a value, the body of the ``when``
block is executed.  The body splits the returned string on the first space
into the date on which the malware was first detected and the detection
rate, storing the pieces in the local vector variable ``MHR_answer``.  If
the vector returned by ``split_string1`` has two entries, indicating a
successful split, the script converts the detection rate to a ``count`` and
stores it in ``mhr_detect_rate``.  At this point, Zeek knows it has seen a
file whose hash is known to the Team Cymru Malware Hash Registry; the rest
of the script is dedicated to deciding whether to produce a notice.
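
The asynchronous behavior of ``when`` is easier to see in isolation.  The
sketch below assumes a working DNS resolver; the hostname is arbitrary, and
``lookup_hostname`` (which returns a set of addresses) stands in for the
TXT lookup used by the MHR script.  Since the script reads no traffic, it
sets ``exit_only_after_terminate`` so that Zeek keeps running until
``terminate()`` is called.

.. code-block:: zeek

   # Keep Zeek running until terminate() is called, because no trace or
   # interface is being read and the lookup completes asynchronously.
   redef exit_only_after_terminate = T;

   event zeek_init()
       {
       when ( local addrs = lookup_hostname("www.zeek.org") )
           {
           # Runs later, once the DNS answer has arrived.
           print fmt("resolved: %s", addrs);
           terminate();
           }
       timeout 10sec
           {
           # Runs instead if no answer arrives in time.
           print "lookup timed out";
           terminate();
           }

       # Execution does not wait for the lookup, so this prints first.
       print "lookup issued, zeek_init continues";
       }

The ``timeout`` clause is optional, but without it a lookup that never
completes would leave this particular script waiting indefinitely.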

The script then compares the detection rate against the
``notice_threshold`` option defined earlier.  If the detection rate is
high enough, the script converts the first-detection timestamp into a
:zeek:see:`time` value stored in ``mhr_first_detected`` and formats it as a
string in ``readable_first_detected``.  It then creates a concise
description of the notice, stores it in the ``message`` variable, and
builds a URL for checking the sample against ``virustotal.com``'s
database.  Finally, it creates a ``Notice::Info`` record, adds the file
details to it with ``Notice::populate_file_info2``, and makes the call to
:zeek:id:`NOTICE` to hand the relevant information off to the Notice
framework.
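
The parsing and conversion steps can also be exercised without a DNS
lookup.  In this sketch the answer string is made up, following the
``<dateFirstDetected> <detectionRate>`` format noted in the script's
comment; the function calls are the same ones ``do_mhr_lookup`` uses.

.. code-block:: zeek

   event zeek_init()
       {
       # Made-up TXT answer: epoch timestamp, a space, then a detection rate.
       local answer = "1277221946 79";

       # split_string1 splits at the first match only and returns a vector.
       local parts = split_string1(answer, / /);

       if ( |parts| == 2 )
           {
           local rate = to_count(parts[1]);
           local first_seen = double_to_time(to_double(parts[0]));

           # %% in fmt produces a literal percent sign.
           print fmt("detection rate: %d%%, first detected: %s", rate,
                     strftime("%Y-%m-%d %H:%M:%S", first_seen));
           }
       }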

In a few dozen lines of code, Zeek provides an amazing
utility that would be incredibly difficult to implement and deploy
with other products.  In truth, claiming that Zeek does this in such a small
number of lines is a misdirection; there is a truly massive number of things
going on behind the scenes in Zeek, but it is the inclusion of the
scripting language that gives analysts access to those underlying
layers in a succinct and well-defined manner.

The Event Queue and Event Handlers
==================================

Zeek's scripting language is event-driven, which is a gear change from
the majority of scripting languages with which most users will have
previous experience.  Scripting in Zeek depends on handling the events
generated by Zeek as it processes network traffic, altering the state
of data structures through those events, and making decisions on the
information provided.  This approach to scripting can often cause
confusion to users who come to Zeek from a procedural or functional
language, but once the initial shock wears off it becomes more clear
with each exposure.

Zeek's core places events into an ordered "event queue",
allowing event handlers to process them on a first-come, first-served
basis.  In effect, this is Zeek's core functionality: without the
scripts written to perform discrete actions on events, there would be
little to no usable output.  As such, a basic understanding of the
event queue, the events being generated, and the way in which event
handlers process those events is a basis for not only learning to
write scripts for Zeek but for understanding Zeek itself.
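
A minimal sketch of the queue in action follows; the event name
``greeting`` and its parameter are hypothetical.  Raising an event with the
``event`` statement only places it in the queue; its handler runs later,
after the current handler has finished.

.. code-block:: zeek

   # Declare a custom event type.
   global greeting: event(msg: string);

   event greeting(msg: string)
       {
       print fmt("handling greeting: %s", msg);
       }

   event zeek_init()
       {
       # Queue two events; neither handler runs yet.
       event greeting("first");
       event greeting("second");
       print "zeek_init finished";
       }

Run with ``zeek -b``, this should print ``zeek_init finished`` before
either ``handling greeting`` line, and the two greetings in the order they
were queued.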

Gaining familiarity with the specific events generated by Zeek is a big
step towards building a mindset for working with Zeek scripts.  The
majority of events generated by Zeek are defined in the
built-in-function (``*.bif``) files, whose in-line comments also act as
the basis for the online event documentation: Zeekygen compiles these
comments into an online documentation system.  Whether starting a
script from scratch or reading and maintaining someone else's script,
having the built-in event definitions available is an excellent
resource to have on hand.  For the 2.0 release, the Zeek developers put
significant effort into the organization and documentation of every event.
This effort resulted in built-in-function files organized such that
each entry contains a descriptive event name, the arguments passed to
the event, and a concise explanation of the event's purpose.

.. code-block:: zeek

   ## Generated for DNS requests. For requests with multiple queries, this event
   ## is raised once for each.
   ##
   ## See `Wikipedia <http://en.wikipedia.org/wiki/Domain_Name_System>`__ for more
   ## information about the DNS protocol. Zeek analyzes both UDP and TCP DNS
   ## sessions.
   ##
   ## c: The connection, which may be UDP or TCP depending on the type of the
   ##    transport-layer session being analyzed.
   ##
   ## msg: The parsed DNS message header.
   ##
   ## query: The queried name.
   ##
   ## qtype: The queried resource record type.
   ##
   ## qclass: The queried resource record class.
   ##
   ## .. zeek:see:: dns_AAAA_reply dns_A_reply dns_CNAME_reply dns_EDNS_addl
   ##    dns_HINFO_reply dns_MX_reply dns_NS_reply dns_PTR_reply dns_SOA_reply
   ##    dns_SRV_reply dns_TSIG_addl dns_TXT_reply dns_WKS_reply dns_end
   ##    dns_full_request dns_mapping_altered dns_mapping_lost_name dns_mapping_new_name
   ##    dns_mapping_unverified dns_mapping_valid dns_message dns_query_reply
   ##    dns_rejected non_dns_request dns_max_queries dns_session_timeout dns_skip_addl
   ##    dns_skip_all_addl dns_skip_all_auth dns_skip_auth
   event dns_request%(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count%);

Above is a segment of the documentation for the event
:zeek:id:`dns_request` (and the preceding link points to the
documentation generated out of that).  It's organized such that the
documentation, commentary, and list of arguments precede the actual
event definition used by Zeek.  As Zeek detects DNS requests being
issued by an originator, it raises this event, and any number of
scripts then have access to the data Zeek passes along with the event.
In this example, Zeek passes not only the DNS message header, the query,
the query type, and the query class for the DNS request, but also the
connection record for the connection itself.
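
A handler for this event uses the same parameter list.  The sketch below
copies the signature from the excerpt above; newer Zeek versions may
differ, so check it against the documentation for the version you run.
Loading ``base/protocols/dns`` is assumed to be necessary in bare mode so
that the DNS analyzer is attached to DNS traffic.

.. code-block:: zeek

   @load base/protocols/dns

   event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
       {
       # c is the connection record; query and qtype describe the request itself.
       print fmt("%s asked %s for %s (qtype %d)",
                 c$id$orig_h, c$id$resp_h, query, qtype);
       }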

.. _writing-scripts-connection-record:

The Connection Record Data Type
===============================

Of all the events defined by Zeek, an overwhelmingly large number of
them are passed the :zeek:type:`connection` record data type, in effect,
making it the backbone of many scripting solutions.  The connection
record itself, as we will see in a moment, is a mass of nested data
types used to track state on a connection through its lifetime.  Let's
walk through the process of selecting an appropriate event, generating
some output to standard out and dissecting the connection record so as
to get an overview of it.  We will cover data types in more detail
later.

While Zeek is capable of packet-level processing, its strengths lie in
the context of a connection between an *originator* and a *responder*.

.. note::

   Zeek's notions of originator and responder aim to capture the
   natural roles of connection endpoints given the protocol
   information observed. They differ from the packet-level concepts of
   source and destination, as well as from higher-level abstractions
   such as client and server.

   Zeek's protocol analyzers determine originator and responder when
   establishing connection state, with the sender of the initial
   packet usually becoming the originator and the recipient becoming
   the responder. However, analyzers may subsequently *flip* the roles
   if protocol semantics suggest it. For example, in the presence of
   packet loss the first observed packet in a DNS transaction may
   indicate that it is in fact the response to a missing query. Zeek's
   DNS analyzer will flip the endpoint roles, making the sender of
   this packet the connection's responder.

Zeek defines events for the primary parts of the connection life-cycle,
such as the following:

* :zeek:see:`new_connection`
* :zeek:see:`connection_timeout`
* :zeek:see:`connection_state_remove`

Of the events listed, the event that will give us the best insight
into the connection record data type will be
:zeek:id:`connection_state_remove`.  As detailed in the in-line
documentation, Zeek generates this event just before it decides to
remove the connection from memory, effectively forgetting about it.  Let's
take a look at a simple example script that will output the connection record
for a single connection.

.. literalinclude:: connection_record_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

Again, we start with ``@load``, this time importing the
:doc:`base/protocols/conn </scripts/base/protocols/conn/index>` scripts which
supply the tracking and logging of general information and state of
connections.  We handle the :zeek:id:`connection_state_remove` event and simply
print the contents of the argument passed to it.  For this example we're
going to run Zeek in "bare mode" which loads only the minimum number of
scripts to retain operability and leaves the burden of loading
required scripts to the script being run.  While bare mode is a low-level
feature of Zeek, in this case we're going
to use it to demonstrate how different features of Zeek add more and
more layers of information about a connection.  This will give us a
chance to see the contents of the connection record without it being
overly populated.

.. code-block:: console

   $ zeek -b -r http/get.trace connection_record_01.zeek
   [id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], orig=[size=136, state=5, num_pkts=7, num_bytes_ip=512, flow_label=0, l2_addr=c8:bc:c8:96:d2:a0], resp=[size=5007, state=5, num_pkts=7, num_bytes_ip=5379, flow_label=0, l2_addr=00:10:db:88:d2:ef], start_time=1362692526.869344, duration=0.211484, service={

   }, history=ShADadFf, uid=CHhAvVGS1DHFjwGM9, tunnel=<uninitialized>, vlan=<uninitialized>, inner_vlan=<uninitialized>, conn=[ts=1362692526.869344, uid=CHhAvVGS1DHFjwGM9, id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], proto=tcp, service=<uninitialized>, duration=0.211484, orig_bytes=136, resp_bytes=5007, conn_state=SF, local_orig=<uninitialized>, local_resp=<uninitialized>, missed_bytes=0, history=ShADadFf, orig_pkts=7, orig_ip_bytes=512, resp_pkts=7, resp_ip_bytes=5379, tunnel_parents=<uninitialized>], extract_orig=F, extract_resp=F, thresholds=<uninitialized>]

As you can see from the output, the connection record is something of
a jumble when printed on its own.  Regularly taking a peek at a
populated connection record helps to understand the relationship
between its fields as well as allowing an opportunity to build a frame
of reference for accessing data in a script.

Zeek makes extensive use of nested data structures to store state and
information gleaned from the analysis of a connection as a complete
unit.  To break down this collection of information, you will have to
make use of Zeek's field delimiter ``$``.  For example, the
originating host is referenced by ``c$id$orig_h``, which, if given a
narrative, relates to ``orig_h``, which is a member of ``id``, which is
a member of the data structure referred to as ``c`` that was passed
into the event handler.

A direct correlation exists between the printed connection record and the
proper format of a dereferenced variable in scripts.  In the output of
the script above, each group of information collected between brackets is
a nested record, and each level of nesting corresponds to one ``$`` in a
Zeek script.

Given that the responder port ``c$id$resp_p`` is ``80/tcp``, it's likely
that Zeek's base HTTP scripts can further populate the connection record.
Let's load the ``base/protocols/http`` scripts and check the output of our
script.

.. literalinclude:: connection_record_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek -b -r http/get.trace connection_record_02.zeek
   [id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], orig=[size=136, state=5, num_pkts=7, num_bytes_ip=512, flow_label=0, l2_addr=c8:bc:c8:96:d2:a0], resp=[size=5007, state=5, num_pkts=7, num_bytes_ip=5379, flow_label=0, l2_addr=00:10:db:88:d2:ef], start_time=1362692526.869344, duration=0.211484, service={

   }, history=ShADadFf, uid=CHhAvVGS1DHFjwGM9, tunnel=<uninitialized>, vlan=<uninitialized>, inner_vlan=<uninitialized>, conn=[ts=1362692526.869344, uid=CHhAvVGS1DHFjwGM9, id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], proto=tcp, service=<uninitialized>, duration=0.211484, orig_bytes=136, resp_bytes=5007, conn_state=SF, local_orig=<uninitialized>, local_resp=<uninitialized>, missed_bytes=0, history=ShADadFf, orig_pkts=7, orig_ip_bytes=512, resp_pkts=7, resp_ip_bytes=5379, tunnel_parents=<uninitialized>], extract_orig=F, extract_resp=F, thresholds=<uninitialized>, http=[ts=1362692526.939527, uid=CHhAvVGS1DHFjwGM9, id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], trans_depth=1, method=GET, host=bro.org, uri=/download/CHANGES.bro-aux.txt, referrer=<uninitialized>, version=1.1, user_agent=Wget/1.14 (darwin12.2.0), request_body_len=0, response_body_len=4705, status_code=200, status_msg=OK, info_code=<uninitialized>, info_msg=<uninitialized>, tags={

   }, username=<uninitialized>, password=<uninitialized>, capture_password=F, proxied=<uninitialized>, range_request=F, orig_fuids=<uninitialized>, orig_filenames=<uninitialized>, orig_mime_types=<uninitialized>, resp_fuids=[FakNcS1Jfe01uljb3], resp_filenames=<uninitialized>, resp_mime_types=[text/plain], current_entity=<uninitialized>, orig_mime_depth=1, resp_mime_depth=1], http_state=[pending={

   }, current_request=1, current_response=1, trans_depth=1]]

The addition of the ``base/protocols/http`` scripts populates the
``http=[]`` member of the connection record.  While Zeek is doing a
massive amount of work in the background, it is in what is commonly
called "scriptland" that details are being refined and decisions
being made. Were we to continue running in "bare mode" we could slowly
keep adding infrastructure through ``@load`` statements.  For example,
were we to ``@load base/frameworks/logging``, Zeek would generate a
:file:`conn.log` and :file:`http.log` for us in the current working directory.
As mentioned above, including the appropriate ``@load`` statements is
not only good practice, but can also help to indicate which
functionalities are being used in a script.  Take a second to run the
script without the ``-b`` flag and check the output when all of Zeek's
functionality is applied to the trace file.
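
Printing the whole record is useful for exploration, but a script usually
pulls out only the fields it needs.  The following sketch does that for
the same trace; the output format is arbitrary.

.. code-block:: zeek

   @load base/protocols/conn
   @load base/protocols/http

   event connection_state_remove(c: connection)
       {
       # c$id is the connection's 4-tuple (a conn_id record).
       print fmt("%s:%s -> %s:%s", c$id$orig_h, c$id$orig_p,
                 c$id$resp_h, c$id$resp_p);

       # The http field, and host/uri within it, are optional: they are only
       # set when the HTTP scripts saw HTTP traffic, so test with ?$ first.
       if ( c?$http && c$http?$host && c$http?$uri )
           print fmt("HTTP request for %s%s", c$http$host, c$http$uri);
       }

Run with ``zeek -b -r http/get.trace``, and going by the record printed
above, this should print the connection's endpoints followed by a line
for the request to ``bro.org``.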

Data Types and Data Structures
==============================

Scope
-----

Before embarking on an exploration of Zeek's native data types and data
structures, it's important to have a good grasp of the different
levels of scope available in Zeek and the appropriate times to use them
within a script.  The declarations of variables in Zeek come in two
forms.  Variables can be declared without or with an initial value, in the
form ``SCOPE name: TYPE`` or ``SCOPE name = EXPRESSION`` respectively;
both declare a variable of the same type if ``EXPRESSION`` evaluates to
``TYPE``.  The decision as to which type of declaration to
use is likely to be dictated by personal preference and readability.

.. literalinclude:: data_type_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

Global Variables
~~~~~~~~~~~~~~~~

A global variable is used when the state of a variable needs to be
tracked, not surprisingly, globally.  While there are some caveats,
when a script declares a variable using the global scope, that script
is granting access to that variable from other scripts.  However, when
a script uses the ``module`` keyword to give the script a namespace,
more care must be given to the declaration of globals to ensure the
intended result.  When a global is declared in a script with a
namespace there are two possible outcomes.  First, the variable is
available only within the context of the namespace.  In this scenario,
other scripts within the same namespace will have access to the
variable declared while scripts using a different namespace or no
namespace altogether will not have access to the variable.
Alternatively, if a global variable is declared within an ``export { ... }``
block that variable is available to any other script through the
naming convention of ``<module name>::<variable name>``, i.e. the variable
needs to be "scoped" by the name of the module in which it was declared.

When the ``module`` keyword is used in a script, the variables declared
are said to be in that module's "namespace".  Whereas a global variable
can be accessed by its name alone when it is not declared within a
module, a global variable declared within a module must be exported and
then accessed via ``<module name>::<variable name>``.
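
A minimal sketch of these rules, using a hypothetical ``Counter`` module:
the exported ``total`` and ``bump`` are reached from the default
``GLOBAL`` namespace as ``Counter::total`` and ``Counter::bump``, while
``calls_seen`` is not exported.

.. code-block:: zeek

   module Counter;

   export {
       ## Exported: other modules refer to it as Counter::total.
       global total: count = 0;

       ## Exported so that code outside the module can call Counter::bump().
       global bump: function();
   }

   ## Not exported: intended for use only inside the Counter module.
   global calls_seen: count = 0;

   function bump()
       {
       # Inside the module, both globals are reachable by their short names.
       ++total;
       ++calls_seen;
       }

   # Switch back to the default namespace, as in a script with no
   # module declaration.
   module GLOBAL;

   event zeek_init()
       {
       Counter::bump();
       print Counter::total;
       }

Switching back with ``module GLOBAL;`` stands in for a second script
without a ``module`` declaration; run with ``zeek -b``, it should print
``1``.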

Constants
~~~~~~~~~

Zeek also makes use of constants, which are denoted by the ``const``
keyword.  Unlike globals, constants can only be set or altered at
parse time if the ``&redef`` attribute has been used.  Afterwards (in
runtime) the constants are unalterable.  In most cases, re-definable
constants are used in Zeek scripts as containers for configuration
options.  For example, the configuration option to log passwords
seen in HTTP Basic authentication is stored in
:zeek:see:`HTTP::default_capture_password` as shown in the stripped down
excerpt from :doc:`/scripts/base/protocols/http/main.zeek` below.

.. literalinclude:: http_main.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

Because the constant was declared with the ``&redef`` attribute, if we
needed to turn this option on globally, we could do so by adding the
following line to our ``site/local.zeek`` file before firing up Zeek.

.. literalinclude:: data_type_const_simple.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

While the idea of a re-definable constant might be odd, the constraint
that constants can only be altered at parse-time remains even with the
``&redef`` attribute.  In the code snippet below, a table of strings
indexed by ports is declared as a constant before two values are added
to the table through ``redef`` statements.  The table is then printed
in a :zeek:id:`zeek_init` event.  Were we to try to alter the table in
an event handler, Zeek would notify the user of an error and the script
would fail.

.. literalinclude:: data_type_const.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek -b data_type_const.zeek
   {
   [80/tcp] = WWW,
   [6666/tcp] = IRC
   }

Local Variables
~~~~~~~~~~~~~~~

Whereas globals and constants are widely available in scriptland
through various means, when a variable is defined with a local scope,
its availability is restricted to the body of the event or function in
which it was declared.  Local variables tend to be used for values
that are only needed within a specific scope; once the processing
of a script moves beyond that scope and the variable is no longer used, it
is deleted. Zeek maintains names of locals separately from globally
visible ones, an example of which is illustrated below.

.. literalinclude:: data_type_local.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

The script executes the event handler :zeek:id:`zeek_init` which in turn calls
the function ``add_two(i: count)`` with an argument of ``10``.  Once Zeek
enters the ``add_two`` function, it provisions a locally scoped
variable called ``added_two`` to hold the value of ``i+2``, in this
case, ``12``.  The ``add_two`` function then prints the value of the
``added_two`` variable and returns its value to the ``zeek_init`` event
handler.  At this point, the variable ``added_two`` has fallen out of
scope and no longer exists while the value ``12`` is still in use and
stored in the locally scoped variable ``test``.  When Zeek finishes
processing the ``zeek_init`` event handler, the variable called ``test`` is
no longer in scope and, since there exist no other references to the
value ``12``, the value is also deleted.


Data Structures
---------------

It's difficult to talk about Zeek's data types in a practical manner
without first covering the data structures available in Zeek.  Some of
the more interesting characteristics of data types are revealed when
used inside of a data structure, but given that data structures are
made up of data types, it devolves rather quickly into a
"chicken-and-egg" problem.  As such, we'll introduce data types from
a bird's eye view before diving into data structures and from there a
more complete exploration of data types.

The table below shows the atomic types used in Zeek, of which the
first four should seem familiar if you have some scripting experience,
while the remaining six are less common in other languages. It should
come as no surprise that a scripting language for a Network Security
Monitoring platform has a fairly robust set of network-centric data
types and taking note of them here may well save you a late night of
reinventing the wheel.

.. list-table::
  :header-rows: 1

  * - Data Type
    - Description

  * - :zeek:see:`int`
    - 64 bit signed integer

  * - :zeek:see:`count`
    - 64 bit unsigned integer

  * - :zeek:see:`double`
    - double-precision floating point

  * - :zeek:see:`bool`
    - boolean (T/F)

  * - :zeek:see:`addr`
    - IP address, IPv4 and IPv6

  * - :zeek:see:`port`
    - transport layer port

  * - :zeek:see:`subnet`
    - subnet in CIDR notation

  * - :zeek:see:`time`
    - absolute epoch time

  * - :zeek:see:`interval`
    - a time interval

  * - :zeek:see:`pattern`
    - regular expression
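
For quick reference, the sketch below declares one value of each type from the table using its literal syntax; the global names are hypothetical. ``time`` is left out because it has no literal form and is instead produced by functions, as discussed later:

.. code-block:: zeek

   global example_int: int = -42;
   global example_count: count = 42;
   global example_double: double = 3.5;
   global example_bool: bool = T;
   global example_addr: addr = 192.168.1.1;
   global example_port: port = 443/tcp;
   global example_subnet: subnet = 10.0.0.0/8;
   global example_interval: interval = 5min;
   global example_pattern: pattern = /^GET/;

   event zeek_init()
       {
       print example_int, example_count, example_double, example_bool;
       print example_addr, example_port, example_subnet;
       print example_interval, example_pattern;
       }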

Sets
~~~~

Sets in Zeek are used to store unique elements of the same data
type.  In essence, you can think of them as "a unique set of integers"
or "a unique set of IP addresses".  While the declaration of a set may
differ based on the data type being collected, the set will always
contain unique elements and the elements in the set will always be of
the same data type.  Such requirements make the set data type perfect
for information that is already naturally unique such as ports or IP
addresses.  The code snippet below shows both an explicit and implicit
declaration of a locally scoped set.

.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :lines: 1-4,22
   :tab-width: 4

As you can see, sets are declared using the format ``SCOPE var_name:
set[TYPE]``.  Adding and removing elements in a set is achieved using
the ``add`` and ``delete`` statements.  Once you have elements inserted into
the set, it's likely that you'll need to either iterate over that set
or test for membership within the set, both of which are covered by
the ``in`` operator.  In the case of iterating over a set, combining the
``for`` statement and the ``in`` operator will allow you to sequentially
process each element of the set as seen below.

.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :lines: 17-21
   :lineno-start: 17
   :tab-width: 4

Here, the ``for`` statement loops over the contents of the set, storing
each element in the temporary variable ``i``.  With each iteration of
the ``for`` loop, the next element is chosen.  Since sets are not an
ordered data type, you cannot rely on the order in which the ``for``
loop visits the elements.

To test for membership in a set, the ``in`` operator can be combined
with an ``if`` statement.  If the
exact element in the condition is already in the set, the condition
evaluates to true and the body executes.  The ``in`` operator can also be
negated by the ``!`` operator to create the inverse of the condition.
While we could rewrite the corresponding line below as ``if ( !(
587/tcp in ssl_ports ))``, try to avoid using this construct; instead,
use the negated ``!in`` operator.  While the functionality is the same,
``!in`` is more efficient as well as a more natural construct,
which will aid in the readability of your script.

.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :lines: 13-15
   :lineno-start: 13
   :tab-width: 4

You can see the full script and its output below.

.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek data_struct_set_declaration.zeek
   SSL Port: 22/tcp
   SSL Port: 443/tcp
   SSL Port: 587/tcp
   SSL Port: 993/tcp
   Non-SSL Port: 80/tcp
   Non-SSL Port: 25/tcp
   Non-SSL Port: 143/tcp
   Non-SSL Port: 23/tcp
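
A small additional sketch shows the uniqueness guarantee in practice; the set name ``seen_ports`` is hypothetical. Adding an element that is already present leaves the set unchanged, ``delete`` removes an element, and surrounding the set name with vertical pipes reports its current size:

.. code-block:: zeek

   global seen_ports: set[port];

   event zeek_init()
       {
       add seen_ports[80/tcp];
       add seen_ports[443/tcp];

       # Adding an element that is already present does not create a duplicate.
       add seen_ports[80/tcp];
       print |seen_ports|;    # 2

       delete seen_ports[443/tcp];

       if ( 443/tcp !in seen_ports )
           print "443/tcp was removed";

       print |seen_ports|;    # 1
       }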

Tables
~~~~~~

A table in Zeek is a mapping of a key to a value or yield.  While the
values don't have to be unique, each key in the table must be unique
so that every key maps to exactly one value.

.. literalinclude:: data_struct_table_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek data_struct_table_declaration.zeek
   Service Name:  SSH - Common Port: 22/tcp
   Service Name:  HTTPS - Common Port: 443/tcp
   Service Name:  SMTPS - Common Port: 587/tcp
   Service Name:  IMAPS - Common Port: 993/tcp

In this example,
we've compiled a table of SSL-enabled services and their common
ports.  The explicit declaration and constructor for the table are on
two different lines and lay out the data types of the keys (strings) and the
data types of the values (ports) and then fill in some sample key and
value pairs.  You can also use a table accessor to insert one
key-value pair into the table.  When using the ``in``
operator on a table, you are effectively working with the keys of the table.
In the case of an ``if`` statement, the ``in`` operator will check for
membership among the set of keys and return a true or false value.
The example shows how to check if ``SMTPS`` is not in the set
of keys for the ``ssl_services`` table and if the condition holds true,
we add the key-value pair to the table.  Finally, the example shows how
to use a ``for`` statement to iterate over each key currently in the table.

Simple examples aside, tables can become extremely complex as the keys
and values for the table become more intricate.  Tables can have keys
comprised of multiple data types and even a series of elements called
a "tuple".  The flexibility gained with the use of complex tables in
Zeek implies a cost in complexity for the person writing the scripts
but pays off in effectiveness given the power of Zeek as a network
security platform.

.. literalinclude:: data_struct_table_complex.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek -b data_struct_table_complex.zeek
   Harakiri was released in 1962 by Shochiku Eiga studios, directed by Masaki Kobayashi and starring Tatsuya Nakadai
   Goyokin was released in 1969 by Fuji studios, directed by Hideo Gosha and starring Tatsuya Nakadai
   Tasogare Seibei was released in 2002 by Eisei Gekijo studios, directed by Yoji Yamada and starring Hiroyuki Sanada
   Kiru was released in 1968 by Toho studios, directed by Kihachi Okamoto and starring Tatsuya Nakadai

This script shows a sample table of strings indexed by two
strings, a count, and a final string.  With a tuple acting as an
aggregate key, the order is important as a change in order would
result in a new key.  Here, we're using the table to track the
director, studio, year of release, and lead actor in a series of
samurai flicks.

In the case of the ``for`` statement above, iteration is done over all
parts of the key. When not all parts of a key are needed within the ``for``
loop's body, these can be ignored by using the blank identifier ``_``
instead of a variable.
It's important to note, however, that the structure of the key needs to
be reflected: All parts of the key need to be captured within the brackets
by a variable or the blank identifier.
As a special case, a single blank identifier ignores the whole key.
In the previous example, we need square brackets surrounding four temporary
variables to act as a collection for our iteration. While this
is a contrived example, we could easily have had keys containing IP addresses
(``addr``), ports (``port``) and even a ``string`` calculated as the result
of a reverse hostname lookup.

The example below continues with the ``samurai_flicks`` table and shows usage
of the blank identifier in combination with key-value iteration.
Key-value iteration avoids a separate table lookup for the value,
since it provides each entry's value directly in addition to its key.

First, iteration is done by capturing the directors and movie names and
ignoring all other elements of the key. Second, the whole key is ignored
and only the movie names are used.

.. literalinclude:: data_struct_table_complex_blank_value.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek data_struct_table_complex_blank_value.zeek
   Kiru was directed by Kihachi Okamoto
   Harakiri was directed by Masaki Kobayashi
   Tasogare Seibei was directed by Yoji Yamada
   Goyokin was directed by Hideo Gosha
   Kiru is a movie
   Harakiri is a movie
   Tasogare Seibei is a movie
   Goyokin is a movie
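
Returning to the earlier remark about keys built from addresses and ports, here is a sketch of such a table; the names ``conn_counts`` and ``total_conns`` and the numbers are made up for illustration. The address part of the key is ignored with the blank identifier while the port and the value are used:

.. code-block:: zeek

   # Hypothetical per-service connection counts keyed by [responder address, port].
   global conn_counts: table[addr, port] of count = {
       [192.168.1.10, 80/tcp] = 12,
       [192.168.1.10, 443/tcp] = 30,
       [10.0.0.5, 22/tcp] = 3
   };

   global total_conns = 0;

   event zeek_init()
       {
       # The address is not needed here, so it is ignored with _.
       for ( [_, p], n in conn_counts )
           {
           print fmt("%s: %d", p, n);
           total_conns += n;
           }

       print fmt("total: %d", total_conns);
       }

As with sets, the per-port lines may appear in any order; the final line prints ``total: 45``.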


Vectors
~~~~~~~

If you're coming to Zeek with a programming background, you may or may
not be familiar with a vector data type depending on your language of
choice.  On the surface, vectors perform much of the same
functionality as associative arrays with unsigned integers as their
indices.  However, they are more efficient than that and they allow for
ordered access.  As such, any time you need to store data of the
same type sequentially, in Zeek you should reach for a vector.  Vectors are a
collection of objects, all of which are of the same data type, to
which elements can be dynamically added or removed.  Since vectors use
contiguous storage for their elements, the contents of a vector can be
accessed through a zero-indexed numerical offset.
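
To make the zero-based offsets concrete, the sketch below reads the first and last elements of a vector and overwrites one element in place; the vector name ``ttls`` is hypothetical:

.. code-block:: zeek

   # Hypothetical vector built with the vector() constructor.
   global ttls: vector of count = vector(64, 128, 255);

   event zeek_init()
       {
       # The first element lives at offset 0.
       print ttls[0];             # 64

       # The last valid offset is one less than the length |ttls|.
       print ttls[|ttls| - 1];    # 255

       # Any existing offset can be overwritten in place.
       ttls[1] = 32;
       print ttls;                # [64, 32, 255]
       }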

The format for the declaration of a vector follows the pattern of
other declarations, namely, ``SCOPE v: vector of T`` where ``v`` is
the name of your vector, and ``T`` is the data type of its members.
For example, the following snippet shows an explicit and implicit
declaration of two locally scoped vectors.  The script populates the
first vector by inserting values at the end; it does that by placing
the vector name between two vertical pipes to get the vector's current
length before printing the contents of both vectors and their current
lengths.

.. literalinclude:: data_struct_vector_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek data_struct_vector_declaration.zeek
   contents of v1: [1, 2, 3, 4]
   length of v1: 4
   contents of v2: [1, 2, 3, 4]
   length of v2: 4

In a lot of cases, storing elements in a vector is simply a precursor
to then iterating over them.  Iterating over a vector is easy with the
``for`` keyword.  The sample below iterates over a vector of IP
addresses and for each IP address, masks that address with 18 bits.
The ``for`` keyword is used to generate a locally scoped variable
called ``i`` which will hold the index of the current element in the
vector.  Using ``i`` as an index into ``addr_vector``, we can access the
current item in the vector with ``addr_vector[i]``.

.. literalinclude:: data_struct_vector_iter.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek -b data_struct_vector_iter.zeek
   1.2.0.0/18
   2.3.0.0/18
   3.4.0.0/18

Providing a value variable to the ``for`` loop allows skipping the extra
index operation. As the index variable is now unused, the script below
uses ``_``, the blank identifier, to ignore it. This script is semantically
equivalent to the previous one, but does direct value iteration and is therefore
potentially faster for very large vectors.

.. literalinclude:: data_struct_vector_iter_value.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

Data Types Revisited
--------------------

addr
~~~~

The ``addr``, or address, data type manages to cover a surprisingly
large amount of ground while remaining succinct.  IPv4, IPv6 and even
hostname constants are included in the ``addr`` data type.  While IPv4
addresses use the default dotted quad formatting, IPv6 addresses use
the RFC 2373 defined notation with the addition of square brackets
wrapping the entire address.  When you venture into hostname
constants, Zeek performs a little sleight of hand for the benefit of the
user; a hostname constant is, in fact, a set of addresses.  Zeek will
issue a DNS request when it sees a hostname constant in use and return
a set whose elements are the answers to the DNS request.  For example,
if you were to use ``local google = www.google.com;`` you would end up
with a locally scoped ``set[addr]`` with elements that represent the
current set of round robin DNS entries for Google.  At first blush,
this seems trivial, but it is yet another example of Zeek making the
life of the common Zeek scripter a little easier through abstraction
applied in a practical manner.  (Note, however, that these IP addresses
never get updated during Zeek's processing, so this mechanism is most
useful for hostnames whose addresses are expected to remain static.)
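
The sketch below sticks to IPv4 and IPv6 literals, so it triggers no DNS lookups; the global names are hypothetical. It relies on the ``is_v4_addr``, ``is_v6_addr``, and ``to_addr`` built-in functions:

.. code-block:: zeek

   global v4_host = 192.168.1.1;
   global v6_host = [2001:db8::1];

   event zeek_init()
       {
       # Both values have type addr; these BIFs tell the two families apart.
       print is_v4_addr(v4_host), is_v6_addr(v6_host);

       # to_addr() parses a string, e.g. from external input, into an addr.
       print to_addr("2001:db8::1") == v6_host;
       }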

port
~~~~

Transport layer port numbers in Zeek are represented in the format of
``<unsigned integer>/<protocol name>``, e.g., ``22/tcp`` or
``53/udp``.  Zeek supports TCP (``/tcp``), UDP (``/udp``),
ICMP (``/icmp``) and UNKNOWN (``/unknown``) as protocol designations.
While ICMP doesn't have an actual port, Zeek supports the concept of
ICMP "ports" by using the ICMP message type and ICMP message code as
the source and destination port respectively.  Ports can be compared
for equality using the ``==`` or ``!=`` operators and can even be
compared for ordering.  Zeek gives the protocol designations the
following "order": ``unknown`` < ``tcp`` < ``udp`` < ``icmp``. For
example, ``65535/tcp`` is smaller than ``0/udp``.
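
The sketch below exercises these rules with the ``port_to_count`` and ``get_port_transport_proto`` built-in functions; the global name ``ssh_port`` is hypothetical:

.. code-block:: zeek

   global ssh_port = 22/tcp;

   event zeek_init()
       {
       # port_to_count() returns only the numeric part of the port.
       print port_to_count(ssh_port);

       # get_port_transport_proto() returns the protocol as a transport_proto value.
       print get_port_transport_proto(53/udp);

       # The protocol is compared first, so any tcp port sorts before any udp port.
       print 65535/tcp < 0/udp;

       # ICMP "ports" carry the message type; type 8 is an echo request.
       print port_to_count(8/icmp);
       }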

subnet
~~~~~~

Zeek has full support for CIDR notation subnets as a base data type.
There is no need to manage the IP and the subnet mask as two separate
entities when you can provide the same information in CIDR notation in
your scripts.  The example below uses a Zeek script to
determine if a series of IP addresses are within a set of subnets
using a 20-bit subnet mask.

.. literalinclude:: data_type_subnets.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

Because this is a script that doesn't use any kind of network
analysis, we can handle the event :zeek:id:`zeek_init` which is always
generated by Zeek's core upon startup.  In the example script, two
locally scoped vectors are created to hold our lists of subnets and IP
addresses respectively.  Then, using a set of nested ``for`` loops, we
iterate over every subnet and every IP address and use an ``if``
statement to compare an IP address against a subnet using the ``in``
operator.  The ``in`` operator returns true if the IP address falls
within the given subnet, that is, if the address's leading bits match
the subnet's network prefix.
For example, ``10.0.0.1 in 10.0.0.0/8`` would return true while
``192.168.2.1 in 192.168.1.0/24`` would return false.  When we run the
script, we get output listing each IP address and the subnet to
which it belongs.

.. code-block:: console

   $ zeek data_type_subnets.zeek
   172.16.4.56 belongs to subnet 172.16.0.0/20
   172.16.47.254 belongs to subnet 172.16.32.0/20
   172.16.22.45 belongs to subnet 172.16.16.0/20
   172.16.1.1 belongs to subnet 172.16.0.0/20
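
The two membership checks quoted above, plus the ``/`` operator that masks an ``addr`` down to a subnet, can be sketched as follows; the global name ``sample_host`` is hypothetical. The masked result matches the subnet reported for ``172.16.47.254`` in the output above:

.. code-block:: zeek

   global sample_host = 172.16.47.254;

   event zeek_init()
       {
       print 10.0.0.1 in 10.0.0.0/8;           # T
       print 192.168.2.1 in 192.168.1.0/24;    # F

       # Applying / to an addr keeps the leading 20 bits and yields a subnet.
       print sample_host / 20;                 # 172.16.32.0/20
       }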

time
~~~~

While there is currently no supported way to add a time constant in
Zeek, two built-in functions exist to make use of the ``time`` data
type.  Both :zeek:id:`network_time` and :zeek:id:`current_time` return a
``time`` data type but they each return a time based on different
criteria.  The ``current_time`` function returns what is called the
wall-clock time as defined by the operating system.  However,
``network_time`` returns the timestamp of the last packet processed
be it from a live data stream or saved packet capture.  Both functions
return the time in epoch seconds, meaning ``strftime`` must be used to
turn the output into human-readable text.  The script below makes
use of the :zeek:id:`connection_established` event handler to generate text
every time a SYN/ACK packet is seen responding to a SYN packet as part
of a TCP handshake.  The generated text consists of a
timestamp and an indication of who the originator and responder were.
We use the ``strftime`` format string ``%Y/%m/%d %H:%M:%S`` to
produce a common date and time format.

.. literalinclude:: data_type_time.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

When the script is executed we get an output showing the details of
established connections.

.. code-block:: console

   $ zeek -r wikipedia.trace data_type_time.zeek
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.118\x0a
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.2\x0a
   2011/06/18 19:03:09:  New connection established from 141.142.220.235 to 173.192.163.128\x0a
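
The following sketch contrasts the two clocks outside of any connection event; the global name ``started_at`` is hypothetical, and the exact values printed depend on when and how Zeek is run. ``time_to_double`` exposes the underlying epoch seconds, and ``strftime`` formats a ``time`` for display:

.. code-block:: zeek

   global started_at: time;

   event zeek_init()
       {
       # Wall-clock time as reported by the operating system.
       started_at = current_time();
       print time_to_double(started_at);

       # Human-readable rendering in the same layout as the output above.
       print strftime("%Y/%m/%d %H:%M:%S", started_at);

       # Timestamp of the last processed packet; this depends on the input,
       # e.g. the trace file given with -r.
       print network_time();
       }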

interval
~~~~~~~~

The ``interval`` data type is another area in Zeek where rational
application of abstraction makes perfect sense.  As a data type, the
``interval`` represents a relative time as denoted by a numeric constant
followed by a unit of time.  For example, 2.2 seconds would be
``2.2sec`` and thirty-one days would be represented by ``31days``.
Zeek supports ``usec``, ``msec``, ``sec``, ``min``, ``hr``, or ``day`` which represent
microseconds, milliseconds, seconds, minutes, hours, and days
respectively.  In fact, the ``interval`` data type allows for a surprising
amount of variation in its definitions.  There can be a space between
the numeric constant and the unit, or the two can be crammed together
like a temporal portmanteau.  The time unit can be either singular or
plural.  All of this means that both ``42hrs`` and ``42 hr`` are
perfectly valid and logically equivalent in Zeek.  The point, however,
is to increase the readability and thus maintainability of a script.
Intervals can even be negated, allowing for ``- 10mins`` to represent
"ten minutes ago".

Intervals in Zeek can have mathematical operations performed against
them allowing the user to perform addition, subtraction,
multiplication, division, and comparison operations.  Zeek also
returns an ``interval`` when subtracting one ``time`` value from another
with the ``-`` operator.  The script below amends the script started in the section
above to include a time delta value printed along with the connection
establishment report.

.. literalinclude:: data_type_interval.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

When we re-execute the script we see an additional line in the
output, displaying the time delta since the last fully established
connection.

.. code-block:: console

   $ zeek -r wikipedia.trace data_type_interval.zeek
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.118
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3
        Time since last connection: 132.0 msecs 97.0 usecs
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3
        Time since last connection: 177.0 usecs
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3
        Time since last connection: 2.0 msecs 177.0 usecs
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3
        Time since last connection: 33.0 msecs 898.0 usecs
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3
        Time since last connection: 35.0 usecs
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.3
        Time since last connection: 2.0 msecs 532.0 usecs
   2011/06/18 19:03:08:  New connection established from 141.142.220.118 to 208.80.152.2
        Time since last connection: 7.0 msecs 866.0 usecs
   2011/06/18 19:03:09:  New connection established from 141.142.220.235 to 173.192.163.128
        Time since last connection: 817.0 msecs 703.0 usecs
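
The sketch below gathers the interval rules from this section in one place, using two synthetic timestamps built with ``double_to_time`` instead of a trace; the names and values are hypothetical. Every comparison it prints is ``T``, and the last line prints ``2.5``:

.. code-block:: zeek

   global elapsed: interval;

   event zeek_init()
       {
       # Spacing and singular or plural units do not change the value.
       print 42hrs == 42 hr;
       print 1min == 60sec;

       # A negated interval, e.g. "ten minutes ago".
       print - 10mins < 0sec;

       # Subtracting one time value from another yields an interval.
       local first_seen = double_to_time(100.0);
       local last_seen = double_to_time(102.5);
       elapsed = last_seen - first_seen;

       # Intervals can be scaled, divided, and compared.
       print elapsed * 2 == 5sec;
       print elapsed / 2 < 2sec;

       # interval_to_double() returns the length in seconds.
       print interval_to_double(elapsed);
       }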


Pattern
~~~~~~~

Zeek has support for fast text searching operations using regular
expressions and even goes so far as to declare a native data type for
the patterns used in regular expressions.  A pattern constant is
created by enclosing text between forward slashes.  Zeek
supports syntax very similar to the Flex lexical analyzer syntax.  The
most common use of patterns in Zeek you are likely to come across is
embedded matching using the ``in`` operator.  Embedded matching
adheres to a strict format, requiring the regular expression or
pattern constant to be on the left side of the ``in`` operator and the
string against which it will be tested to be on the right.

.. literalinclude:: data_type_pattern_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

In the sample above, two local variables are declared to hold our
sample sentence and regular expression.  Our regular expression in
this case will return true if the string contains either the word
``quick`` or the word ``lazy``. The ``if`` statement in the script uses
embedded matching and the ``in`` operator to check for the existence
of the pattern within the string.  If the statement resolves to true,
:zeek:id:`split_string` is called to break the string into separate pieces.
:zeek:id:`split_string` takes a string and a pattern as its arguments and returns a
vector of strings.  Each element of the vector represents
segments before and after any matches against the pattern but
excluding the actual matches.  In this case, our pattern matches
twice resulting in a vector with three elements.

.. code-block:: console

   $ zeek data_type_pattern_01.zeek
   The
    brown fox jumps over the
    dog.
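
As a further sketch of :zeek:id:`split_string`, splitting on a comma drops each comma and keeps the pieces in order; the input string and the global name ``services`` are hypothetical. The loop uses the index-and-value form of ``for`` over a vector:

.. code-block:: zeek

   global services: vector of string;

   event zeek_init()
       {
       # The separators matched by /,/ are not part of the result.
       services = split_string("dns,http,ssl", /,/);

       for ( i, svc in services )
           print fmt("%d: %s", i, svc);
       }

This prints ``0: dns``, ``1: http``, and ``2: ssl`` on separate lines.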

Patterns can also be used to compare strings using equality and
inequality operators through the ``==`` and ``!=`` operators
respectively. When used in this manner however, the string must match
entirely to resolve to true.  For example, the script below uses two
ternary conditional expressions to illustrate the use of the ``==``
operator with patterns.  The output is altered based
on the result of the comparison between the pattern and the string.

.. literalinclude:: data_type_pattern_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek data_type_pattern_02.zeek
   equality and /^?(equal)$?/ are not equal
   equality and /^?(equality)$?/ are equal
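
To contrast the two forms side by side, the sketch below tests patterns against the same string with embedded matching and with exact equality; the global name ``method`` is hypothetical:

.. code-block:: zeek

   global method = "GET";

   event zeek_init()
       {
       # Embedded matching: the pattern may match any part of the string.
       print /ET/ in method;     # T

       # Equality: the pattern must match the entire string.
       print /ET/ == method;     # F
       print /G.T/ == method;    # T
       }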

Record Data Type
----------------

With Zeek's support for a wide array of data types and data structures,
an obvious extension is to include the ability to create custom
data types composed of atomic types and further data structures.  To
accomplish this, Zeek introduces the ``record`` type and the ``type``
keyword.  Similar to how you would define a new data structure in C
with the ``typedef`` and ``struct`` keywords, Zeek allows you to cobble
together new data types to suit the needs of your situation.

When combined with the ``type`` keyword, ``record`` can generate a
composite type.  We have, in fact, already encountered a complex
example of the ``record`` data type in the earlier sections, the
:zeek:type:`connection` record passed to many events. Another one,
:zeek:type:`Conn::Info`, which corresponds to the fields logged into
:file:`conn.log`, is shown by the excerpt below.

.. literalinclude:: data_type_record.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

Looking at the structure of the definition, a new collection of data
types is being defined as a type called ``Info``.  Since this type
definition is within the confines of an export block, what is defined
is, in fact, ``Conn::Info``.

The formatting for a declaration of a record type in Zeek includes the
descriptive name of the type being defined and the separate fields
that make up the record.  The individual fields that make up the new
record are not limited in type or number as long as the name for each
field is unique.

.. literalinclude:: data_struct_record_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek data_struct_record_01.zeek
   Service: dns(RFC1035)
     port: 53/udp
     port: 53/tcp
   Service: http(RFC2616)
     port: 8080/tcp
     port: 80/tcp

The sample above shows a simple type definition that includes a
string, a set of ports, and a count to define a service type.  Also
included is a function to print each field of a record in a formatted
fashion and a :zeek:id:`zeek_init` event handler to show some
functionality of working with records.  The definitions of the DNS and
HTTP services are both done in-line using square brackets before being
passed to the ``print_service`` function.  The ``print_service``
function makes use of the ``$`` dereference operator to access the
fields within the newly defined Service record type.

As you saw in the definition for the ``Conn::Info`` record, records can
even be used as fields within another record.  We can extend
the example above to include another record that contains ``Service``
records.

.. literalinclude:: data_struct_record_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek data_struct_record_02.zeek
   System: morlock
     Service: http(RFC2616)
       port: 8080/tcp
       port: 80/tcp
     Service: dns(RFC1035)
       port: 53/udp
       port: 53/tcp

The example above includes a second record type with a field whose
type is a set of ``Service`` records.  Records can be repeatedly nested
within other records, their fields reachable through repeated chains
of the ``$`` dereference operator.
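
A compact sketch of such chains follows; the record types ``Endpoint`` and ``Flow`` and the global ``web_flow`` are hypothetical and exist only for illustration. Each ``$`` steps one level deeper into the nesting:

.. code-block:: zeek

   # Hypothetical record types for illustration only.
   type Endpoint: record {
       host: addr;
       p: port;
   };

   type Flow: record {
       orig: Endpoint;
       resp: Endpoint;
   };

   global web_flow = Flow($orig=Endpoint($host=10.0.0.5, $p=51234/tcp),
                          $resp=Endpoint($host=192.168.1.10, $p=80/tcp));

   event zeek_init()
       {
       # Each $ dereferences one level of the nested records.
       print web_flow$resp$p;       # 80/tcp
       print web_flow$orig$host;    # 10.0.0.5
       }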

It's also common to see a ``type`` used to simply alias a data
structure to a more descriptive name.  The lines below, taken from
Zeek's own type definitions file, show this.

.. code-block:: zeek
   :caption: init-bare.zeek

   type string_array: table[count] of string;
   type string_set: set[string];
   type addr_set: set[addr];

The three lines above alias a type of data structure to a descriptive
name.  Functionally, the operations are the same; however, each of the
types above is named such that its function is instantly
identifiable.  This is another place in Zeek scripting where
consideration can lead to better readability of your code and thus
easier maintainability in the future.
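
Aliases work the same way in your own scripts. In the sketch below, the alias ``port_set`` and the names ``web_ports`` and ``is_web_port`` are hypothetical, written in the style of the ``init-bare.zeek`` lines above:

.. code-block:: zeek

   # Hypothetical alias for a set of service ports.
   type port_set: set[port];

   global web_ports: port_set = { 80/tcp, 443/tcp, 8080/tcp };

   # The alias documents intent; the parameter is still just a set[port].
   function is_web_port(p: port, ports: port_set): bool
       {
       return p in ports;
       }

   event zeek_init()
       {
       print is_web_port(443/tcp, web_ports);    # T
       print is_web_port(22/tcp, web_ports);     # F
       }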


Custom Logging
==============

Armed with a decent understanding of the data types and data
structures in Zeek, exploring the various frameworks available is a
much more rewarding effort.  The framework with which most users are
likely to have the most interaction is the Logging Framework.
Designed to abstract much of the process of
creating a file and appending ordered and organized data into it, the
Logging Framework makes use of some potentially unfamiliar
nomenclature.  Specifically, Log Streams, Filters and Writers are
simply abstractions of the processes required to manage a high rate of
incoming logs while maintaining full operability.  If you've seen Zeek
employed in an environment with a large number of connections, you
know that logs are produced incredibly quickly; the ability to process
a large set of data and write it to disk is due to the design of the
Logging Framework.

Data is written to a Log Stream based on decision making processes in
Zeek's scriptland.  Log Streams correspond to a single log as defined
by the set of name/value pairs that make up its fields.  That data can
then be filtered, modified, or redirected with Logging Filters which,
by default, are set to log everything.  Filters can be used to break
log files into subsets or duplicate that information to another
output.  The final output of the data is defined by the writer.  Zeek's
default writer produces simple tab-separated ASCII files, but Zeek also
includes support for `DataSeries <https://github.com/dataseries>`_
and `Elasticsearch <http://www.elasticsearch.org>`_ outputs as well as
additional writers currently in development.  While these new terms
and ideas may give the impression that the Logging Framework is
difficult to work with, the learning curve is, in actuality,
not very steep at all.  The abstraction built into the Logging
Framework means that the vast majority of scripts need not go
past the basics.  In effect, writing to a log file is as simple as
defining the format of your data, letting Zeek know that you wish to
create a new log, and then calling the :zeek:id:`Log::write` method to
output log records.

The Logging Framework is an area in Zeek where, the more you see it
used and the more you use it yourself, the more second nature the
boilerplate parts of the code will become.  As such, let's work
through a contrived example of simply logging the numbers 1 through 10
and their corresponding factorials to the default ASCII log writer.
It's always best to work through the problem once, simulating the
desired output with ``print`` and ``fmt`` before attempting to dive
into the Logging Framework.

.. literalinclude:: framework_logging_factorial_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

.. code-block:: console

   $ zeek framework_logging_factorial_01.zeek
   1
   2
   6
   24
   120
   720
   5040
   40320
   362880
   3628800

This script defines a factorial function to recursively calculate the
factorial of an unsigned integer passed as an argument to the function.  Using
``print`` and :zeek:id:`fmt` we can ensure that Zeek can perform these
calculations correctly as well as get an idea of the answers ourselves.

The output of the script aligns with what we expect so now it's time
to integrate the Logging Framework.

.. literalinclude:: framework_logging_factorial_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

As mentioned above we have to perform a few steps before we can
issue the :zeek:id:`Log::write` method and produce a logfile.
As we are working within a namespace and informing an outside
entity of workings and data internal to the namespace, we use
an ``export`` block.  First we need to inform Zeek
that we are going to be adding another Log Stream by adding a value to
the :zeek:type:`Log::ID` enum.  In this script, we append the
value ``LOG`` to the ``Log::ID`` enum; however, because this happens in
an export block, the value appended to ``Log::ID`` is actually
``Factor::LOG``.  Next, we define the fields
that make up the data of our logs and dictate its format.  This script
defines a new record datatype called ``Info`` (actually,
``Factor::Info``) with two fields, both unsigned integers. Each of the
fields in the ``Factor::Info`` record type includes the ``&log``
attribute, indicating that these fields should be passed to the
Logging Framework when ``Log::write`` is called.
Any record fields without the ``&log`` attribute are ignored by the
Logging Framework. The next step is to create the logging
stream with :zeek:id:`Log::create_stream` which takes a ``Log::ID`` and a
record as its arguments.  In this example, we call the
``Log::create_stream`` method and pass ``Factor::LOG`` and the
``Factor::Info`` record as arguments. From here on out, if we issue
the ``Log::write`` command with the correct ``Log::ID`` and a properly
formatted ``Factor::Info`` record, a log entry will be generated.

Now, if we run this script, nothing is printed to stdout.  Instead, the
output is all in :file:`factor.log`, properly formatted and organized.

.. code-block:: console

   $ zeek framework_logging_factorial_02.zeek
   $ cat factor.log
   #separator \x09
   #set_separator    ,
   #empty_field      (empty)
   #unset_field      -
   #path     factor
   #open     2018-12-14-21-47-18
   #fields   num     factorial_num
   #types    count   count
   1 1
   2 2
   3 6
   4 24
   5 120
   6 720
   7 5040
   8 40320
   9 362880
   10        3628800
   #close    2018-12-14-21-47-18

While the previous example is a simple one, it serves to
demonstrate the small pieces of script code that need to be in place in
order to generate logs.  It's common to call ``Log::create_stream`` in
:zeek:id:`zeek_init`.  In a live deployment, ``Log::write`` would most
likely be called from an event handler for the activity being logged;
in this case, we use :zeek:id:`zeek_done`.
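
The same steps apply when records are written from an event handler
instead of from ``zeek_done``.  The following is a minimal sketch, not a
script that ships with Zeek: the ``ConnPairs`` module, its ``Info`` fields,
and the ``conn_pairs`` path are hypothetical names.  It creates the stream
in ``zeek_init``, writes one record per connection from
:zeek:id:`connection_state_remove`, and includes one field without
``&log`` to illustrate that such fields never reach the log file.

.. code-block:: zeek

   # Hypothetical module and field names, for illustration only.
   module ConnPairs;

   export {
       # Appends ConnPairs::LOG to the Log::ID enum.
       redef enum Log::ID += { LOG };

       type Info: record {
           ts:   time   &log;
           uid:  string &log;
           orig: addr   &log;
           resp: addr   &log;
           # No &log attribute: stored in the record, never written out.
           n_services: count &default=0;
       };
   }

   event zeek_init()
       {
       # One-time setup; $path gives the file name (conn_pairs.log).
       Log::create_stream(LOG, [$columns=Info, $path="conn_pairs"]);
       }

   # Raised once per connection, just before Zeek discards its state.
   event connection_state_remove(c: connection)
       {
       Log::write(LOG, [$ts=network_time(), $uid=c$uid,
                        $orig=c$id$orig_h, $resp=c$id$resp_h,
                        $n_services=|c$service|]);
       }

Run against a trace, this would produce :file:`conn_pairs.log` with the
``ts``, ``uid``, ``orig``, and ``resp`` columns only.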

If you've already spent time with a deployment of Zeek, you've likely
had the opportunity to view, search through, or manipulate the logs
produced by the Logging Framework.  The log output from a default
installation of Zeek is substantial to say the least; however, there
are times when the Logging Framework's default behavior isn't
ideal for the situation.  This can range from needing to log more or
less data with each call to ``Log::write`` to needing to split
log files based on arbitrary logic.  In the latter case, filters come
into play along with the Logging Framework.  Filters grant a level of
customization to Zeek's scriptland, allowing the script writer to
include or exclude fields in the log and even make alterations to the
path of the file in which the logs are being placed.  Each stream,
when created, is given a default filter called, not surprisingly,
``default``.  When using the ``default`` filter, every field
with the ``&log`` attribute is written to a single file.  For the
example we've been using, let's extend it so as to write any factorial
which is a multiple of 5 to one file and the remaining factorials to
another.

.. literalinclude:: framework_logging_factorial_03.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

To dynamically alter the file in which a stream writes its logs, a
filter can specify a function that returns a string to be used as the
log path (the file name without its extension) for the current call to
``Log::write``. The definition for this function has to take as its
parameters a ``Log::ID`` called ``id``, a string called ``path``, and the
appropriate record type for the logs called ``rec``.  You can see that
the definition of ``mod5`` used in this example conforms to that
requirement.  The function simply returns ``factor-mod5`` if the
factorial is evenly divisible by 5; otherwise, it returns
``factor-non5``.  In the additional ``zeek_init`` event
handler, we define a locally scoped ``Log::Filter`` and assign it a
record that defines the ``name`` and ``path_func`` fields.  We then
call ``Log::add_filter`` to add the filter to the ``Factor::LOG``
``Log::ID`` and call ``Log::remove_filter`` to remove the ``default``
filter for ``Factor::LOG``.  Had we not removed the ``default`` filter,
we'd have ended up with three log files: :file:`factor-mod5.log` with all the
factorials that are multiples of 5, :file:`factor-non5.log` with the
factorials that are not multiples of 5, and :file:`factor.log`, which would
have included all factorials.

.. code-block:: console

   $ zeek framework_logging_factorial_03.zeek
   $ cat factor-mod5.log
   #separator \x09
   #set_separator    ,
   #empty_field      (empty)
   #unset_field      -
   #path     factor-mod5
   #open     2018-12-14-21-47-18
   #fields   num     factorial_num
   #types    count   count
   5 120
   6 720
   7 5040
   8 40320
   9 362880
   10        3628800
   #close    2018-12-14-21-47-1
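
Filters can also control which columns are written.  As a minimal
sketch, assuming the ``Factor`` module from the scripts above is loaded,
the handler below adds a hypothetical second filter named
``factorials-only`` that uses a static ``$path`` instead of a
``$path_func`` and an ``$include`` set so that only the ``factorial_num``
column is written.  An ``$exclude`` set works the other way around,
dropping the named fields and keeping the rest.

.. code-block:: zeek

   # Assumes the Factor module (Factor::LOG, Factor::Info) is loaded.
   # &priority=-5 runs this after the Factor zeek_init handler that
   # creates the stream at the default priority of 0.
   event zeek_init() &priority=-5
       {
       # Hypothetical filter: fixed output path, only one column written.
       local f: Log::Filter = [$name="factorials-only",
                               $path="factor-values",
                               $include=set("factorial_num")];
       Log::add_filter(Factor::LOG, f);
       }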

The ability of Zeek to generate easily customizable and extensible logs
which remain easily parsable is a big part of the reason Zeek has
gained a large measure of respect.  In fact, it's difficult at times
to think of something that Zeek doesn't log, and as such, it is often
advantageous for analysts and systems architects to instead hook into
the Logging Framework to be able to perform custom actions based upon
the data being sent to it.  To that end, Zeek's default log streams
raise a custom event that can be handled by
anyone wishing to act upon the data being sent to the stream.  By
convention, these events are named ``log_x``, where x is
the name of the logging stream, and are declared in the stream's module;
as such, the event raised for every log entry
sent to the Logging Framework by the HTTP parser is ``HTTP::log_http``.
Instead of using an external script to parse the :file:`http.log` file and
do post-processing for each entry, this can be done in real time inside
Zeek by defining an event handler for the ``HTTP::log_http`` event.
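
A handler for ``HTTP::log_http`` receives the same ``HTTP::Info`` record
that is written to :file:`http.log`.  The following is a minimal sketch
of such real-time post-processing; printing the request is only a
placeholder for whatever action a site might actually take.

.. code-block:: zeek

   @load base/protocols/http

   # Raised for every record written to the HTTP log stream.
   event HTTP::log_http(rec: HTTP::Info)
       {
       # host and uri are optional fields, so test for them first.
       if ( rec?$host && rec?$uri )
           print fmt("%s requested http://%s%s", rec$id$orig_h, rec$host, rec$uri);
       }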

Telling Zeek to raise an event in your own logging stream is as simple
as exporting that event name and then adding that event in the call to
``Log::create_stream``.  Going back to our simple example of logging
the factorial of an integer, we add ``log_factor`` to the ``export``
block and declare the type of the value passed to it, in this case the
``Factor::Info`` record.  We then pass the ``log_factor`` event as
the ``$ev`` field in the call to ``Log::create_stream``.

.. literalinclude:: framework_logging_factorial_04.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
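
Once the stream is created with ``$ev``, any script can handle the event.
As a sketch, assuming ``framework_logging_factorial_04.zeek`` is loaded
and its export block declares ``log_factor`` as taking a single
``Factor::Info`` argument named ``rec``, the hypothetical handler below
flags factorials larger than one million.  With the numbers 1 through 10
used above, only the record for 10 (3628800) passes the check.

.. code-block:: zeek

   # Assumes framework_logging_factorial_04.zeek is loaded and that its
   # export block declares: global log_factor: event(rec: Info);
   event Factor::log_factor(rec: Factor::Info)
       {
       # Hypothetical post-processing: flag factorials above one million.
       if ( rec$factorial_num > 1000000 )
           print fmt("%d! = %d exceeds one million", rec$num, rec$factorial_num);
       }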


Raising Notices
===============

While Zeek's Logging Framework provides an easy and systematic way to
generate logs, there still exists a need to indicate when a specific
behavior has been detected and a method to allow that detection to
come to someone's attention.  To that end, the Notice Framework is in
place to allow script writers a codified means through which they can
raise a notice, as well as a system through which an operator can
opt-in to receive the notice.  Zeek holds to the philosophy that it is
up to the individual operator to indicate the behaviors in which they
are interested and as such Zeek ships with a large number of policy
scripts which detect behavior that may be of interest but it does not
presume to guess as to which behaviors are "actionable".  In effect,
Zeek works to separate the act of detection from the responsibility of
reporting.  With the Notice Framework, it's simple to raise a notice
for any behavior that is detected.

To raise a notice in Zeek, you only need to indicate to Zeek that you
provide a specific :zeek:type:`Notice::Type` by exporting it and then
make a call to :zeek:id:`NOTICE`, supplying it with an appropriate
:zeek:type:`Notice::Info` record.  Often, the call to ``NOTICE``
includes just the ``Notice::Type`` and a concise message.  There are,
however, significantly more options available when raising notices, as
seen in the definition of :zeek:type:`Notice::Info`.  The only field in
``Notice::Info`` whose
attributes make it a required field is the ``note`` field.  Still,
good manners are always important, and including a concise message in
``$msg`` and, where necessary, the contents of the connection record
in ``$conn`` along with the ``Notice::Type`` tend to comprise the
minimum of information required for a notice to be considered useful.
If the ``$conn`` field is supplied, the Notice Framework will
auto-populate the ``$id`` and ``$src`` fields as well.  Two other fields
that are commonly included, ``$identifier`` and ``$suppress_for``, are
built around the automated suppression feature of the Notice Framework,
which we will cover shortly.
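
As a minimal sketch of these pieces together, the hypothetical script
below exports a new ``Notice::Type`` and raises it with ``$note``,
``$msg``, and ``$conn``, plus an ``$identifier`` and ``$suppress_for`` so
that repeats for the same server are suppressed for an hour.  The module
name, the notice name, and the choice of port 23/tcp are illustrative
only.

.. code-block:: zeek

   @load base/frameworks/notice

   # Hypothetical module and notice type, for illustration only.
   module TelnetWatch;

   export {
       redef enum Notice::Type += {
           ## Raised when a TCP connection to port 23 is established.
           Telnet_Connection,
       };
   }

   event connection_established(c: connection)
       {
       if ( c$id$resp_p != 23/tcp )
           return;

       NOTICE([$note=Telnet_Connection,
               $msg=fmt("Telnet connection from %s to %s", c$id$orig_h, c$id$resp_h),
               # Supplying $conn lets the framework fill in $id and $src.
               $conn=c,
               # Suppress repeats for the same server for one hour.
               $identifier=cat(c$id$resp_h),
               $suppress_for=1hr]);
       }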

One of the default policy scripts raises a notice when an SSH login
has been heuristically detected and the originating hostname is one
that would raise suspicion.  Effectively, the script attempts to
define a list of hosts from which you would never want to see SSH
traffic originating, like DNS servers, mail servers, etc.  To
accomplish this, the script adheres to the separation of detection
and reporting by detecting a behavior and raising a notice.  Whether
or not that notice is acted upon is decided by the local Notice
Policy, but the script attempts to supply as much information as
possible while staying concise.

.. code-block:: zeek
   :caption: scripts/policy/protocols/ssh/interesting-hostnames.zeek

   ##! This script will generate a notice if an apparent SSH login originates
   ##! or heads to a host with a reverse hostname that looks suspicious.  By
   ##! default, the regular expression to match "interesting" hostnames includes
   ##! names that are typically used for infrastructure hosts like nameservers,
   ##! mail servers, web servers and ftp servers.

   @load base/frameworks/notice

   module SSH;

   export {
       redef enum Notice::Type += {
           ## Generated if a login originates or responds with a host where
           ## the reverse hostname lookup resolves to a name matched by the
           ## :zeek:id:`SSH::interesting_hostnames` regular expression.
           Interesting_Hostname_Login,
       };

       ## Strange/bad host names to see successful SSH logins from or to.
       option interesting_hostnames =
               /^d?ns[0-9]*\./ |
               /^smtp[0-9]*\./ |
               /^mail[0-9]*\./ |
               /^pop[0-9]*\./  |
               /^imap[0-9]*\./ |
               /^www[0-9]*\./  |
               /^ftp[0-9]*\./;
   }

   function check_ssh_hostname(id: conn_id, uid: string, host: addr)
       {
       when [id, uid, host] ( local hostname = lookup_addr(host) )
           {
           if ( interesting_hostnames in hostname )
               {
               NOTICE([$note=Interesting_Hostname_Login,
                       $msg=fmt("Possible SSH login involving a %s %s with an interesting hostname.",
                                Site::is_local_addr(host) ? "local" : "remote",
                                host == id$orig_h ? "client" : "server"),
                       $sub=hostname, $id=id, $uid=uid]);
               }
           }
       }

   event ssh_auth_successful(c: connection, auth_method_none: bool)
       {
       for ( host in set(c$id$orig_h, c$id$resp_h) )
           {
           check_ssh_hostname(c$id, c$uid, host);
           }
       }

While much of the script relates to the actual detection, the parts
specific to the Notice Framework are actually quite interesting in
themselves.  The script's ``export`` block adds the value
``SSH::Interesting_Hostname_Login`` to the enumerable type
``Notice::Type`` to indicate to the Zeek core that a new type of notice
is being defined.  The script then calls ``NOTICE`` and defines the
``$note``, ``$msg``, ``$sub``, ``$id``, and ``$uid`` fields of the
:zeek:type:`Notice::Info` record. (More commonly, one would set
``$conn`` instead; however, this script avoids using the connection
record inside the ``when`` statement for performance reasons.)
There are two ternary conditional expressions that modify the ``$msg``
text depending on whether the host is a local address and whether it is
the client or the server.
This use of :zeek:id:`fmt` and ternary operators is a concise way to
lend readability to the notices that are generated without the need
for branching ``if`` statements that each raise a specific notice.

The opt-in system for notices is managed through writing
:zeek:id:`Notice::policy` hooks.  A ``Notice::policy`` hook takes as
its argument a ``Notice::Info`` record which will hold the same
information your script provided in its call to ``NOTICE``.  With
access to the ``Notice::Info`` record for a specific notice you can
include conditional logic in the body of your hook to alter
the policy for handling notices on your system.  In Zeek, hooks are
akin to a mix of functions and event handlers: like functions, calls
to them are synchronous (i.e., run to completion and return); but like
events, they can have multiple bodies which will all execute. For
defining a notice policy, you define a hook and Zeek will take care of
passing in the ``Notice::Info`` record.  The simplest kind of
``Notice::policy`` hook simply checks the value of ``$note`` in the
``Notice::Info`` record being passed into the hook and performs an
action based on the answer.  The hook below adds the
:zeek:enum:`Notice::ACTION_EMAIL` action for the
``SSH::Interesting_Hostname_Login`` notice raised in the
:doc:`/scripts/policy/protocols/ssh/interesting-hostnames.zeek` script.

.. literalinclude:: framework_notice_hook_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

In the example above we've added ``Notice::ACTION_EMAIL`` to the
``n$actions`` set.  This set, defined in the Notice Framework scripts,
can only have entries from the :zeek:type:`Notice::Action` type, which is
itself an enumerable that defines the values shown in the table below
along with their corresponding meanings.  The
:zeek:enum:`Notice::ACTION_LOG` action writes the notice to the
``Notice::LOG`` logging stream which, in the default configuration,
will write each notice to the :file:`notice.log` file and take no further
action.  The :zeek:enum:`Notice::ACTION_EMAIL` action will send an email
to the address or addresses defined in the :zeek:id:`Notice::mail_dest`
variable with the particulars of the notice as the body of the email.
The last action, :zeek:enum:`Notice::ACTION_ALARM`, sends the notice to
the :zeek:enum:`Notice::ALARM_LOG` logging stream which is then rotated
hourly and its contents emailed in readable ASCII to the addresses in
``Notice::mail_dest``.

.. list-table::

  * - :zeek:see:`Notice::ACTION_NONE`
    - Take no action
  * - :zeek:see:`Notice::ACTION_LOG`
    - Send the notice to the Notice::LOG logging stream.
  * - :zeek:see:`Notice::ACTION_EMAIL`
    - Send an email with the notice in the body.
  * - :zeek:see:`Notice::ACTION_ALARM`
    - Send the notice to the Notice::ALARM_LOG stream.
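
Email-based actions need a destination.  As a sketch, using a
placeholder address, the following sets :zeek:id:`Notice::mail_dest` and
uses a ``Notice::policy`` hook to add ``Notice::ACTION_ALARM`` for the SSH
notice discussed above.  Whether mail actually goes out also depends on a
working local mail setup, which this example does not cover.

.. code-block:: zeek

   @load base/frameworks/notice
   @load policy/protocols/ssh/interesting-hostnames

   # Placeholder address; replace with a real mailbox.
   redef Notice::mail_dest = "soc@example.com";

   hook Notice::policy(n: Notice::Info)
       {
       # Send every interesting-hostname SSH login to the alarm stream.
       if ( n$note == SSH::Interesting_Hostname_Login )
           add n$actions[Notice::ACTION_ALARM];
       }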

While actions like ``Notice::ACTION_EMAIL`` have appeal for
quick alerts and response, a caveat of their use is that the
notices configured with such an action should also have a suppression.  A
suppression is a means through which notices can be ignored after they
are initially raised, provided the author of the script has set an
identifier.  An identifier is a unique string of information collected
from the connection relative to the behavior that has been observed by
Zeek.

.. code-block:: zeek
   :caption: scripts/policy/protocols/ssl/expiring-certs.zeek

   NOTICE([$note=Certificate_Expires_Soon,
           $msg=fmt("Certificate %s is going to expire at %T", cert$subject, cert$not_valid_after),
           $conn=c, $suppress_for=1day,
           $identifier=cat(c$id$resp_h, c$id$resp_p, hash),
           $fuid=fuid]);

In the :doc:`/scripts/policy/protocols/ssl/expiring-certs.zeek` script
which identifies when SSL certificates are set to expire and raises
notices when it crosses a predefined threshold, the call to
``NOTICE`` above also sets the ``$identifier`` entry by concatenating
the responder IP, port, and the hash of the certificate.  The
selection of responder IP, port and certificate hash fits perfectly
into an appropriate identifier as it creates a unique identifier with
which the suppression can be matched. Were we to take out any of the
entities used for the identifier, for example the certificate hash, we
could be setting our suppression too broadly, causing an analyst to
miss a notice that should have been raised.  Depending on the
available data for the identifier, it can be useful to set the
``$suppress_for`` field as well.  The ``expiring-certs.zeek`` script
sets ``$suppress_for`` to ``1day``, telling the Notice Framework to
suppress the notice for 24 hours after the first notice is raised.
Once that time limit has passed, another notice can be raised, which
will again set the ``1day`` suppression time.  Suppressing for a
specific amount of time has benefits beyond simply not filling up an
analyst's email inbox; keeping the notice alerts timely and succinct
helps avoid a case where an analyst might see the notice and, due to
overexposure, ignore it.
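
Choosing the identifier is a trade-off between suppressing duplicates and
hiding distinct events.  The hypothetical script below (its module, notice
type, and ``watched_hosts`` option are all illustrative) builds its
identifier from both the client and the server, so a login from a watched
host to a different server still raises a notice while repeats to the
same server are suppressed for 30 minutes.  Using only the client address
would be broader and could hide logins to new servers.

.. code-block:: zeek

   @load base/frameworks/notice
   @load base/protocols/ssh

   # Hypothetical module, notice type, and option, for illustration only.
   module SSHWatch;

   export {
       redef enum Notice::Type += {
           ## Raised when a watched host logs in to any server via SSH.
           Watched_Host_Login,
       };

       ## Hosts whose successful SSH logins should be reported.
       option watched_hosts: set[addr] = {};
   }

   event ssh_auth_successful(c: connection, auth_method_none: bool)
       {
       if ( c$id$orig_h !in watched_hosts )
           return;

       NOTICE([$note=Watched_Host_Login,
               $msg=fmt("%s logged in to %s via SSH", c$id$orig_h, c$id$resp_h),
               $conn=c,
               # One notice per (client, server) pair per suppression window.
               $identifier=cat(c$id$orig_h, c$id$resp_h),
               $suppress_for=30min]);
       }

A site would then populate the set, for example with
``redef SSHWatch::watched_hosts += { 192.0.2.10 };``.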

The ``$suppress_for`` field can also be altered in a
``Notice::policy`` hook, allowing a deployment to better suit the
environment in which it is being run.  Using the example of
``expiring-certs.zeek``, we can write a ``Notice::policy`` hook for
``SSL::Certificate_Expires_Soon`` to configure the ``$suppress_for``
field to a shorter time.

.. literalinclude:: framework_notice_hook_suppression_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

While ``Notice::policy`` hooks allow you to build custom
predicate-based policies for a deployment, there are bound to be times
where you don't require the full expressiveness that a hook allows.
In short, there will be notice policy considerations where a broad
decision can be made based on the ``Notice::Type`` alone.  To
facilitate these types of decisions, the Notice Framework supports
Notice Policy shortcuts.  These shortcuts are implemented through the
means of a group of data structures that map specific, predefined
details and actions to the effective name of a notice.  Implemented
as sets or tables indexed by :zeek:type:`Notice::Type`,
Notice Policy shortcuts can be placed as a single directive in your
``local.zeek`` file as a concise, readable configuration.  As these
variables are all constants, it bears mentioning that they are set at
parse time, before Zeek is fully up and running, rather than
dynamically.

+------------------------------------+-----------------------------------------------------+-------------------------------------+
| Name                               | Description                                         | Data Type                           |
+====================================+=====================================================+=====================================+
| Notice::ignored_types              | Ignore the Notice::Type entirely                    | set[Notice::Type]                   |
+------------------------------------+-----------------------------------------------------+-------------------------------------+
| Notice::emailed_types              | Set Notice::ACTION_EMAIL to this Notice::Type       | set[Notice::Type]                   |
+------------------------------------+-----------------------------------------------------+-------------------------------------+
| Notice::alarmed_types              | Set Notice::ACTION_ALARM to this Notice::Type       | set[Notice::Type]                   |
+------------------------------------+-----------------------------------------------------+-------------------------------------+
| Notice::not_suppressed_types       | Remove suppression from this Notice::Type           | set[Notice::Type]                   |
+------------------------------------+-----------------------------------------------------+-------------------------------------+
| Notice::type_suppression_intervals | Alter the $suppress_for value for this Notice::Type | table[Notice::Type] of interval     |
+------------------------------------+-----------------------------------------------------+-------------------------------------+


The table above details the five Notice Policy shortcuts, their
meaning, and the data type used to implement them.  With the exception
of ``Notice::type_suppression_intervals``, a ``set`` data type is
employed to hold the ``Notice::Type`` of the notice upon which a
shortcut should be applied.  The first three shortcuts are fairly
self-explanatory, applying an action to the ``Notice::Type`` elements in
the set, while the latter two shortcuts alter details of the
suppression being applied to the notice.  The shortcut
``Notice::not_suppressed_types`` can be used to remove the configured
suppression from a notice, while ``Notice::type_suppression_intervals``
can be used to alter the suppression interval defined by ``$suppress_for``
in the call to ``NOTICE``.

.. literalinclude:: framework_notice_shortcuts_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4

The Notice Policy shortcut above adds the ``Notice::Type`` of
``SSH::Interesting_Hostname_Login`` to the
``Notice::emailed_types`` set while the shortcut below alters the length
of time for which those notices will be suppressed.

.. literalinclude:: framework_notice_shortcuts_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
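
The remaining two set-based shortcuts follow the same ``redef`` pattern.
As a sketch (which notices to ignore or never suppress is a local policy
decision, so these choices are only illustrative), the following ignores
``SSL::Certificate_Expires_Soon`` entirely and disables suppression for
``SSH::Interesting_Hostname_Login``.

.. code-block:: zeek

   @load base/frameworks/notice
   @load policy/protocols/ssh/interesting-hostnames
   @load policy/protocols/ssl/expiring-certs

   # Never raise this notice at all (an illustrative choice).
   redef Notice::ignored_types += { SSL::Certificate_Expires_Soon };

   # Raise every occurrence of this notice without suppression.
   redef Notice::not_suppressed_types += { SSH::Interesting_Hostname_Login };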