mirror of
https://github.com/zeek/zeek.git
synced 2025-10-02 06:38:20 +00:00

This is based on commit 2731def9159247e6da8a3191783c89683363689c from the zeek-docs repo.
1817 lines
83 KiB
ReStructuredText
.. _writing-scripts:

==========
The Basics
==========

Understanding Scripts
=====================
|
|
|
|
Zeek includes an event-driven scripting language that provides
the primary means for an organization to extend and customize Zeek's
functionality. Virtually all of the output generated by Zeek
is, in fact, generated by Zeek scripts. It's almost easier to consider
Zeek to be an entity behind the scenes processing connections and
generating events, while Zeek's scripting language is the medium through
which we mere mortals can communicate with it. Zeek scripts
effectively tell Zeek that, should there be an event of a type we
define, it should hand us the information about the connection so we
can perform some function on it. For example, the :file:`ssl.log` file is
generated by a Zeek script that walks the entire certificate chain and
issues notifications if any of the steps along the certificate chain
are invalid. This entire process is set up by telling Zeek that, should
it see a server or client issue an SSL ``HELLO`` message, we want to
know about that connection.
|
|
|
|
It's often easiest to understand Zeek's scripting language by
looking at a complete script and breaking it down into its
identifiable components. In this example, we'll take a look at how
Zeek checks the SHA1 hash of various files extracted from network traffic
against the `Team Cymru Malware hash registry
<http://www.team-cymru.org/Services/MHR/>`_. Part of the Team Cymru Malware
Hash registry includes the ability to do a host lookup on a domain with the format
``<MALWARE_HASH>.malware.hash.cymru.com`` where ``<MALWARE_HASH>`` is the SHA1 hash of a file.
Team Cymru also populates the TXT record of their DNS responses with both a "first seen"
timestamp and a numerical "detection rate". The important aspect to understand is that Zeek already
generates hashes for files via the Files framework, but it is the
script :doc:`/scripts/policy/frameworks/files/detect-MHR.zeek`
that is responsible for generating the
appropriate DNS lookup, parsing the response, and generating a notice if appropriate.
|
|
|
|
.. code-block:: zeek
   :caption: detect-MHR.zeek

   ##! Detect file downloads that have hash values matching files in Team
   ##! Cymru's Malware Hash Registry (http://www.team-cymru.org/Services/MHR/).

   @load base/frameworks/files
   @load base/frameworks/notice
   @load frameworks/files/hash-all-files

   module TeamCymruMalwareHashRegistry;

   export {
       redef enum Notice::Type += {
           ## The hash value of a file transferred over HTTP matched in the
           ## malware hash registry.
           Match
       };

       ## File types to attempt matching against the Malware Hash Registry.
       option match_file_types = /application\/x-dosexec/ |
                                 /application\/vnd.ms-cab-compressed/ |
                                 /application\/pdf/ |
                                 /application\/x-shockwave-flash/ |
                                 /application\/x-java-applet/ |
                                 /application\/jar/ |
                                 /video\/mp4/;

       ## The Match notice has a sub message with a URL where you can get more
       ## information about the file. The %s will be replaced with the SHA-1
       ## hash of the file.
       option match_sub_url = "https://www.virustotal.com/en/search/?query=%s";

       ## The malware hash registry runs each malware sample through several
       ## A/V engines. Team Cymru returns a percentage to indicate how
       ## many A/V engines flagged the sample as malicious. This threshold
       ## allows you to require a minimum detection rate.
       option notice_threshold = 10;
   }

   function do_mhr_lookup(hash: string, fi: Notice::FileInfo)
       {
       local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

       when ( local MHR_result = lookup_hostname_txt(hash_domain) )
           {
           # Data is returned as "<dateFirstDetected> <detectionRate>"
           local MHR_answer = split_string1(MHR_result, / /);

           if ( |MHR_answer| == 2 )
               {
               local mhr_detect_rate = to_count(MHR_answer[1]);

               if ( mhr_detect_rate >= notice_threshold )
                   {
                   local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                   local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                   local message = fmt("Malware Hash Registry Detection rate: %d%% Last seen: %s", mhr_detect_rate, readable_first_detected);
                   local virustotal_url = fmt(match_sub_url, hash);
                   # We don't have the full fa_file record here in order to
                   # avoid the "when" statement cloning it (expensive!).
                   local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                   Notice::populate_file_info2(fi, n);
                   NOTICE(n);
                   }
               }
           }
       }

   event file_hash(f: fa_file, kind: string, hash: string)
       {
       if ( kind == "sha1" && f?$info && f$info?$mime_type &&
            match_file_types in f$info$mime_type )
           do_mhr_lookup(hash, Notice::create_file_info(f));
       }
|
|
|
|
Visually, there are three distinct sections of the script. First, there is a base
level with no indentation where libraries are included in the script through ``@load``
and a namespace is defined with ``module``. This is followed by an indented and formatted
section explaining the custom variables being provided (``export``) as part of the script's namespace.
Finally, there is a second indented and formatted section describing the instructions to take for a
specific event (``event file_hash``), together with the helper function it calls. Don't get
discouraged if you don't
understand every section of the script; we'll cover the basics of the
script and much more in the following sections.
|
|
|
|
.. code-block:: zeek
   :caption: detect-MHR.zeek

   @load base/frameworks/files
   @load base/frameworks/notice
   @load frameworks/files/hash-all-files
|
|
|
|
The first part of the script consists of ``@load`` directives, which
load the named script or, when given a directory, process the
``__load__.zeek`` script it contains. Including ``@load`` directives is
often considered good practice or even just good manners when writing
Zeek scripts, because it makes sure they can be used on their own. While it's unlikely that in a
full production deployment of Zeek these additional resources wouldn't
already be loaded, it's not a bad habit to get into as you get
more experienced with Zeek scripting. If you're just starting out,
this level of granularity might not be entirely necessary. Here, the ``@load`` directives
ensure that the Files framework, the Notice framework, and the script to hash all files have
been loaded by Zeek.
|
|
|
|
.. code-block:: zeek
   :caption: detect-MHR.zeek

   export {
       redef enum Notice::Type += {
           ## The hash value of a file transferred over HTTP matched in the
           ## malware hash registry.
           Match
       };

       ## File types to attempt matching against the Malware Hash Registry.
       option match_file_types = /application\/x-dosexec/ |
                                 /application\/vnd.ms-cab-compressed/ |
                                 /application\/pdf/ |
                                 /application\/x-shockwave-flash/ |
                                 /application\/x-java-applet/ |
                                 /application\/jar/ |
                                 /video\/mp4/;

       ## The Match notice has a sub message with a URL where you can get more
       ## information about the file. The %s will be replaced with the SHA-1
       ## hash of the file.
       option match_sub_url = "https://www.virustotal.com/en/search/?query=%s";

       ## The malware hash registry runs each malware sample through several
       ## A/V engines. Team Cymru returns a percentage to indicate how
       ## many A/V engines flagged the sample as malicious. This threshold
       ## allows you to require a minimum detection rate.
       option notice_threshold = 10;
   }
|
|
|
|
The export section uses ``redef enum`` to extend the enumerable type that describes the kinds of
notice we will generate with the Notice framework. Zeek allows for
re-definable constants, which at first might seem counter-intuitive. We'll
get more in-depth with constants in a later chapter; for now, think of them as
variables that can only be altered before Zeek starts running. Extending
:zeek:see:`Notice::Type` as shown allows the :zeek:see:`NOTICE`
function to generate notices with a ``$note`` field set to
``TeamCymruMalwareHashRegistry::Match``. Notices allow Zeek to generate some
kind of extra notification beyond its default log types. Often, this
extra notification comes in the form of an email generated and sent to a
preconfigured address, but it can be altered depending on the needs of the
deployment. The export section is finished off with the definition of a few
configuration options that list the kinds of files we want to match against, the URL
used to look up a matching hash, and the minimum detection rate in which we are interested.
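
Because ``match_file_types``, ``match_sub_url`` and ``notice_threshold`` are
exported, a site can adjust them without editing the script itself. The
following is a minimal sketch, assuming it is placed in a site file such as
``local.zeek``; the value ``25`` is an arbitrary illustration, not a
recommended setting.

.. code-block:: zeek

   # Load the detection script, then require a higher detection rate
   # (illustrative value) before a notice is generated.
   @load frameworks/files/detect-MHR

   redef TeamCymruMalwareHashRegistry::notice_threshold = 25;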
|
|
|
|
Up until this point, the script has merely done some basic setup. With
|
|
the next section, the script starts to define instructions to take in
|
|
a given event.
|
|
|
|
.. code-block:: zeek
   :caption: detect-MHR.zeek

   function do_mhr_lookup(hash: string, fi: Notice::FileInfo)
       {
       local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

       when ( local MHR_result = lookup_hostname_txt(hash_domain) )
           {
           # Data is returned as "<dateFirstDetected> <detectionRate>"
           local MHR_answer = split_string1(MHR_result, / /);

           if ( |MHR_answer| == 2 )
               {
               local mhr_detect_rate = to_count(MHR_answer[1]);

               if ( mhr_detect_rate >= notice_threshold )
                   {
                   local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                   local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                   local message = fmt("Malware Hash Registry Detection rate: %d%% Last seen: %s", mhr_detect_rate, readable_first_detected);
                   local virustotal_url = fmt(match_sub_url, hash);
                   # We don't have the full fa_file record here in order to
                   # avoid the "when" statement cloning it (expensive!).
                   local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                   Notice::populate_file_info2(fi, n);
                   NOTICE(n);
                   }
               }
           }
       }

   event file_hash(f: fa_file, kind: string, hash: string)
       {
       if ( kind == "sha1" && f?$info && f$info?$mime_type &&
            match_file_types in f$info$mime_type )
           do_mhr_lookup(hash, Notice::create_file_info(f));
       }
|
|
|
|
The workhorse of the script is contained in the event handler for
|
|
``file_hash``. The :zeek:see:`file_hash` event allows scripts to access
|
|
the information associated with a file for which Zeek's file analysis
|
|
framework has generated a hash. The event handler is passed the
|
|
file itself as ``f``, the type of digest algorithm used as ``kind``
|
|
and the hash generated as ``hash``.
|
|
|
|
In the ``file_hash`` event handler, there is an ``if`` statement that is used
|
|
to check for the correct type of hash, in this case
|
|
a SHA1 hash. It also checks for a mime type we've defined as
|
|
being of interest as defined in the constant ``match_file_types``.
|
|
The comparison is made against the expression ``f$info$mime_type``, which uses
|
|
the ``$`` dereference operator to check the value ``mime_type``
|
|
inside the variable ``f$info``. If the entire expression evaluates to true,
|
|
then a helper function is called to do the rest of the work. In that
|
|
function, a local variable is defined to hold a string comprised of
|
|
the SHA1 hash concatenated with ``.malware.hash.cymru.com``; this
|
|
value will be the domain queried in the malware hash registry.
|
|
|
|
The rest of the script is contained within a ``when`` block. In
short, a ``when`` block is used when Zeek needs to perform asynchronous
actions, such as a DNS lookup, to ensure that performance isn't affected.
The ``when`` block performs a DNS TXT lookup and stores the result
in the local variable ``MHR_result``. Effectively, processing for
this event continues and, upon receipt of the values returned by
:zeek:id:`lookup_hostname_txt`, the body of the ``when`` block is executed. The
``when`` block splits the string returned into a portion for the date on which
the malware was first detected and a portion for the detection rate, by splitting the text
on the first space and storing the values returned in a local vector variable.
In the ``do_mhr_lookup`` function, if the vector
returned by ``split_string1`` has two entries, indicating a successful split, we
store the detection
date in ``mhr_first_detected`` and the rate in ``mhr_detect_rate``
using the appropriate conversion functions. From this point on, Zeek knows it has seen a file
transmitted which has a hash that has been seen by the Team Cymru Malware Hash Registry; the rest
of the script is dedicated to producing a notice.
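
The parsing step can be tried in isolation, without performing a DNS lookup.
The sketch below applies the same calls used in ``do_mhr_lookup`` to a
made-up answer string; the value ``"1362692526 42"`` is purely hypothetical
and does not come from the registry.

.. code-block:: zeek

   event zeek_init()
       {
       # Hypothetical TXT answer in the "<dateFirstDetected> <detectionRate>" form.
       local answer = "1362692526 42";

       # Split on the first space only, yielding at most two elements.
       local parts = split_string1(answer, / /);

       if ( |parts| == 2 )
           {
           local first_detected = double_to_time(to_double(parts[0]));
           local detect_rate = to_count(parts[1]);
           print strftime("%Y-%m-%d %H:%M:%S", first_detected), detect_rate;
           }
       }

If the answer contains no space, the vector has a single element and the
``if`` branch is skipped, which mirrors how the original script ignores
responses it cannot parse.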
|
|
|
|
The script then compares the detection rate
against the ``notice_threshold`` that was defined earlier. If the
detection rate is high enough, the detection time is converted into a
string representation and stored in ``readable_first_detected``, and the
script creates a concise description
of the notice and stores it in the ``message`` variable. It also
creates a possible URL to check the sample against
``virustotal.com``'s database, and makes the call to :zeek:id:`NOTICE`
to hand the relevant information off to the Notice framework.
|
|
|
|
In a few dozen lines of code, Zeek provides an amazing
utility that would be incredibly difficult to implement and deploy
with other products. In truth, claiming that Zeek does this in such a small
number of lines is a misdirection; there is a truly massive number of things
going on behind the scenes in Zeek, but it is the inclusion of the
scripting language that gives analysts access to those underlying
layers in a succinct and well-defined manner.
|
|
|
|
The Event Queue and Event Handlers
==================================
|
|
|
|
Zeek's scripting language is event-driven, which is a gear change from
the majority of scripting languages with which most users will have
previous experience. Scripting in Zeek depends on handling the events
generated by Zeek as it processes network traffic, altering the state
of data structures through those events, and making decisions on the
information provided. This approach to scripting can often cause
confusion to users who come to Zeek from a procedural or functional
language, but once the initial shock wears off it becomes clearer
with each exposure.
|
|
|
|
Zeek's core places events into an ordered "event queue",
allowing event handlers to process them on a first-come, first-served
basis. In effect, this is Zeek's core functionality, as without the
scripts written to perform discrete actions on events, there would be
little to no usable output. As such, a basic understanding of the
event queue, the events being generated, and the way in which event
handlers process those events is a basis for not only learning to
write scripts for Zeek but for understanding Zeek itself.
|
|
|
|
Gaining familiarity with the specific events generated by Zeek is a big
step towards building a mindset for working with Zeek scripts. The
majority of events generated by Zeek are defined in the
built-in-function (``*.bif``) files, which also act as the basis for
online event documentation. The in-line comments in these files are compiled into
an online documentation system using Zeekygen. Whether starting a
script from scratch or reading and maintaining someone else's script,
having the built-in event definitions available is an excellent
resource to have on hand. For the 2.0 release the Zeek developers put
significant effort into organization and documentation of every event.
This effort resulted in built-in-function files organized such that
each entry contains a descriptive event name, the arguments passed to
the event, and a concise explanation of the event's use.
|
|
|
|
.. code-block:: zeek

   ## Generated for DNS requests. For requests with multiple queries, this event
   ## is raised once for each.
   ##
   ## See `Wikipedia <http://en.wikipedia.org/wiki/Domain_Name_System>`__ for more
   ## information about the DNS protocol. Zeek analyzes both UDP and TCP DNS
   ## sessions.
   ##
   ## c: The connection, which may be UDP or TCP depending on the type of the
   ##    transport-layer session being analyzed.
   ##
   ## msg: The parsed DNS message header.
   ##
   ## query: The queried name.
   ##
   ## qtype: The queried resource record type.
   ##
   ## qclass: The queried resource record class.
   ##
   ## .. zeek:see:: dns_AAAA_reply dns_A_reply dns_CNAME_reply dns_EDNS_addl
   ##    dns_HINFO_reply dns_MX_reply dns_NS_reply dns_PTR_reply dns_SOA_reply
   ##    dns_SRV_reply dns_TSIG_addl dns_TXT_reply dns_WKS_reply dns_end
   ##    dns_full_request dns_mapping_altered dns_mapping_lost_name dns_mapping_new_name
   ##    dns_mapping_unverified dns_mapping_valid dns_message dns_query_reply
   ##    dns_rejected non_dns_request dns_max_queries dns_session_timeout dns_skip_addl
   ##    dns_skip_all_addl dns_skip_all_auth dns_skip_auth
   event dns_request%(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count%);
|
|
|
|
Above is a segment of the documentation for the event
|
|
:zeek:id:`dns_request` (and the preceding link points to the
|
|
documentation generated out of that). It's organized such that the
|
|
documentation, commentary, and list of arguments precede the actual
|
|
event definition used by Zeek. As Zeek detects DNS requests being
|
|
issued by an originator, it issues this event and any number of
|
|
scripts then have access to the data Zeek passes along with the event.
|
|
In this example, Zeek passes not only the message, the query, query
|
|
type and query class for the DNS request, but also a record used
|
|
for the connection itself.
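
To make the relationship between the definition and a script concrete, here
is a minimal sketch of a handler for this event. It assumes the five-argument
prototype shown in the excerpt above; the message format is just an
illustration.

.. code-block:: zeek

   @load base/protocols/dns

   # Runs once for every DNS query Zeek sees. The arguments mirror the
   # event definition: the connection record plus the parsed query details.
   event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
       {
       print fmt("%s asked for %s (qtype %d, qclass %d)",
                 c$id$orig_h, query, qtype, qclass);
       }

Any number of scripts can define a handler for the same event; each one
receives the same arguments when the event is processed.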
|
|
|
|
.. _writing-scripts-connection-record:

The Connection Record Data Type
===============================
|
|
|
|
Of all the events defined by Zeek, an overwhelmingly large number of
|
|
them are passed the :zeek:type:`connection` record data type, in effect,
|
|
making it the backbone of many scripting solutions. The connection
|
|
record itself, as we will see in a moment, is a mass of nested data
|
|
types used to track state on a connection through its lifetime. Let's
|
|
walk through the process of selecting an appropriate event, generating
|
|
some output to standard out and dissecting the connection record so as
|
|
to get an overview of it. We will cover data types in more detail
|
|
later.
|
|
|
|
While Zeek is capable of packet-level processing, its strengths lie in
the context of a connection between an *originator* and a *responder*.
|
|
|
|
.. note::

   Zeek's notions of originator and responder aim to capture the
   natural roles of connection endpoints given the protocol
   information observed. They differ from the packet-level concepts of
   source and destination, as well as from higher-level abstractions
   such as client and server.

   Zeek's protocol analyzers determine originator and responder when
   establishing connection state, with the sender of the initial
   packet usually becoming the originator and the recipient becoming
   the responder. However, analyzers may subsequently *flip* the roles
   if protocol semantics suggest it. For example, in the presence of
   packet loss the first observed packet in a DNS transaction may
   indicate that it is in fact the response to a missing query. Zeek's
   DNS analyzer will flip the endpoint roles, making the sender of
   this packet the connection's responder.
|
|
|
|
Zeek defines events for the primary parts of the connection life-cycle,
such as the following:

* :zeek:see:`new_connection`
* :zeek:see:`connection_timeout`
* :zeek:see:`connection_state_remove`
|
|
|
|
Of the events listed, the event that will give us the best insight
into the connection record data type will be
:zeek:id:`connection_state_remove`. As detailed in the in-line
documentation, Zeek generates this event just before it decides to
remove the connection from memory, effectively forgetting about it. Let's
take a look at a simple example script that will output the connection record
for a single connection.
|
|
|
|
.. literalinclude:: connection_record_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
Again, we start with ``@load``, this time importing the
|
|
:doc:`base/protocols/conn </scripts/base/protocols/conn/index>` scripts which
|
|
supply the tracking and logging of general information and state of
|
|
connections. We handle the :zeek:id:`connection_state_remove` event and simply
|
|
print the contents of the argument passed to it. For this example we're
|
|
going to run Zeek in "bare mode" which loads only the minimum number of
|
|
scripts to retain operability and leaves the burden of loading
|
|
required scripts to the script being run. While bare mode is a low
|
|
level functionality incorporated into Zeek, in this case, we're going
|
|
to use it to demonstrate how different features of Zeek add more and
|
|
more layers of information about a connection. This will give us a
|
|
chance to see the contents of the connection record without it being
|
|
overly populated.
|
|
|
|
.. code-block:: console
|
|
|
|
$ zeek -b -r http/get.trace connection_record_01.zeek
|
|
[id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], orig=[size=136, state=5, num_pkts=7, num_bytes_ip=512, flow_label=0, l2_addr=c8:bc:c8:96:d2:a0], resp=[size=5007, state=5, num_pkts=7, num_bytes_ip=5379, flow_label=0, l2_addr=00:10:db:88:d2:ef], start_time=1362692526.869344, duration=0.211484, service={
|
|
|
|
}, history=ShADadFf, uid=CHhAvVGS1DHFjwGM9, tunnel=<uninitialized>, vlan=<uninitialized>, inner_vlan=<uninitialized>, conn=[ts=1362692526.869344, uid=CHhAvVGS1DHFjwGM9, id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], proto=tcp, service=<uninitialized>, duration=0.211484, orig_bytes=136, resp_bytes=5007, conn_state=SF, local_orig=<uninitialized>, local_resp=<uninitialized>, missed_bytes=0, history=ShADadFf, orig_pkts=7, orig_ip_bytes=512, resp_pkts=7, resp_ip_bytes=5379, tunnel_parents=<uninitialized>], extract_orig=F, extract_resp=F, thresholds=<uninitialized>]
|
|
|
|
As you can see from the output, the connection record is something of
|
|
a jumble when printed on its own. Regularly taking a peek at a
|
|
populated connection record helps to understand the relationship
|
|
between its fields as well as allowing an opportunity to build a frame
|
|
of reference for accessing data in a script.
|
|
|
|
Zeek makes extensive use of nested data structures to store state and
information gleaned from the analysis of a connection as a complete
unit. To break down this collection of information, you will have to
make use of Zeek's field delimiter ``$``. For example, the
originating host is referenced by ``c$id$orig_h``, which reads as
``orig_h``, a member of ``id``, which is in turn
a member of the data structure referred to as ``c`` that was passed
into the event handler. Given that the responder port
``c$id$resp_p`` is ``80/tcp``, it's likely that Zeek's base HTTP scripts
can further populate the connection record. Let's load the
``base/protocols/http`` scripts and check the output of our script.
|
|
|
|
Zeek uses the dollar sign as its field delimiter and a direct
|
|
correlation exists between the output of the connection record and the
|
|
proper format of a dereferenced variable in scripts. In the output of
|
|
the script above, groups of information are collected between
|
|
brackets, which would correspond to the ``$``-delimiter in a Zeek script.
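
As a small illustration of that correspondence, the sketch below prints a few
individual fields instead of the whole record. It only uses fields that appear
in the output above (``id``, ``duration`` and ``history``); the message layout
is an arbitrary choice.

.. code-block:: zeek

   @load base/protocols/conn

   event connection_state_remove(c: connection)
       {
       # c$id is itself a record, so a second "$" reaches its members.
       print fmt("%s %s -> %s %s", c$id$orig_h, c$id$orig_p,
                 c$id$resp_h, c$id$resp_p);

       # Top-level members of the connection record need a single "$".
       print fmt("duration=%s history=%s", c$duration, c$history);
       }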
|
|
|
|
.. literalinclude:: connection_record_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console
|
|
|
|
$ zeek -b -r http/get.trace connection_record_02.zeek
|
|
[id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], orig=[size=136, state=5, num_pkts=7, num_bytes_ip=512, flow_label=0, l2_addr=c8:bc:c8:96:d2:a0], resp=[size=5007, state=5, num_pkts=7, num_bytes_ip=5379, flow_label=0, l2_addr=00:10:db:88:d2:ef], start_time=1362692526.869344, duration=0.211484, service={
|
|
|
|
}, history=ShADadFf, uid=CHhAvVGS1DHFjwGM9, tunnel=<uninitialized>, vlan=<uninitialized>, inner_vlan=<uninitialized>, conn=[ts=1362692526.869344, uid=CHhAvVGS1DHFjwGM9, id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], proto=tcp, service=<uninitialized>, duration=0.211484, orig_bytes=136, resp_bytes=5007, conn_state=SF, local_orig=<uninitialized>, local_resp=<uninitialized>, missed_bytes=0, history=ShADadFf, orig_pkts=7, orig_ip_bytes=512, resp_pkts=7, resp_ip_bytes=5379, tunnel_parents=<uninitialized>], extract_orig=F, extract_resp=F, thresholds=<uninitialized>, http=[ts=1362692526.939527, uid=CHhAvVGS1DHFjwGM9, id=[orig_h=141.142.228.5, orig_p=59856/tcp, resp_h=192.150.187.43, resp_p=80/tcp], trans_depth=1, method=GET, host=bro.org, uri=/download/CHANGES.bro-aux.txt, referrer=<uninitialized>, version=1.1, user_agent=Wget/1.14 (darwin12.2.0), request_body_len=0, response_body_len=4705, status_code=200, status_msg=OK, info_code=<uninitialized>, info_msg=<uninitialized>, tags={
|
|
|
|
}, username=<uninitialized>, password=<uninitialized>, capture_password=F, proxied=<uninitialized>, range_request=F, orig_fuids=<uninitialized>, orig_filenames=<uninitialized>, orig_mime_types=<uninitialized>, resp_fuids=[FakNcS1Jfe01uljb3], resp_filenames=<uninitialized>, resp_mime_types=[text/plain], current_entity=<uninitialized>, orig_mime_depth=1, resp_mime_depth=1], http_state=[pending={
|
|
|
|
}, current_request=1, current_response=1, trans_depth=1]]
|
|
|
|
The addition of the ``base/protocols/http`` scripts populates the
|
|
``http=[]`` member of the connection record. While Zeek is doing a
|
|
massive amount of work in the background, it is in what is commonly
|
|
called "scriptland" that details are being refined and decisions
|
|
being made. Were we to continue running in "bare mode" we could slowly
|
|
keep adding infrastructure through ``@load`` statements. For example,
|
|
were we to ``@load base/frameworks/logging``, Zeek would generate a
|
|
:file:`conn.log` and :file:`http.log` for us in the current working directory.
|
|
As mentioned above, including the appropriate ``@load`` statements is
|
|
not only good practice, but can also help to indicate which
|
|
functionalities are being used in a script. Take a second to run the
|
|
script without the ``-b`` flag and check the output when all of Zeek's
|
|
functionality is applied to the trace file.
|
|
|
|
Data Types and Data Structures
==============================

Scope
-----
|
|
|
|
Before embarking on an exploration of Zeek's native data types and data
structures, it's important to have a good grasp of the different
levels of scope available in Zeek and the appropriate times to use them
within a script. The declarations of variables in Zeek come in two
forms. Variables can be declared without or with a definition in the
form ``SCOPE name: TYPE`` or ``SCOPE name = EXPRESSION`` respectively;
each of which produces the same result if ``EXPRESSION`` evaluates to the
same type as ``TYPE``. The decision as to which type of declaration to
use is likely to be dictated by personal preference and readability.
|
|
|
|
.. literalinclude:: data_type_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
Global Variables
~~~~~~~~~~~~~~~~
|
|
|
|
A global variable is used when the state of a variable needs to be
tracked, not surprisingly, globally. While there are some caveats,
when a script declares a variable using the global scope, that script
is granting access to that variable from other scripts. However, when
a script uses the ``module`` keyword to give the script a namespace,
more care must be given to the declaration of globals to ensure the
intended result. When a global is declared in a script with a
namespace there are two possible outcomes. First, the variable is
available only within the context of the namespace. In this scenario,
other scripts within the same namespace will have access to the
variable declared, while scripts using a different namespace or no
namespace altogether will not have access to the variable.
Alternatively, if a global variable is declared within an ``export { ... }``
block, that variable is available to any other script through the
naming convention of ``<module name>::<variable name>``, i.e. the variable
needs to be "scoped" by the name of the module in which it was declared.
|
|
|
|
When the ``module`` keyword is used in a script, the variables declared
are said to be in that module's "namespace". Whereas a global variable
can be accessed by its name alone when it is not declared within a
module, a global variable declared within a module must be exported and
then accessed via ``<module name>::<variable name>``.
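
The two cases can be sketched side by side. The module name ``Example`` and
both variable names below are hypothetical, chosen only for illustration.

.. code-block:: zeek

   module Example;

   export {
       ## Exported: other scripts refer to this as Example::exported_counter.
       global exported_counter: count = 0;
   }

   # Not exported: intended for use only by code inside the Example module.
   global private_counter: count = 0;

   event zeek_init()
       {
       ++exported_counter;
       ++private_counter;

       # Inside the module, both the short and the scoped name work
       # for the exported variable.
       print Example::exported_counter, private_counter;
       }

Keeping the ``export`` block small is a common way to separate a script's
public configuration and state from its internal bookkeeping.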
|
|
|
|
Constants
~~~~~~~~~
|
|
|
|
Zeek also makes use of constants, which are denoted by the ``const``
keyword. Unlike globals, constants can only be set or altered at
parse time, and only if the ``&redef`` attribute has been used. Afterwards (at
runtime) the constants are unalterable. In most cases, re-definable
constants are used in Zeek scripts as containers for configuration
options. For example, the configuration option to log passwords
decoded from HTTP streams is stored in
:zeek:see:`HTTP::default_capture_password` as shown in the stripped down
excerpt from :doc:`/scripts/base/protocols/http/main.zeek` below.
|
|
|
|
.. literalinclude:: http_main.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
Because the constant was declared with the ``&redef`` attribute, if we
|
|
needed to turn this option on globally, we could do so by adding the
|
|
following line to our ``site/local.zeek`` file before firing up Zeek.
|
|
|
|
.. literalinclude:: data_type_const_simple.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
While the idea of a re-definable constant might be odd, the constraint
|
|
that constants can only be altered at parse-time remains even with the
|
|
``&redef`` attribute. In the code snippet below, a table of strings
|
|
indexed by ports is declared as a constant before two values are added
|
|
to the table through ``redef`` statements. The table is then printed
|
|
in a :zeek:id:`zeek_init` event. Were we to try to alter the table in
|
|
an event handler, Zeek would notify the user of an error and the script
|
|
would fail.
|
|
|
|
.. literalinclude:: data_type_const.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek -b data_type_const.zeek
   {
   [80/tcp] = WWW,
   [6666/tcp] = IRC
   }
|
|
|
|
Local Variables
~~~~~~~~~~~~~~~
|
|
|
|
Whereas globals and constants are widely available in scriptland
through various means, when a variable is defined with a local scope,
its availability is restricted to the body of the event or function in
which it was declared. Local variables tend to be used for values
that are only needed within a specific scope; once the processing
of a script passes beyond that scope and the variable is no longer used, it
is deleted. Zeek maintains names of locals separately from globally
visible ones, an example of which is illustrated below.
|
|
|
|
.. literalinclude:: data_type_local.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
The script executes the event handler :zeek:id:`zeek_init` which in turn calls
|
|
the function ``add_two(i: count)`` with an argument of ``10``. Once Zeek
|
|
enters the ``add_two`` function, it provisions a locally scoped
|
|
variable called ``added_two`` to hold the value of ``i+2``, in this
|
|
case, ``12``. The ``add_two`` function then prints the value of the
|
|
``added_two`` variable and returns its value to the ``zeek_init`` event
|
|
handler. At this point, the variable ``added_two`` has fallen out of
scope and no longer exists, while the value ``12`` is still in use and
stored in the locally scoped variable ``test``. When Zeek finishes
processing the ``zeek_init`` event handler, the variable called ``test`` is
no longer in scope and, since there exist no other references to the
value ``12``, the value is also deleted.
|
|
|
|
|
|
Data Structures
|
|
---------------
|
|
|
|
It's difficult to talk about Zeek's data types in a practical manner
|
|
without first covering the data structures available in Zeek. Some of
|
|
the more interesting characteristics of data types are revealed when
|
|
used inside of a data structure, but given that data structures are
|
|
made up of data types, it devolves rather quickly into a
|
|
"chicken-and-egg" problem. As such, we'll introduce data types from
|
|
a bird's eye view before diving into data structures and from there a
|
|
more complete exploration of data types.
|
|
|
|
The table below shows the atomic types used in Zeek, of which the
|
|
first four should seem familiar if you have some scripting experience,
|
|
while the remaining six are less common in other languages. It should
|
|
come as no surprise that a scripting language for a Network Security
|
|
Monitoring platform has a fairly robust set of network-centric data
|
|
types and taking note of them here may well save you a late night of
|
|
reinventing the wheel.
|
|
|
|
.. list-table::
   :header-rows: 1

   * - Data Type
     - Description

   * - :zeek:see:`int`
     - 64-bit signed integer

   * - :zeek:see:`count`
     - 64-bit unsigned integer

   * - :zeek:see:`double`
     - double precision floating point

   * - :zeek:see:`bool`
     - boolean (T/F)

   * - :zeek:see:`addr`
     - IP address, IPv4 and IPv6

   * - :zeek:see:`port`
     - transport layer port

   * - :zeek:see:`subnet`
     - CIDR subnet mask

   * - :zeek:see:`time`
     - absolute epoch time

   * - :zeek:see:`interval`
     - a time interval

   * - :zeek:see:`pattern`
     - regular expression
|
|
|
|
Sets
|
|
~~~~
|
|
|
|
Sets in Zeek are used to store unique elements of the same data
|
|
type. In essence, you can think of them as "a unique set of integers"
|
|
or "a unique set of IP addresses". While the declaration of a set may
|
|
differ based on the data type being collected, the set will always
|
|
contain unique elements and the elements in the set will always be of
|
|
the same data type. Such requirements make the set data type perfect
|
|
for information that is already naturally unique such as ports or IP
|
|
addresses. The code snippet below shows both an explicit and implicit
|
|
declaration of a locally scoped set.
|
|
|
|
.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :lines: 1-4,22
   :tab-width: 4
|
|
|
|
As you can see, sets are declared using the format ``SCOPE var_name:
|
|
set[TYPE]``. Adding and removing elements in a set is achieved using
|
|
the ``add`` and ``delete`` statements. Once you have elements inserted into
|
|
the set, it's likely that you'll need to either iterate over that set
|
|
or test for membership within the set, both of which are covered by
|
|
the ``in`` operator. In the case of iterating over a set, combining the
|
|
``for`` statement and the ``in`` operator will allow you to sequentially
|
|
process each element of the set as seen below.
|
|
|
|
.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :lines: 17-21
   :lineno-start: 17
   :tab-width: 4
|
|
|
|
Here, the ``for`` statement loops over the contents of the set storing
|
|
each element in the temporary variable ``i``. With each iteration of
|
|
the ``for`` loop, the next element is chosen. Since sets are not an
ordered data type, you cannot rely on the order in which the ``for``
loop processes the elements.
|
|
|
|
To test for membership in a set, the ``in`` operator can be combined
with an ``if`` statement. If the exact element in the condition is
already in the set, the condition evaluates to true and the body
executes. The ``in`` operator can also be negated with the ``!``
operator to invert the condition. While we could rewrite the
corresponding line below as ``if ( !( 587/tcp in ssl_ports ))``, try to
avoid this construct; instead, negate the ``in`` operator itself. The
functionality is the same, but ``!in`` is the more natural construct
and aids the readability of your script.
|
|
|
|
.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :lines: 13-15
   :lineno-start: 13
   :tab-width: 4
|
|
|
|
You can see the full script and its output below.
|
|
|
|
.. literalinclude:: data_struct_set_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek data_struct_set_declaration.zeek
   SSL Port: 22/tcp
   SSL Port: 443/tcp
   SSL Port: 587/tcp
   SSL Port: 993/tcp
   Non-SSL Port: 80/tcp
   Non-SSL Port: 25/tcp
   Non-SSL Port: 143/tcp
   Non-SSL Port: 23/tcp
|
|
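
The set operations covered in this section can be condensed into a
minimal sketch. The set name ``admin_ports`` and its contents are
hypothetical and chosen only for illustration.

.. code-block:: zeek

   event zeek_init()
       {
       # Explicit declaration with an initial value.
       local admin_ports: set[port] = { 22/tcp, 3389/tcp };

       # Elements are added and removed with the add and delete statements.
       add admin_ports[5900/tcp];
       delete admin_ports[3389/tcp];

       # Membership test with the in operator.
       if ( 22/tcp in admin_ports )
           print "22/tcp is an admin port";

       # Negated membership test with !in.
       if ( 3389/tcp !in admin_ports )
           print "3389/tcp is no longer an admin port";

       # |s| yields the number of elements in the set.
       print fmt("%d ports tracked", |admin_ports|);
       }

Both messages should be printed, followed by ``2 ports tracked``.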
|
|
Tables
|
|
~~~~~~
|
|
|
|
A table in Zeek is a mapping of a key to a value, or yield. While the
values don't have to be unique, each key in the table must be unique
so that it maps to exactly one value.
|
|
|
|
.. literalinclude:: data_struct_table_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek data_struct_table_declaration.zeek
   Service Name: SSH - Common Port: 22/tcp
   Service Name: HTTPS - Common Port: 443/tcp
   Service Name: SMTPS - Common Port: 587/tcp
   Service Name: IMAPS - Common Port: 993/tcp
|
|
|
|
In this example,
|
|
we've compiled a table of SSL-enabled services and their common
|
|
ports. The explicit declaration and constructor for the table are on
|
|
two different lines and lay out the data types of the keys (strings) and the
|
|
data types of the values (ports) and then fill in some sample key and
|
|
value pairs. You can also use a table accessor to insert one
|
|
key-value pair into the table. When using the ``in``
|
|
operator on a table, you are effectively working with the keys of the table.
|
|
In the case of an ``if`` statement, the ``in`` operator will check for
|
|
membership among the set of keys and return a true or false value.
|
|
The example shows how to check if ``SMTPS`` is not in the set
|
|
of keys for the ``ssl_services`` table and if the condition holds true,
|
|
we add the key-value pair to the table. Finally, the example shows how
|
|
to use a ``for`` statement to iterate over each key currently in the table.
|
|
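
The same steps can be sketched compactly as follows. The table name
``service_ports`` and its entries are hypothetical; the ``delete``
statement is an addition not shown in the example above.

.. code-block:: zeek

   event zeek_init()
       {
       # Keys are strings, values (yields) are ports.
       local service_ports: table[string] of port = table(
           ["SSH"] = 22/tcp,
           ["HTTPS"] = 443/tcp
       );

       # Insert one pair with the table accessor, guarded by a key check.
       if ( "IMAPS" !in service_ports )
           service_ports["IMAPS"] = 993/tcp;

       # Remove an entry by key.
       delete service_ports["SSH"];

       # Iterating a table yields its keys; order is not guaranteed.
       for ( name in service_ports )
           print fmt("%s -> %s", name, service_ports[name]);
       }

This should print one line each for ``HTTPS`` and ``IMAPS``, in no
particular order.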
|
|
Simple examples aside, tables can become extremely complex as the keys
|
|
and values for the table become more intricate. Tables can have keys
|
|
comprised of multiple data types and even a series of elements called
|
|
a "tuple". The flexibility gained with the use of complex tables in
|
|
Zeek implies a cost in complexity for the person writing the scripts
|
|
but pays off in effectiveness given the power of Zeek as a network
|
|
security platform.
|
|
|
|
.. literalinclude:: data_struct_table_complex.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek -b data_struct_table_complex.zeek
   Harakiri was released in 1962 by Shochiku Eiga studios, directed by Masaki Kobayashi and starring Tatsuya Nakadai
   Goyokin was released in 1969 by Fuji studios, directed by Hideo Gosha and starring Tatsuya Nakadai
   Tasogare Seibei was released in 2002 by Eisei Gekijo studios, directed by Yoji Yamada and starring Hiroyuki Sanada
   Kiru was released in 1968 by Toho studios, directed by Kihachi Okamoto and starring Tatsuya Nakadai
|
|
|
|
This script shows a sample table of strings indexed by two
|
|
strings, a count, and a final string. With a tuple acting as an
|
|
aggregate key, the order is important as a change in order would
|
|
result in a new key. Here, we're using the table to track the
|
|
director, studio, year of release, and lead actor in a series of
|
|
samurai flicks.
|
|
|
|
In the case of the ``for`` statement above, iteration is done over all
|
|
parts of the key. When not all parts of a key are needed within the ``for``
|
|
loop's body, these can be ignored by using the blank identifier ``_``
|
|
instead of a variable.
|
|
It's important to note, however, that the structure of the key needs to
|
|
be reflected: All parts of the key need to be captured within the brackets
|
|
by a variable or the blank identifier.
|
|
As a special case, a single blank identifier allows ignoring the whole key.
In the previous example, we need square brackets surrounding four temporary
|
|
variables to act as a collection for our iteration. While this
|
|
is a contrived example, we could easily have had keys containing IP addresses
|
|
(``addr``), ports (``port``) and even a ``string`` calculated as the result
|
|
of a reverse hostname lookup.
|
|
|
|
The example below continues with the ``samurai_flicks`` table and shows usage
|
|
of the blank identifier in combination with key-value iteration.
|
|
Using key-value iteration short-cuts the table access to lookup the value as
|
|
it provides the respective entry's value directly in addition to the key.
|
|
|
|
First, iteration is done by capturing the directors and movie names and
|
|
ignoring all other elements of the key. Second, the whole key is ignored
|
|
and only movie names used.
|
|
|
|
.. literalinclude:: data_struct_table_complex_blank_value.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek data_struct_table_complex_blank_value.zeek
   Kiru was directed by Kihachi Okamoto
   Harakiri was directed by Masaki Kobayashi
   Tasogare Seibei was directed by Yoji Yamada
   Goyokin was directed by Hideo Gosha
   Kiru is a movie
   Harakiri is a movie
   Tasogare Seibei is a movie
   Goyokin is a movie
|
|
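
To show the same ideas with network-centric keys, here is a minimal
sketch that assumes a hypothetical table ``bytes_seen`` indexed by an
``addr`` and a ``port``. The table and its values are invented for
illustration.

.. code-block:: zeek

   event zeek_init()
       {
       # Hypothetical table: bytes seen per (host, port) pair.
       local bytes_seen: table[addr, port] of count = table(
           [10.0.0.1, 80/tcp] = 1200,
           [10.0.0.2, 443/tcp] = 5300
       );

       # Key-value iteration: capture the host, ignore the port,
       # and receive the value directly as n.
       for ( [host, _], n in bytes_seen )
           print fmt("%s sent %d bytes", host, n);

       # A single blank identifier ignores the whole key.
       local total = 0;
       for ( _, n in bytes_seen )
           total += n;

       print fmt("total: %d bytes", total);
       }

The per-host lines may appear in either order; the final line should
read ``total: 6500 bytes``.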
|
|
|
|
Vectors
|
|
~~~~~~~
|
|
|
|
If you're coming to Zeek with a programming background, you may or may
|
|
not be familiar with a vector data type depending on your language of
|
|
choice. On the surface, vectors perform much of the same
|
|
functionality as associative arrays with unsigned integers as their
|
|
indices. They are, however, more efficient than that and they allow for
ordered access. As such, any time you need to sequentially store data of the
same type in Zeek, you should reach for a vector. Vectors are a
collection of objects, all of which are of the same data type, to
which elements can be dynamically added or removed. Since vectors use
contiguous storage for their elements, the contents of a vector can be
accessed through a zero-indexed numerical offset.
|
|
|
|
The format for the declaration of a vector follows the pattern of
other declarations, namely, ``SCOPE v: vector of T`` where ``v`` is
the name of your vector, and ``T`` is the data type of its members.
For example, the following snippet shows an explicit and implicit
declaration of two locally scoped vectors. The script populates the
first vector by inserting values at the end: placing the vector name
between two vertical pipes yields the vector's current length, which
serves as the index for the new last element. The script then prints
the contents of both vectors and their current lengths.
|
|
|
|
.. literalinclude:: data_struct_vector_declaration.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek data_struct_vector_declaration.zeek
   contents of v1: [1, 2, 3, 4]
   length of v1: 4
   contents of v2: [1, 2, 3, 4]
   length of v2: 4
|
|
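
A minimal sketch of appending to a vector follows. The vector name
``ports_seen`` is hypothetical, and the ``+=`` form is shown as an
alternative to the length-based indexing described above.

.. code-block:: zeek

   event zeek_init()
       {
       local ports_seen: vector of port = vector();

       # |v| is the current length, so v[|v|] is the first unused index.
       ports_seen[|ports_seen|] = 22/tcp;
       ports_seen[|ports_seen|] = 80/tcp;

       # The += operator appends an element as well.
       ports_seen += 443/tcp;

       print ports_seen;
       print |ports_seen|;
       }

Since vectors preserve insertion order, this should print
``[22/tcp, 80/tcp, 443/tcp]`` followed by ``3``.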
|
|
In a lot of cases, storing elements in a vector is simply a precursor
|
|
to then iterating over them. Iterating over a vector is easy with the
|
|
``for`` keyword. The sample below iterates over a vector of IP
|
|
addresses and for each IP address, masks that address with 18 bits.
|
|
The ``for`` keyword is used to generate a locally scoped variable
|
|
called ``i`` which will hold the index of the current element in the
|
|
vector. Using ``i`` as an index to ``addr_vector``, we can access the
|
|
current item in the vector with ``addr_vector[i]``.
|
|
|
|
.. literalinclude:: data_struct_vector_iter.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek -b data_struct_vector_iter.zeek
   1.2.0.0/18
   2.3.0.0/18
   3.4.0.0/18
|
|
|
|
Providing a value variable to the ``for`` loop allows skipping the extra
index operation. As the index variable is now unused, the script below
uses ``_``, the blank identifier, to ignore it. This script is semantically
equivalent to the previous one, but it iterates over the values directly and
is therefore potentially more performant for very large vectors.
|
|
|
|
.. literalinclude:: data_struct_vector_iter_value.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
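
The two iteration styles can be compared side by side in a short
sketch. The vector ``hosts``, its addresses, and the 24-bit mask are
assumptions made for illustration.

.. code-block:: zeek

   event zeek_init()
       {
       local hosts = vector(192.168.10.5, 192.168.77.9);

       # Index iteration: i is the position, hosts[i] the element.
       for ( i in hosts )
           print fmt("index %d -> %s", i, mask_addr(hosts[i], 24));

       # Value iteration: the index is ignored with the blank identifier.
       for ( _, h in hosts )
           print mask_addr(h, 24);
       }

Both loops visit the elements in index order and should report the
subnets ``192.168.10.0/24`` and ``192.168.77.0/24``.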
|
|
Data Types Revisited
|
|
--------------------
|
|
|
|
addr
|
|
~~~~
|
|
|
|
The ``addr``, or address, data type manages to cover a surprisingly
|
|
large amount of ground while remaining succinct. IPv4, IPv6 and even
|
|
hostname constants are included in the ``addr`` data type. While IPv4
|
|
addresses use the default dotted quad formatting, IPv6 addresses use
the RFC 2373 defined notation with the addition of square brackets
wrapping the entire address. When you venture into hostname
constants, Zeek performs a little sleight of hand for the benefit of the
user; a hostname constant is, in fact, a set of addresses. Zeek will
|
|
issue a DNS request when it sees a hostname constant in use and return
|
|
a set whose elements are the answers to the DNS request. For example,
|
|
if you were to use ``local google = www.google.com;`` you would end up
|
|
with a locally scoped ``set[addr]`` with elements that represent the
|
|
current set of round robin DNS entries for google. At first blush,
|
|
this seems trivial, but it is yet another example of Zeek making the
|
|
life of the common Zeek scripter a little easier through abstraction
|
|
applied in a practical manner. (Note, however, that these IP addresses
will never get updated during Zeek's processing, so this mechanism is
often most useful for addresses that are expected to remain static.)
|
|
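
A minimal sketch of ``addr`` constants follows. The addresses come from
documentation ranges and are chosen only for illustration; the hostname
form is left out because it requires a DNS lookup.

.. code-block:: zeek

   event zeek_init()
       {
       # IPv4 uses dotted quad notation; IPv6 is wrapped in square brackets.
       local v4 = 192.0.2.10;
       local v6 = [2001:db8::1];

       # Built-in functions distinguish the two address families.
       print is_v4_addr(v4);
       print is_v6_addr(v6);

       # Addresses compare for equality like other atomic values.
       print v4 == 192.0.2.10;
       }

Each of the three statements should print ``T``.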
|
|
port
|
|
~~~~
|
|
|
|
Transport layer port numbers in Zeek are represented in the format of
|
|
``<unsigned integer>/<protocol name>``, e.g., ``22/tcp`` or
|
|
``53/udp``. Zeek supports TCP(``/tcp``), UDP(``/udp``),
|
|
ICMP(``/icmp``) and UNKNOWN(``/unknown``) as protocol designations.
|
|
While ICMP doesn't have an actual port, Zeek supports the concept of
|
|
ICMP "ports" by using the ICMP message type and ICMP message code as
|
|
the source and destination port respectively. Ports can be compared
|
|
for equality using the ``==`` or ``!=`` operators and can even be
|
|
compared for ordering. Zeek gives the protocol designations the
|
|
following "order": ``unknown`` < ``tcp`` < ``udp`` < ``icmp``. For
|
|
example ``65535/tcp`` is smaller than ``0/udp``.
|
|
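
The comparisons described above can be tried out with a short sketch.
The port values are arbitrary examples.

.. code-block:: zeek

   event zeek_init()
       {
       # Same number, different protocol: not equal.
       print 22/tcp == 22/udp;

       # Ordering considers the protocol first, so any TCP port sorts
       # below any UDP port.
       print 65535/tcp < 0/udp;

       # Built-in functions split a port into its two components.
       print get_port_transport_proto(53/udp);
       print port_to_count(443/tcp);
       }

This should print ``F``, ``T``, ``udp``, and ``443``.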
|
|
subnet
|
|
~~~~~~
|
|
|
|
Zeek has full support for CIDR notation subnets as a base data type.
|
|
There is no need to manage the IP and the subnet mask as two separate
|
|
entities when you can provide the same information in CIDR notation in
|
|
your scripts. The example below uses a Zeek script to
|
|
determine if a series of IP addresses are within a set of subnets
|
|
using a 20 bit subnet mask.
|
|
|
|
.. literalinclude:: data_type_subnets.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
Because this is a script that doesn't use any kind of network
|
|
analysis, we can handle the event :zeek:id:`zeek_init` which is always
|
|
generated by Zeek's core upon startup. In the example script, two
|
|
locally scoped vectors are created to hold our lists of subnets and IP
|
|
addresses respectively. Then, using a set of nested ``for`` loops, we
|
|
iterate over every subnet and every IP address and use an ``if``
|
|
statement to compare an IP address against a subnet using the ``in``
|
|
operator. The ``in`` operator returns true if the IP address falls
|
|
within a given subnet based on the longest prefix match calculation.
|
|
For example, ``10.0.0.1 in 10.0.0.0/8`` would return true while
|
|
``192.168.2.1 in 192.168.1.0/24`` would return false. When we run the
|
|
script, we get the output listing the IP address and the subnet in
|
|
which it belongs.
|
|
|
|
.. code-block:: console

   $ zeek data_type_subnets.zeek
   172.16.4.56 belongs to subnet 172.16.0.0/20
   172.16.47.254 belongs to subnet 172.16.32.0/20
   172.16.22.45 belongs to subnet 172.16.16.0/20
   172.16.1.1 belongs to subnet 172.16.0.0/20
|
|
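
The two membership checks mentioned in the text, plus a lookup against
a set of subnets, can be sketched as follows. The set name
``internal_nets`` is hypothetical.

.. code-block:: zeek

   event zeek_init()
       {
       # The two examples from the text.
       print 10.0.0.1 in 10.0.0.0/8;
       print 192.168.2.1 in 192.168.1.0/24;

       # An address can also be tested against a whole set of subnets.
       local internal_nets: set[subnet] = { 10.0.0.0/8, 192.168.1.0/24 };
       print 10.20.30.40 in internal_nets;
       }

This should print ``T``, ``F``, and ``T``.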
|
|
time
|
|
~~~~
|
|
|
|
While there is currently no supported way to add a time constant in
|
|
Zeek, two built-in functions exist to make use of the ``time`` data
|
|
type. Both :zeek:id:`network_time` and :zeek:id:`current_time` return a
|
|
``time`` data type but they each return a time based on different
|
|
criteria. The ``current_time`` function returns what is called the
|
|
wall-clock time as defined by the operating system. However,
|
|
``network_time`` returns the timestamp of the last packet processed
|
|
be it from a live data stream or saved packet capture. Both functions
|
|
return the time in epoch seconds, meaning ``strftime`` must be used to
|
|
turn the output into human readable output. The script below makes
|
|
use of the :zeek:id:`connection_established` event handler to generate text
|
|
every time a SYN/ACK packet is seen responding to a SYN packet as part
|
|
of a TCP handshake. The generated text consists of a timestamp and an
indication of who the originator and responder were. We use the
``strftime`` format string ``%Y/%m/%d %H:%M:%S`` to produce a common
date-time formatted timestamp.
|
|
|
|
.. literalinclude:: data_type_time.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
When the script is executed we get an output showing the details of
|
|
established connections.
|
|
|
|
.. code-block:: console

   $ zeek -r wikipedia.trace data_type_time.zeek
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.118\x0a
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3\x0a
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.2\x0a
   2011/06/18 19:03:09: New connection established from 141.142.220.235 to 173.192.163.128\x0a
|
|
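
When no packet capture is at hand, the wall-clock variant can be tried
on its own. The sketch below is a minimal example that only relies on
``current_time``, ``strftime``, and ``time_to_double``; the printed
timestamp depends on when and where it is run.

.. code-block:: zeek

   event zeek_init()
       {
       # Wall-clock time as reported by the operating system.
       local now = current_time();

       # Render the epoch value in a human-readable form.
       print strftime("%Y/%m/%d %H:%M:%S", now);

       # The underlying value is a number of seconds since the epoch.
       print time_to_double(now) > 0.0;
       }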
|
|
interval
|
|
~~~~~~~~
|
|
|
|
The ``interval`` data type is another area in Zeek where rational
|
|
application of abstraction makes perfect sense. As a data type, the
|
|
``interval`` represents a relative time as denoted by a numeric constant
|
|
followed by a unit of time. For example, 2.2 seconds would be
|
|
``2.2sec`` and thirty-one days would be represented by ``31days``.
|
|
Zeek supports ``usec``, ``msec``, ``sec``, ``min``, ``hr``, and ``day``, which represent
microseconds, milliseconds, seconds, minutes, hours, and days
respectively. In fact, the ``interval`` data type allows for a surprising
amount of variation in its definitions. There can be a space between
the numeric constant and the unit, or they can be crammed together like
a temporal portmanteau. The time unit can be either singular or plural.
All of this adds up to the fact that both ``42hrs`` and ``42 hr`` are
perfectly valid and logically equivalent in Zeek. The point, however,
|
|
is to increase the readability and thus maintainability of a script.
|
|
Intervals can even be negated, allowing for ``- 10mins`` to represent
|
|
"ten minutes ago".
|
|
|
|
Intervals in Zeek can have mathematical operations performed against
|
|
them allowing the user to perform addition, subtraction,
|
|
multiplication, division, and comparison operations. As well, Zeek
|
|
returns an ``interval`` when differencing two ``time`` values using the ``-``
|
|
operator. The script below amends the script started in the section
|
|
above to include a time delta value printed along with the connection
|
|
establishment report.
|
|
|
|
.. literalinclude:: data_type_interval.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
When we re-execute the script we see an additional line in the
|
|
output, displaying the time delta since the last fully established
|
|
connection.
|
|
|
|
.. code-block:: console

   $ zeek -r wikipedia.trace data_type_interval.zeek
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.118
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3
   Time since last connection: 132.0 msecs 97.0 usecs
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3
   Time since last connection: 177.0 usecs
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3
   Time since last connection: 2.0 msecs 177.0 usecs
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3
   Time since last connection: 33.0 msecs 898.0 usecs
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3
   Time since last connection: 35.0 usecs
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.3
   Time since last connection: 2.0 msecs 532.0 usecs
   2011/06/18 19:03:08: New connection established from 141.142.220.118 to 208.80.152.2
   Time since last connection: 7.0 msecs 866.0 usecs
   2011/06/18 19:03:09: New connection established from 141.142.220.235 to 173.192.163.128
   Time since last connection: 817.0 msecs 703.0 usecs
|
|
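
The interval notation and arithmetic described in this section can be
checked with a small sketch. The specific values are arbitrary; each
``print`` is expected to yield ``T``.

.. code-block:: zeek

   event zeek_init()
       {
       # Spacing and singular/plural units do not change the value.
       print 42hrs == 42 hr;

       # Intervals support addition and scaling by a number.
       print 1min + 30sec == 90sec;
       print 1.5sec * 2 == 3sec;

       # Subtracting two time values yields an interval.
       local start = current_time();
       local elapsed: interval = current_time() - start;
       print elapsed < 1min;

       # Subtracting an interval from a time yields an earlier time.
       print start - 10mins < start;
       }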
|
|
|
|
Pattern
|
|
~~~~~~~
|
|
|
|
Zeek has support for fast text searching operations using regular
|
|
expressions and even goes so far as to declare a native data type for
|
|
the patterns used in regular expressions. A pattern constant is
|
|
created by enclosing text within the forward slash characters. Zeek
|
|
supports syntax very similar to the Flex lexical analyzer syntax. The
|
|
most common use of patterns in Zeek you are likely to come across is
|
|
embedded matching using the ``in`` operator. Embedded matching
|
|
adheres to a strict format, requiring the regular expression or
|
|
pattern constant to be on the left side of the ``in`` operator and the
|
|
string against which it will be tested to be on the right.
|
|
|
|
.. literalinclude:: data_type_pattern_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
In the sample above, two local variables are declared to hold our
|
|
sample sentence and regular expression. Our regular expression in
|
|
this case will return true if the string contains either the word
|
|
``quick`` or the word ``lazy``. The ``if`` statement in the script uses
|
|
embedded matching and the ``in`` operator to check for the existence
|
|
of the pattern within the string. If the statement resolves to true,
|
|
:zeek:id:`split_string` is called to break the string into separate pieces.
|
|
:zeek:id:`split_string` takes a string and a pattern as its arguments and returns a
|
|
vector of strings. The elements of the vector are the segments
before and after any matches against the pattern, excluding
the actual matches. In this case, our pattern matches
twice, resulting in a vector with three elements.
|
|
|
|
.. code-block:: console

   $ zeek data_type_pattern_01.zeek
   The
   brown fox jumps over the
   dog.
|
|
|
|
Patterns can also be used to compare strings using equality and
|
|
inequality operators through the ``==`` and ``!=`` operators
|
|
respectively. When used in this manner however, the string must match
|
|
entirely to resolve to true. For example, the script below uses two
|
|
ternary conditional statements to illustrate the use of the ``==``
|
|
operator with patterns. The output is altered based
|
|
on the result of the comparison between the pattern and the string.
|
|
|
|
.. literalinclude:: data_type_pattern_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek data_type_pattern_02.zeek
   equality and /^?(equal)$?/ are not equal
   equality and /^?(equality)$?/ are equal
|
|
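
The difference between embedded and exact matching can be seen in a
compact sketch. The string ``admin-login.php`` and the patterns are
hypothetical examples.

.. code-block:: zeek

   event zeek_init()
       {
       local name = "admin-login.php";

       # Embedded matching: the pattern may match anywhere in the string.
       print /admin|root/ in name;

       # Exact matching: the pattern must match the entire string.
       print /admin/ == name;
       print /admin.*/ == name;

       # split_string() returns the pieces around each match.
       print split_string(name, /-/);
       }

This should print ``T``, ``F``, ``T``, and ``[admin, login.php]``.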
|
|
Record Data Type
|
|
----------------
|
|
|
|
With Zeek's support for a wide array of data types and data structures,
|
|
an obvious extension is to include the ability to create custom
|
|
data types composed of atomic types and further data structures. To
|
|
accomplish this, Zeek introduces the ``record`` type and the ``type``
|
|
keyword. Similar to how you would define a new data structure in C
|
|
with the ``typedef`` and ``struct`` keywords, Zeek allows you to cobble
|
|
together new data types to suit the needs of your situation.
|
|
|
|
When combined with the ``type`` keyword, ``record`` can generate a
|
|
composite type. We have, in fact, already encountered a complex
|
|
example of the ``record`` data type in the earlier sections, the
|
|
:zeek:type:`connection` record passed to many events. Another one,
|
|
:zeek:type:`Conn::Info`, which corresponds to the fields logged into
|
|
:file:`conn.log`, is shown by the excerpt below.
|
|
|
|
.. literalinclude:: data_type_record.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
Looking at the structure of the definition, a new collection of data
|
|
types is being defined as a type called ``Info``. Since this type
|
|
definition is within the confines of an export block, what is defined
|
|
is, in fact, ``Conn::Info``.
|
|
|
|
The formatting for a declaration of a record type in Zeek includes the
|
|
descriptive name of the type being defined and the separate fields
|
|
that make up the record. The individual fields that make up the new
|
|
record are not limited in type or number as long as the name for each
|
|
field is unique.
|
|
|
|
.. literalinclude:: data_struct_record_01.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek data_struct_record_01.zeek
   Service: dns(RFC1035)
   port: 53/udp
   port: 53/tcp
   Service: http(RFC2616)
   port: 8080/tcp
   port: 80/tcp
|
|
|
|
The sample above shows a simple type definition that includes a
|
|
string, a set of ports, and a count to define a service type. Also
|
|
included is a function to print each field of a record in a formatted
|
|
fashion and a :zeek:id:`zeek_init` event handler to show some
|
|
functionality of working with records. The definitions of the DNS and
|
|
HTTP services are both done in-line using square brackets before being
|
|
passed to the ``print_service`` function. The ``print_service``
|
|
function makes use of the ``$`` dereference operator to access the
|
|
fields within the newly defined Service record type.
|
|
|
|
As you saw in the definition for the ``Conn::Info`` record, other
|
|
records are even valid as fields within another record. We can extend
|
|
the example above to include another record that contains a Service
|
|
record.
|
|
|
|
.. literalinclude:: data_struct_record_02.zeek
   :caption:
   :language: zeek
   :linenos:
   :tab-width: 4
|
|
|
|
.. code-block:: console

   $ zeek data_struct_record_02.zeek
   System: morlock
   Service: http(RFC2616)
   port: 8080/tcp
   port: 80/tcp
   Service: dns(RFC1035)
   port: 53/udp
   port: 53/tcp
|
|
|
|
The example above includes a second record type, one of whose fields
is a set that uses the first record type as its element type. Records
can be repeatedly nested within other records, their fields reachable
through repeated chains of the ``$`` dereference operator.
|
|
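
Chained dereferencing can be illustrated with a minimal sketch. The
``Endpoint`` and ``Flow`` record types are hypothetical and unrelated to
Zeek's own ``connection`` record; the ``&optional`` attribute and the
``?$`` field-presence test are additions not covered above.

.. code-block:: zeek

   # Hypothetical record types for illustration only.
   type Endpoint: record {
       host: addr;
       p: port;
       note: string &optional;
   };

   type Flow: record {
       src: Endpoint;
       dst: Endpoint;
   };

   event zeek_init()
       {
       local f = Flow(
           $src=Endpoint($host=10.0.0.1, $p=49152/tcp),
           $dst=Endpoint($host=10.0.0.2, $p=443/tcp, $note="web")
       );

       # Chain $ to reach a field of a nested record.
       print f$dst$p;

       # ?$ tests whether an optional field has a value.
       if ( f$src?$note )
           print f$src$note;
       else
           print "no note on source";
       }

This should print ``443/tcp`` followed by ``no note on source``.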
|
|
It's also common to see a ``type`` used to simply alias a data
|
|
structure to a more descriptive name. The snippet below shows an
example of this from Zeek's own type definitions file.
|
|
|
|
.. code-block:: zeek
   :caption: init-bare.zeek

   type string_array: table[count] of string;
   type string_set: set[string];
   type addr_set: set[addr];
|
|
|
|
The three lines above alias a type of data structure to a descriptive
|
|
name. Functionally, the operations are the same; however, each of the
types above is named such that its function is instantly
identifiable. This is another place in Zeek scripting where
|
|
consideration can lead to better readability of your code and thus
|
|
easier maintainability in the future.
|
|
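
A user-defined alias works the same way. In the sketch below, the alias
``host_set`` and the function ``summarize_hosts`` are hypothetical names
chosen for illustration.

.. code-block:: zeek

   # Hypothetical alias: a descriptive name for a set of addresses.
   type host_set: set[addr];

   # The alias can be used anywhere the underlying type could be.
   function summarize_hosts(hosts: host_set): string
       {
       return fmt("%d host(s)", |hosts|);
       }

   event zeek_init()
       {
       local dmz: host_set = { 192.0.2.10, 192.0.2.11 };
       print summarize_hosts(dmz);
       }

This should print ``2 host(s)``.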
|
|
|
|
Custom Logging
|
|
==============
|
|
|
|
Armed with a decent understanding of the data types and data
|
|
structures in Zeek, exploring the various frameworks available is a
|
|
much more rewarding effort. The framework with which most users are
|
|
likely to have the most interaction is the Logging Framework.
|
|
Designed so as to abstract much of the process of
|
|
creating a file and appending ordered and organized data into it, the
|
|
Logging Framework makes use of some potentially unfamiliar
|
|
nomenclature. Specifically, Log Streams, Filters and Writers are
|
|
simply abstractions of the processes required to manage a high rate of
|
|
incoming logs while maintaining full operability. If you've seen Zeek
|
|
employed in an environment with a large number of connections, you
|
|
know that logs are produced incredibly quickly; the ability to process
|
|
a large set of data and write it to disk is due to the design of the
|
|
Logging Framework.
|
|
|
|
Data is written to a Log Stream based on decision making processes in
|
|
Zeek's scriptland. Log Streams correspond to a single log as defined
|
|
by the set of name/value pairs that make up its fields. That data can
|
|
then be filtered, modified, or redirected with Logging Filters which,
|
|
by default, are set to log everything. Filters can be used to break
|
|
log files into subsets or duplicate that information to another
|
|
output. The final output of the data is defined by the writer. Zeek's
|
|
default writer produces simple tab-separated ASCII files, but Zeek also
|
|
includes support for `DataSeries <https://github.com/dataseries>`_
|
|
and `Elasticsearch <http://www.elasticsearch.org>`_ outputs as well as
|
|
additional writers currently in development. While these new terms
|
|
and ideas may give the impression that the Logging Framework is
|
|
difficult to work with, the learning curve is, in actuality,
|
|
not very steep at all. The abstraction built into the Logging
|
|
Framework means that the vast majority of scripts need not go
|
|
past the basics. In effect, writing to a log file is as simple as
|
|
defining the format of your data, letting Zeek know that you wish to
|
|
create a new log, and then calling the :zeek:id:`Log::write` method to
|
|
output log records.
|
|
|
|
The Logging Framework is an area in Zeek where, the more you see it
|
|
used and the more you use it yourself, the more second nature the
|
|
boilerplate parts of the code will become. As such, let's work
|
|
through a contrived example of simply logging the numbers 1 through 10
|
|
and their corresponding factorials to the default ASCII log writer.
|
|
It's always best to work through the problem once, simulating the
|
|
desired output with ``print`` and ``fmt`` before attempting to dive
|
|
into the Logging Framework.
|
|
|
|
.. literalinclude:: framework_logging_factorial_01.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
|
|
|
.. code-block:: console
|
|
|
|
$ zeek framework_logging_factorial_01.zeek
|
|
1
|
|
2
|
|
6
|
|
24
|
|
120
|
|
720
|
|
5040
|
|
40320
|
|
362880
|
|
3628800
|
|
|
|
This script defines a factorial function to recursively calculate the
|
|
factorial of an unsigned integer passed as an argument to the function. Using
|
|
``print`` and :zeek:id:`fmt` we can ensure that Zeek can perform these
|
|
calculations correctly as well as get an idea of the answers ourselves.
|
|
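
As a point of reference, a script along these lines could be sketched as
follows. This is a minimal sketch and not necessarily the contents of the
included file; the variable names ``numbers`` and ``i`` are assumptions made
for illustration.

.. code-block:: zeek

    module Factor;

    # Recursively compute n! for an unsigned integer (count).
    function factorial(n: count): count
        {
        if ( n == 0 )
            return 1;

        return n * factorial(n - 1);
        }

    event zeek_init()
        {
        local numbers: vector of count = vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        # Iterating over a vector yields its indices, not its values.
        for ( i in numbers )
            print fmt("%d", factorial(numbers[i]));
        }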
|
|
The output of the script aligns with what we expect so now it's time
|
|
to integrate the Logging Framework.
|
|
|
|
.. literalinclude:: framework_logging_factorial_02.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
|
|
|
As mentioned above we have to perform a few steps before we can
|
|
issue the :zeek:id:`Log::write` method and produce a logfile.
|
|
As we are working within a namespace and informing an outside
|
|
entity of workings and data internal to the namespace, we use
|
|
an ``export`` block. First we need to inform Zeek
|
|
that we are going to be adding another Log Stream by adding a value to
|
|
the :zeek:type:`Log::ID` enumerable. In this script, we append the
|
|
value ``LOG`` to the ``Log::ID`` enumerable; however, because this is in
|
|
an export block the value appended to ``Log::ID`` is actually
|
|
``Factor::LOG``. Next, we define the fields
|
|
that make up the data of our logs and dictate its format. This script
|
|
defines a new record datatype called ``Info`` (actually,
|
|
``Factor::Info``) with two fields, both unsigned integers. Each of the
|
|
fields in the ``Factor::Info`` record type includes the ``&log``
|
|
attribute, indicating that these fields should be passed to the
|
|
Logging Framework when ``Log::write`` is called.
|
|
Any record fields without the ``&log`` attribute are ignored by the
|
|
Logging Framework. The next step is to create the logging
|
|
stream with :zeek:id:`Log::create_stream` which takes a ``Log::ID`` and a
|
|
record as its arguments. In this example, we call the
|
|
``Log::create_stream`` method and pass ``Factor::LOG`` and the
|
|
``Factor::Info`` record as arguments. From here on out, if we issue
|
|
the ``Log::write`` command with the correct ``Log::ID`` and a properly
|
|
formatted ``Factor::Info`` record, a log entry will be generated.
|
|
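
Pulling these steps together, the logging-related parts of such a script might
look like the sketch below. The field names ``num`` and ``factorial_num`` and
the path ``factor`` match the log header shown in this section; the remaining
details, such as the local variable names, are assumptions and not a copy of
the included file.

.. code-block:: zeek

    module Factor;

    export {
        # Register a new stream ID; outside this module it is Factor::LOG.
        redef enum Log::ID += { LOG };

        # Only fields marked &log are handed to the Logging Framework.
        type Info: record {
            num:           count &log;
            factorial_num: count &log;
        };
    }

    function factorial(n: count): count
        {
        if ( n == 0 )
            return 1;

        return n * factorial(n - 1);
        }

    event zeek_init()
        {
        # Bind the stream ID to its record type and output path.
        Log::create_stream(Factor::LOG, [$columns=Info, $path="factor"]);
        }

    event zeek_done()
        {
        local numbers: vector of count = vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        for ( i in numbers )
            Log::write(Factor::LOG, [$num=numbers[i],
                                     $factorial_num=factorial(numbers[i])]);
        }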
|
|
Now, if we run this script, instead of generating
|
|
logging information to stdout, nothing is printed. Instead the
|
|
output is all in :file:`factor.log`, properly formatted and organized.
|
|
|
|
.. code-block:: console
|
|
|
|
$ zeek framework_logging_factorial_02.zeek
|
|
$ cat factor.log
|
|
#separator \x09
|
|
#set_separator ,
|
|
#empty_field (empty)
|
|
#unset_field -
|
|
#path factor
|
|
#open 2018-12-14-21-47-18
|
|
#fields num factorial_num
|
|
#types count count
|
|
1 1
|
|
2 2
|
|
3 6
|
|
4 24
|
|
5 120
|
|
6 720
|
|
7 5040
|
|
8 40320
|
|
9 362880
|
|
10 3628800
|
|
#close 2018-12-14-21-47-18
|
|
|
|
While the previous example is a simplistic one, it serves to
|
|
demonstrate the small pieces of script code that need to be in place in
|
|
order to generate logs. For example, it's common to call
|
|
``Log::create_stream`` in :zeek:id:`zeek_init` and while in a live
|
|
example, determining when to call ``Log::write`` would likely be
|
|
done in an event handler, in this case we use :zeek:id:`zeek_done`.
|
|
|
|
If you've already spent time with a deployment of Zeek, you've likely
|
|
had the opportunity to view, search through, or manipulate the logs
|
|
produced by the Logging Framework. The log output from a default
|
|
installation of Zeek is substantial, to say the least; however, there
|
|
are times in which the way the Logging Framework behaves by default isn't
|
|
ideal for the situation. This can range from needing to log more or
|
|
less data with each call to ``Log::write`` or even the need to split
|
|
log files based on arbitrary logic. In the latter case, Filters come
|
|
into play along with the Logging Framework. Filters grant a level of
|
|
customization to Zeek's scriptland, allowing the script writer to
|
|
include or exclude fields in the log and even make alterations to the
|
|
path of the file in which the logs are being placed. Each stream,
|
|
when created, is given a default filter called, not surprisingly,
|
|
``default``. When using the ``default`` filter, every key-value pair
|
|
with the ``&log`` attribute is written to a single file. For the
|
|
example we've been using, let's extend it so as to write any factorial
|
|
which is divisible by 5 to an alternate file, while writing the
|
|
remaining logs to :file:`factor-non5.log`.
|
|
|
|
.. literalinclude:: framework_logging_factorial_03.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
|
|
|
To dynamically alter the file in which a stream writes its logs, a
|
|
filter can specify a function that returns a string to be used as the
|
|
filename for the current call to ``Log::write``. The definition for
|
|
this function has to take as its parameters a ``Log::ID`` called ``id``, a
|
|
string called ``path`` and the appropriate record type for the logs called
|
|
``rec``. You can see the definition of ``mod5`` used in this example
|
|
conforms to that requirement. The function simply returns
|
|
``factor-mod5`` if the factorial is evenly divisible by 5; otherwise, it
|
|
returns ``factor-non5``. In the additional ``zeek_init`` event
|
|
handler, we define a locally scoped ``Log::Filter`` and assign it a
|
|
record that defines the ``name`` and ``path_func`` fields. We then
|
|
call ``Log::add_filter`` to add the filter to the ``Factor::LOG``
|
|
``Log::ID`` and call ``Log::remove_filter`` to remove the ``default``
|
|
filter for ``Factor::LOG``. Had we not removed the ``default`` filter,
|
|
we'd have ended up with three log files: :file:`factor-mod5.log` with all the
|
|
factorials that are divisible by 5, :file:`factor-non5.log` with the
|
|
factorials that are not divisible by 5, and :file:`factor.log` which would have
|
|
included all factorials.
|
|
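
The pieces described above can be sketched in one short script. This is an
illustrative sketch under stated assumptions: the filter name ``split-mod5`` is
hypothetical, and only two hand-picked records are written so that each output
file receives one entry.

.. code-block:: zeek

    module Factor;

    export {
        redef enum Log::ID += { LOG };

        type Info: record {
            num:           count &log;
            factorial_num: count &log;
        };
    }

    # Path function: choose the output file for each record being written.
    function mod5(id: Log::ID, path: string, rec: Factor::Info): string
        {
        return ( rec$factorial_num % 5 == 0 ) ? "factor-mod5" : "factor-non5";
        }

    event zeek_init()
        {
        Log::create_stream(Factor::LOG, [$columns=Info, $path="factor"]);

        # "split-mod5" is a hypothetical filter name.
        local filter: Log::Filter = [$name="split-mod5", $path_func=mod5];
        Log::add_filter(Factor::LOG, filter);

        # Without this, factor.log would also receive every record.
        Log::remove_filter(Factor::LOG, "default");
        }

    event zeek_done()
        {
        # 4! = 24 is not divisible by 5, while 5! = 120 is.
        Log::write(Factor::LOG, [$num=4, $factorial_num=24]);
        Log::write(Factor::LOG, [$num=5, $factorial_num=120]);
        }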
|
|
.. code-block:: console
|
|
|
|
$ zeek framework_logging_factorial_03.zeek
|
|
$ cat factor-mod5.log
|
|
#separator \x09
|
|
#set_separator ,
|
|
#empty_field (empty)
|
|
#unset_field -
|
|
#path factor-mod5
|
|
#open 2018-12-14-21-47-18
|
|
#fields num factorial_num
|
|
#types count count
|
|
5 120
|
|
6 720
|
|
7 5040
|
|
8 40320
|
|
9 362880
|
|
10 3628800
|
|
#close 2018-12-14-21-47-1
|
|
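
Filters can also restrict which fields are written, as mentioned earlier. The
sketch below assumes the default connection log stream (``Conn::LOG``) is
loaded and adds a second filter that writes only a few columns to a separate
file; the filter name and path are hypothetical.

.. code-block:: zeek

    event zeek_init()
        {
        # Hypothetical filter name and path, chosen for illustration only.
        local filter: Log::Filter = [$name="endpoints-only",
                                     $path="conn-endpoints",
                                     $include=set("ts", "id.orig_h", "id.resp_h")];

        # The default filter stays in place, so conn.log is still written in full.
        Log::add_filter(Conn::LOG, filter);
        }

Because the ``default`` filter is not removed here, the stream keeps producing
its usual log alongside the reduced one.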
|
|
The ability of Zeek to generate easily customizable and extensible logs
|
|
which remain easily parsable is a big part of the reason Zeek has
|
|
gained a large measure of respect. In fact, it's difficult at times
|
|
to think of something that Zeek doesn't log and as such, it is often
|
|
advantageous for analysts and systems architects to instead hook into
|
|
the logging framework to be able to perform custom actions based upon
|
|
the data being sent to the Logging Framework. To that end, every default
|
|
log stream in Zeek generates a custom event that can be handled by
|
|
anyone wishing to act upon the data being sent to the stream. By
|
|
convention these events are usually in the format ``log_x`` where x is
|
|
the name of the logging stream; as such the event raised for every log
|
|
sent to the Logging Framework by the HTTP parser would be ``log_http``.
|
|
Instead of using an external script to parse the :file:`http.log` file and
|
|
do post-processing for each entry, this can be done in real time inside
|
|
Zeek by defining an event handler for the ``log_http`` event.
|
|
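
A handler of this kind could be sketched as follows. Note that the event is
declared inside the ``HTTP`` module, so its fully qualified name is
``HTTP::log_http``; the message format printed here is an assumption made for
illustration.

.. code-block:: zeek

    @load base/protocols/http

    # Runs once for every record written to the HTTP log stream.
    event HTTP::log_http(rec: HTTP::Info)
        {
        # host and uri are optional fields, so test for them first.
        if ( rec?$host && rec?$uri )
            print fmt("%s requested http://%s%s", rec$id$orig_h, rec$host, rec$uri);
        }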
|
|
Telling Zeek to raise an event in your own Logging stream is as simple
|
|
as exporting that event name and then adding that event in the call to
|
|
``Log::create_stream``. Going back to our simple example of logging
|
|
the factorial of an integer, we add ``log_factor`` to the ``export``
|
|
block and define the value to be passed to it, in this case the
|
|
``Factor::Info`` record. We then list the ``log_factor`` event as
|
|
the ``$ev`` field in the call to ``Log::create_stream``.
|
|
|
|
.. literalinclude:: framework_logging_factorial_04.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
|
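
To see such an event put to use, a script with its own handler could be
sketched as below. The threshold of 1000 and the printed message are
assumptions; the sample record is written from ``zeek_init`` simply to keep the
sketch short.

.. code-block:: zeek

    module Factor;

    export {
        redef enum Log::ID += { LOG };

        type Info: record {
            num:           count &log;
            factorial_num: count &log;
        };

        # Event raised for each record written to the stream.
        global log_factor: event(rec: Info);
    }

    event zeek_init()
        {
        Log::create_stream(Factor::LOG, [$columns=Info, $ev=log_factor, $path="factor"]);

        # 7! = 5040
        Log::write(Factor::LOG, [$num=7, $factorial_num=5040]);
        }

    # Any script can now react to records as they are logged.
    event Factor::log_factor(rec: Factor::Info)
        {
        if ( rec$factorial_num > 1000 )
            print fmt("large factorial logged: %d! = %d", rec$num, rec$factorial_num);
        }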
|
|
|
|
Raising Notices
|
|
===============
|
|
|
|
While Zeek's Logging Framework provides an easy and systematic way to
|
|
generate logs, there still exists a need to indicate when a specific
|
|
behavior has been detected and a method to allow that detection to
|
|
come to someone's attention. To that end, the Notice Framework is in
|
|
place to allow script writers a codified means through which they can
|
|
raise a notice, as well as a system through which an operator can
|
|
opt in to receive the notice. Zeek holds to the philosophy that it is
|
|
up to the individual operator to indicate the behaviors in which they
|
|
are interested and as such Zeek ships with a large number of policy
|
|
scripts which detect behavior that may be of interest but it does not
|
|
presume to guess as to which behaviors are "actionable". In effect,
|
|
Zeek works to separate the act of detection and the responsibility of
|
|
reporting. With the Notice Framework it's simple to raise a notice
|
|
for any behavior that is detected.
|
|
|
|
To raise a notice in Zeek, you only need to indicate to Zeek that you
|
|
are providing a specific :zeek:type:`Notice::Type` by exporting it and then
|
|
make a call to :zeek:id:`NOTICE` supplying it with an appropriate
|
|
:zeek:type:`Notice::Info` record. Often the call to ``NOTICE``
|
|
includes just the ``Notice::Type`` and a concise message. There are,
|
|
however, significantly more options available when raising notices as
|
|
seen in the definition of :zeek:type:`Notice::Info`. The only field in
|
|
``Notice::Info`` whose
|
|
attributes make it a required field is the ``note`` field. Still,
|
|
good manners are always important and including a concise message in
|
|
``$msg`` and, where necessary, the contents of the connection record
|
|
in ``$conn`` along with the ``Notice::Type`` tend to comprise the
|
|
minimum of information required for a notice to be considered useful.
|
|
If the ``$conn`` field is supplied, the Notice Framework will
|
|
auto-populate the ``$id`` and ``$src`` fields as well. Other fields
|
|
that are commonly included, ``$identifier`` and ``$suppress_for``, are
|
|
built around the automated suppression feature of the Notice Framework
|
|
which we will cover shortly.
|
|
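
As a minimal sketch of these steps, the script below defines a notice type and
raises it with ``$note``, ``$msg`` and ``$conn``. The module name ``Demo``, the
notice type ``Telnet_Connection`` and the port check are all hypothetical and
exist only for illustration.

.. code-block:: zeek

    @load base/frameworks/notice

    module Demo;

    export {
        redef enum Notice::Type += {
            ## Hypothetical notice type, defined only for this illustration.
            Telnet_Connection,
        };
    }

    event connection_established(c: connection)
        {
        if ( c$id$resp_p != 23/tcp )
            return;

        # $note is the only required field; $msg and $conn make the notice useful.
        NOTICE([$note=Telnet_Connection,
                $msg=fmt("%s connected to a Telnet service on %s",
                         c$id$orig_h, c$id$resp_h),
                $conn=c]);
        }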
|
|
One of the default policy scripts raises a notice when an SSH login
|
|
has been heuristically detected and the originating hostname is one
|
|
that would raise suspicion. Effectively, the script attempts to
|
|
define a list of hosts from which you would never want to see SSH
|
|
traffic originating, like DNS servers, mail servers, etc. To
|
|
accomplish this, the script adheres to the separation of detection
|
|
and reporting by detecting a behavior and raising a notice. Whether
|
|
or not that notice is acted upon is decided by the local Notice
|
|
Policy, but the script attempts to supply as much information as
|
|
possible while staying concise.
|
|
|
|
.. code-block:: zeek
|
|
:caption: scripts/policy/protocols/ssh/interesting-hostnames.zeek
|
|
|
|
##! This script will generate a notice if an apparent SSH login originates
|
|
##! or heads to a host with a reverse hostname that looks suspicious. By
|
|
##! default, the regular expression to match "interesting" hostnames includes
|
|
##! names that are typically used for infrastructure hosts like nameservers,
|
|
##! mail servers, web servers and ftp servers.
|
|
|
|
@load base/frameworks/notice
|
|
|
|
module SSH;
|
|
|
|
export {
|
|
redef enum Notice::Type += {
|
|
## Generated if a login originates or responds with a host where
|
|
## the reverse hostname lookup resolves to a name matched by the
|
|
## :zeek:id:`SSH::interesting_hostnames` regular expression.
|
|
Interesting_Hostname_Login,
|
|
};
|
|
|
|
## Strange/bad host names to see successful SSH logins from or to.
|
|
option interesting_hostnames =
|
|
/^d?ns[0-9]*\./ |
|
|
/^smtp[0-9]*\./ |
|
|
/^mail[0-9]*\./ |
|
|
/^pop[0-9]*\./ |
|
|
/^imap[0-9]*\./ |
|
|
/^www[0-9]*\./ |
|
|
/^ftp[0-9]*\./;
|
|
}
|
|
|
|
function check_ssh_hostname(id: conn_id, uid: string, host: addr)
|
|
{
|
|
when ( local hostname = lookup_addr(host) )
|
|
{
|
|
if ( interesting_hostnames in hostname )
|
|
{
|
|
NOTICE([$note=Interesting_Hostname_Login,
|
|
$msg=fmt("Possible SSH login involving a %s %s with an interesting hostname.",
|
|
Site::is_local_addr(host) ? "local" : "remote",
|
|
host == id$orig_h ? "client" : "server"),
|
|
$sub=hostname, $id=id, $uid=uid]);
|
|
}
|
|
}
|
|
}
|
|
|
|
event ssh_auth_successful(c: connection, auth_method_none: bool)
|
|
{
|
|
for ( host in set(c$id$orig_h, c$id$resp_h) )
|
|
{
|
|
check_ssh_hostname(c$id, c$uid, host);
|
|
}
|
|
}
|
|
|
|
While much of the script relates to the actual detection, the parts
|
|
specific to the Notice Framework are actually quite interesting in
|
|
themselves. The script's ``export`` block adds the value
|
|
``SSH::Interesting_Hostname_Login`` to the enumerable constant
|
|
``Notice::Type`` to indicate to the Zeek core that a new type of notice
|
|
is being defined. The script then calls ``NOTICE`` and defines the
|
|
``$note``, ``$msg``, ``$sub``, ``$id``, and ``$uid`` fields of the
|
|
:zeek:type:`Notice::Info` record. (More commonly, one would set
|
|
``$conn`` instead, however this script avoids using the connection
|
|
record inside the when-statement for performance reasons.)
|
|
There are two ternary if
|
|
statements that modify the ``$msg`` text depending on whether the
|
|
host is a local address and whether it is the client or the server.
|
|
This use of :zeek:id:`fmt` and ternary operators is a concise way to
|
|
lend readability to the notices that are generated without the need
|
|
for branching ``if`` statements that each raise a specific notice.
|
|
|
|
The opt-in system for notices is managed through writing
|
|
:zeek:id:`Notice::policy` hooks. A ``Notice::policy`` hook takes as
|
|
its argument a ``Notice::Info`` record which will hold the same
|
|
information your script provided in its call to ``NOTICE``. With
|
|
access to the ``Notice::Info`` record for a specific notice you can
|
|
include logic such as ``if`` statements in the body of your hook to alter
|
|
the policy for handling notices on your system. In Zeek, hooks are
|
|
akin to a mix of functions and event handlers: like functions, calls
|
|
to them are synchronous (i.e., run to completion and return); but like
|
|
events, they can have multiple bodies which will all execute. For
|
|
defining a notice policy, you define a hook and Zeek will take care of
|
|
passing in the ``Notice::Info`` record. The simplest kind of
|
|
``Notice::policy`` hooks simply check the value of ``$note`` in the
|
|
``Notice::Info`` record being passed into the hook and perform an
|
|
action based on the answer. The hook below adds the
|
|
:zeek:enum:`Notice::ACTION_EMAIL` action for the
|
|
``SSH::Interesting_Hostname_Login`` notice raised in the
|
|
:doc:`/scripts/policy/protocols/ssh/interesting-hostnames.zeek` script.
|
|
|
|
.. literalinclude:: framework_notice_hook_01.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
|
|
|
In the example above we've added ``Notice::ACTION_EMAIL`` to the
|
|
``n$actions`` set. This set, defined in the Notice Framework scripts,
|
|
can only have entries from the :zeek:type:`Notice::Action` type, which is
|
|
itself an enumerable that defines the values shown in the table below
|
|
along with their corresponding meanings. The
|
|
:zeek:enum:`Notice::ACTION_LOG` action writes the notice to the
|
|
``Notice::LOG`` logging stream which, in the default configuration,
|
|
will write each notice to the :file:`notice.log` file and take no further
|
|
action. The :zeek:enum:`Notice::ACTION_EMAIL` action will send an email
|
|
to the address or addresses defined in the :zeek:id:`Notice::mail_dest`
|
|
variable with the particulars of the notice as the body of the email.
|
|
The last action, :zeek:enum:`Notice::ACTION_ALARM`, sends the notice to
|
|
the :zeek:enum:`Notice::ALARM_LOG` logging stream which is then rotated
|
|
hourly and its contents emailed in readable ASCII to the addresses in
|
|
``Notice::mail_dest``.
|
|
|
|
.. list-table::
|
|
|
|
* - :zeek:see:`Notice::ACTION_NONE`
|
|
- Take no action.
|
|
* - :zeek:see:`Notice::ACTION_LOG`
|
|
- Send the notice to the Notice::LOG logging stream.
|
|
* - :zeek:see:`Notice::ACTION_EMAIL`
|
|
- Send an email with the notice in the body.
|
|
* - :zeek:see:`Notice::ACTION_ALARM`
|
|
- Send the notice to the Notice::ALARM_LOG stream.
|
|
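
More than one action can be attached to the same notice. The sketch below, with
a hypothetical destination address, both emails the SSH notice and includes it
in the alarm log; it is one possible policy and not a recommended default.

.. code-block:: zeek

    @load base/frameworks/notice
    @load policy/protocols/ssh/interesting-hostnames

    # Hypothetical destination address; ACTION_EMAIL needs one to be set.
    redef Notice::mail_dest = "soc@example.com";

    hook Notice::policy(n: Notice::Info)
        {
        if ( n$note == SSH::Interesting_Hostname_Login )
            {
            # Email this notice right away ...
            add n$actions[Notice::ACTION_EMAIL];
            # ... and also include it in the alarm log summary.
            add n$actions[Notice::ACTION_ALARM];
            }
        }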
|
|
While actions like the ``Notice::ACTION_EMAIL`` action have appeal for
|
|
quick alerts and response, a caveat of its use is to make sure the
|
|
notices configured with this action also have a suppression. A
|
|
suppression is a means through which notices can be ignored after they
|
|
are initially raised if the author of the script has set an
|
|
identifier. An identifier is a unique string of information collected
|
|
from the connection relative to the behavior that has been observed by
|
|
Zeek.
|
|
|
|
.. code-block:: zeek
|
|
:caption: scripts/policy/protocols/ssl/expiring-certs.zeek
|
|
|
|
NOTICE([$note=Certificate_Expires_Soon,
|
|
$msg=fmt("Certificate %s is going to expire at %T", cert$subject, cert$not_valid_after),
|
|
$conn=c, $suppress_for=1day,
|
|
$identifier=cat(c$id$resp_h, c$id$resp_p, hash),
|
|
$fuid=fuid]);
|
|
|
|
In the :doc:`/scripts/policy/protocols/ssl/expiring-certs.zeek` script
|
|
which identifies when SSL certificates are set to expire and raises
|
|
notices when it crosses a predefined threshold, the call to
|
|
``NOTICE`` above also sets the ``$identifier`` entry by concatenating
|
|
the responder IP, port, and the hash of the certificate. The
|
|
selection of responder IP, port and certificate hash fits perfectly
|
|
into an appropriate identifier as it creates a unique identifier with
|
|
which the suppression can be matched. Were we to take out any of the
|
|
entities used for the identifier, for example the certificate hash, we
|
|
could be setting our suppression too broadly, causing an analyst to
|
|
miss a notice that should have been raised. Depending on the
|
|
available data for the identifier, it can be useful to set the
|
|
``$suppress_for`` variable as well. The ``expiring-certs.zeek`` script
|
|
sets ``$suppress_for`` to ``1day``, telling the Notice Framework to
|
|
suppress the notice for 24 hours after the first notice is raised.
|
|
Once that time limit has passed, another notice can be raised which
|
|
will again set the ``1day`` suppression time. Suppressing for a
|
|
specific amount of time has benefits beyond simply not filling up an
|
|
analyst's email inbox; keeping the notice alerts timely and succinct
|
|
helps avoid a case where an analyst might see the notice and, due to
|
|
overexposure, ignore it.
|
|
|
|
The ``$suppress_for`` variable can also be altered in a
|
|
``Notice::policy`` hook, allowing a deployment to better suit the
|
|
environment in which it is run. Using the example of
|
|
``expiring-certs.zeek``, we can write a ``Notice::policy`` hook for
|
|
``SSL::Certificate_Expires_Soon`` to configure the ``$suppress_for``
|
|
variable to a shorter time.
|
|
|
|
.. literalinclude:: framework_notice_hook_suppression_01.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
|
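
A hook of this kind could be sketched as follows. The 12-hour interval is an
arbitrary value chosen for illustration, and the sketch assumes the
``expiring-certs.zeek`` policy script is loaded.

.. code-block:: zeek

    @load base/frameworks/notice
    @load policy/protocols/ssl/expiring-certs

    hook Notice::policy(n: Notice::Info)
        {
        # Shorten the suppression window requested by the detecting script.
        if ( n$note == SSL::Certificate_Expires_Soon )
            n$suppress_for = 12hrs;
        }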
|
|
While ``Notice::policy`` hooks allow you to build custom
|
|
predicate-based policies for a deployment, there are bound to be times
|
|
where you don't require the full expressiveness that a hook allows.
|
|
In short, there will be notice policy considerations where a broad
|
|
decision can be made based on the ``Notice::Type`` alone. To
|
|
facilitate these types of decisions, the Notice Framework supports
|
|
Notice Policy shortcuts. These shortcuts are implemented through the
|
|
means of a group of data structures that map specific, predefined
|
|
details and actions to the effective name of a notice. Primarily
|
|
implemented as a set or table of enumerables of :zeek:type:`Notice::Type`,
|
|
Notice Policy shortcuts can be placed as a single directive in your
|
|
``local.zeek`` file as a concise readable configuration. As these
|
|
variables are all constants, it bears mentioning that these variables
|
|
are all set at parse-time before Zeek is fully up and running and not
|
|
set dynamically.
|
|
|
|
+------------------------------------+-----------------------------------------------------+-------------------------------------+
|
|
| Name | Description | Data Type |
|
|
+====================================+=====================================================+=====================================+
|
|
| Notice::ignored_types | Ignore the Notice::Type entirely | set[Notice::Type] |
|
|
+------------------------------------+-----------------------------------------------------+-------------------------------------+
|
|
| Notice::emailed_types | Set Notice::ACTION_EMAIL to this Notice::Type | set[Notice::Type] |
|
|
+------------------------------------+-----------------------------------------------------+-------------------------------------+
|
|
| Notice::alarmed_types | Set Notice::ACTION_ALARM to this Notice::Type | set[Notice::Type] |
|
|
+------------------------------------+-----------------------------------------------------+-------------------------------------+
|
|
| Notice::not_suppressed_types | Remove suppression from this Notice::Type | set[Notice::Type] |
|
|
+------------------------------------+-----------------------------------------------------+-------------------------------------+
|
|
| Notice::type_suppression_intervals | Alter the $suppress_for value for this Notice::Type | table[Notice::Type] of interval |
|
|
+------------------------------------+-----------------------------------------------------+-------------------------------------+
|
|
|
|
|
|
|
|
The table above details the five Notice Policy shortcuts, their
|
|
meaning and the data type used to implement them. With the exception
|
|
of ``Notice::type_suppression_intervals`` a ``set`` data type is
|
|
employed to hold the ``Notice::Type`` of the notice upon which a
|
|
shortcut should be applied. The first three shortcuts are fairly self
|
|
explanatory, applying an action to the ``Notice::Type`` elements in
|
|
the set, while the latter two shortcuts alter details of the
|
|
suppression being applied to the Notice. The shortcut
|
|
``Notice::not_suppressed_types`` can be used to remove the configured
|
|
suppression from a notice while ``Notice::type_suppression_intervals``
|
|
can be used to alter the suppression interval defined by ``$suppress_for``
|
|
in the call to ``NOTICE``.
|
|
|
|
.. literalinclude:: framework_notice_shortcuts_01.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
|
|
|
The Notice Policy shortcut above adds the ``Notice::Type`` of
|
|
``SSH::Interesting_Hostname_Login`` to the
|
|
``Notice::emailed_types`` set while the shortcut below alters the length
|
|
of time for which those notices will be suppressed.
|
|
|
|
.. literalinclude:: framework_notice_shortcuts_02.zeek
|
|
:caption:
|
|
:language: zeek
|
|
:linenos:
|
|
:tab-width: 4
|
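
Several shortcuts can sit side by side in ``local.zeek``. The following is a
sketch of one possible combination; the choice of notice types and the 12-hour
interval are assumptions made for illustration.

.. code-block:: zeek

    @load base/frameworks/notice
    @load policy/protocols/ssh/interesting-hostnames
    @load policy/protocols/ssl/expiring-certs

    # Email every SSH::Interesting_Hostname_Login notice.
    redef Notice::emailed_types += { SSH::Interesting_Hostname_Login };

    # Include expiring-certificate notices in the alarm log.
    redef Notice::alarmed_types += { SSL::Certificate_Expires_Soon };

    # Override the suppression interval requested in the call to NOTICE.
    redef Notice::type_suppression_intervals += {
        [SSL::Certificate_Expires_Soon] = 12hrs,
    };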