.. _framework-logging:

=================
Logging Framework
=================

.. rst-class:: opening

    Bro comes with a flexible key-value based logging interface that
    allows fine-grained control of what gets logged and how it is
    logged. This document describes how logging can be customized and
    extended.

.. contents::

Terminology
===========

Bro's logging interface is built around three main abstractions:

Log streams
    A stream corresponds to a single log. It defines the set of
    fields that a log consists of, with their names and types.
    Examples are the ``conn`` stream for recording connection
    summaries, and the ``http`` stream for recording HTTP activity.

Filters
    Each stream has a set of filters attached to it that determine
    what information gets written out. By default, each stream has
    one default filter that just logs everything directly to disk
    with an automatically generated file name. However, further
    filters can be added to record only a subset, split a stream
    into different outputs, or even duplicate the log to multiple
    outputs. If all filters are removed from a stream, all output
    is disabled.

Writers
    A writer defines the actual output format for the information
    being logged. At the moment, Bro comes with only one type of
    writer, which produces tab-separated ASCII files. In the
    future we will add further writers, e.g., for binary output and
    direct logging into a database.
Basics
======

The data fields that a stream records are defined by a record type
specified when it is created. Let's look at the script generating Bro's
connection summaries as an example,
:doc:`/scripts/base/protocols/conn/main.bro`. It defines a record
:bro:type:`Conn::Info` that lists all the fields that go into
``conn.log``, each marked with a ``&log`` attribute indicating that it
is part of the information written out. To write a log record, the
script then passes an instance of :bro:type:`Conn::Info` to the logging
framework's :bro:id:`Log::write` function.

By default, each stream automatically gets a filter named ``default``
that generates the normal output by recording all record fields into a
single output file.

In the following, we summarize ways in which the logging can be
customized. We continue using the connection summaries as our example
to work with.
Filtering
---------

To create a new output file for an existing stream, you can add a
new filter. A filter can, e.g., restrict the set of fields being
logged:

.. code:: bro

    event bro_init()
        {
        # Add a new filter to the Conn::LOG stream that logs only
        # timestamp and originator address.
        local filter: Log::Filter = [$name="orig-only", $path="origs",
                                     $include=set("ts", "id.orig_h")];
        Log::add_filter(Conn::LOG, filter);
        }
Note the fields that are set for the filter:

``name``
    A mandatory name for the filter that can later be used to
    manipulate it further.

``path``
    The filename for the output file, without any extension (which
    may be automatically added by the writer). Default path values
    are generated by taking the stream's ID and munging it slightly:
    :bro:enum:`Conn::LOG` is converted into ``conn``,
    :bro:enum:`PacketFilter::LOG` is converted into
    ``packet_filter``, and :bro:enum:`Known::CERTS_LOG` is
    converted into ``known_certs``.

``include``
    A set limiting the fields to the ones given. The names
    correspond to those in the :bro:type:`Conn::Info` record, with
    sub-records unrolled by concatenating fields (separated with
    dots).
Using the code above, you will now get a new log file ``origs.log``
that looks like this::

    #separator \x09
    #path       origs
    #fields     ts                  id.orig_h
    #types      time                addr
    1128727430.350788   141.42.64.125
    1128727435.450898   141.42.64.125
If you want to make this the only log file for the stream, you can
remove the default filter (which, conveniently, has the name
``default``):

.. code:: bro

    event bro_init()
        {
        # Remove the filter called "default".
        Log::remove_filter(Conn::LOG, "default");
        }

An alternate approach to "turning off" a log is to completely disable
the stream:

.. code:: bro

    event bro_init()
        {
        Log::disable_stream(Conn::LOG);
        }

If you want to skip only some fields but keep the rest, there is a
corresponding ``exclude`` filter attribute that you can use instead of
``include`` to list only the ones you are not interested in.
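For example, the following sketch re-creates the default filter with two
columns dropped (``history`` and ``orig_bytes`` are standard
:bro:type:`Conn::Info` fields; adding a filter under an existing name
replaces the previous one):

.. code:: bro

    event bro_init()
        {
        # Fetch the default filter, exclude two fields, and re-add it.
        local f = Log::get_filter(Conn::LOG, "default");
        f$exclude = set("history", "orig_bytes");
        Log::add_filter(Conn::LOG, f);
        }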
A filter can also determine output paths *dynamically* based on the
record being logged. That allows, e.g., recording local and remote
connections into separate files. To do this, you define a function
that returns the desired path:

.. code:: bro

    function split_log(id: Log::ID, path: string, rec: Conn::Info) : string
        {
        # Return "conn-local" if the originator is a local IP,
        # "conn-remote" otherwise.
        local lr = Site::is_local_addr(rec$id$orig_h) ? "local" : "remote";
        return fmt("%s-%s", path, lr);
        }

    event bro_init()
        {
        local filter: Log::Filter = [$name="conn-split",
                                     $path_func=split_log,
                                     $include=set("ts", "id.orig_h")];
        Log::add_filter(Conn::LOG, filter);
        }

Running this will now produce two files, ``conn-local.log`` and
``conn-remote.log``, with the corresponding entries. One could extend
this further, for example to log information by subnets or even by IP
address. Be careful, however, as it is easy to create many files very
quickly.
.. sidebar:: A More Generic Path Function

    The ``split_log`` function has one drawback: it can be used only
    with the :bro:enum:`Conn::LOG` stream, as the record type is
    hardcoded into its argument list. However, Bro allows writing a
    more generic variant:

    .. code:: bro

        function split_log(id: Log::ID, path: string,
                           rec: record { id: conn_id; } ) : string
            {
            return Site::is_local_addr(rec$id$orig_h) ? "local" : "remote";
            }

    This function can be used with all log streams that have records
    containing an ``id: conn_id`` field.
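Such a generic path function is not limited to connection logs. As a
sketch, it could also be attached to, e.g., the HTTP stream, whose
:bro:type:`HTTP::Info` records contain an ``id: conn_id`` field:

.. code:: bro

    event bro_init()
        {
        # Split http.log into local/remote parts as well.
        Log::add_filter(HTTP::LOG, [$name="http-split", $path_func=split_log]);
        }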
While so far we have seen how to customize the columns being logged,
you can also control which records are written out by providing a
predicate that will be called for each log record:

.. code:: bro

    function http_only(rec: Conn::Info) : bool
        {
        # Record only connections with successfully analyzed HTTP traffic.
        return rec$service == "http";
        }

    event bro_init()
        {
        local filter: Log::Filter = [$name="http-only", $path="conn-http",
                                     $pred=http_only];
        Log::add_filter(Conn::LOG, filter);
        }

This will result in a log file ``conn-http.log`` that contains only
traffic detected and analyzed as HTTP.
Extending
---------

You can add further fields to a log stream by extending the record
type that defines its content. Let's say we want to add a boolean
field ``is_private`` to :bro:type:`Conn::Info` that indicates whether the
originator IP address is part of the :rfc:`1918` space:

.. code:: bro

    # Add a field to the connection log record.
    redef record Conn::Info += {
        ## Indicate if the originator of the connection is part of the
        ## "private" address space defined in RFC 1918.
        is_private: bool &default=F &log;
    };

Now we need to set the field. A connection's summary is generated at
the time its state is removed from memory. We can add a handler for
that event that sets our field correctly:

.. code:: bro

    event connection_state_remove(c: connection)
        {
        if ( c$id$orig_h in Site::private_address_space )
            c$conn$is_private = T;
        }

Now ``conn.log`` will show a new field ``is_private`` of type
``bool``.
Notes:

- For extending logs this way, one needs a bit of knowledge about how
  the script that creates the log stream organizes its state keeping.
  Most of the standard Bro scripts attach their log state to the
  :bro:type:`connection` record, where it can then be accessed, just
  as ``c$conn`` above. For example, the HTTP analysis adds a field
  ``http`` of type :bro:type:`HTTP::Info` to the :bro:type:`connection`
  record. See the script reference for more information.

- When extending records as shown above, the new fields must always be
  declared either with a ``&default`` value or as ``&optional``.
  Furthermore, you need to add the ``&log`` attribute, or otherwise the
  field won't appear in the output.
Hooking into the Logging
------------------------

Sometimes it is helpful to do additional analysis of the information
being logged. For these cases, a stream can specify an event that will
be generated every time a log record is written to it. All of Bro's
default log streams define such an event. For example, the connection
log stream raises the event :bro:id:`Conn::log_conn`. You could use
that, for example, to flag when a connection to a specific destination
exceeds a certain duration:

.. code:: bro

    redef enum Notice::Type += {
        ## Indicates that a connection remained established longer
        ## than 5 minutes.
        Long_Conn_Found
    };

    event Conn::log_conn(rec: Conn::Info)
        {
        if ( rec$duration > 5mins )
            NOTICE([$note=Long_Conn_Found,
                    $msg=fmt("unusually long conn to %s", rec$id$resp_h),
                    $id=rec$id]);
        }

Often, these events can be an alternative to post-processing Bro logs
externally with Perl scripts. Much of what such an external script
would do later offline, one can instead do directly inside of Bro in
real-time.
Rotation
--------

By default, no log rotation occurs, but it's globally controllable for
all filters by redefining the :bro:id:`Log::default_rotation_interval`
option:

.. code:: bro

    redef Log::default_rotation_interval = 1 hr;

Or specifically for certain :bro:type:`Log::Filter` instances by
setting their ``interv`` field. Here's an example of changing just the
:bro:enum:`Conn::LOG` stream's default filter rotation interval:

.. code:: bro

    event bro_init()
        {
        local f = Log::get_filter(Conn::LOG, "default");
        f$interv = 1 min;
        Log::remove_filter(Conn::LOG, "default");
        Log::add_filter(Conn::LOG, f);
        }
ASCII Writer Configuration
--------------------------

The ASCII writer has a number of options for customizing the format of
its output; see :doc:`/scripts/base/frameworks/logging/writers/ascii.bro`.
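For instance, a sketch using two of the ``LogAscii`` options defined in
that script (``separator`` and ``empty_field``):

.. code:: bro

    # Use a comma instead of a tab between columns, and mark
    # empty fields explicitly.
    redef LogAscii::separator = ",";
    redef LogAscii::empty_field = "EMPTY";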
Adding Streams
==============

It's easy to create a new log stream for custom scripts. Here's an
example for the ``Foo`` module:

.. code:: bro

    module Foo;

    export {
        # Create an ID for our new stream. By convention, this is
        # called "LOG".
        redef enum Log::ID += { LOG };

        # Define the fields. By convention, the type is called "Info".
        type Info: record {
            ts: time     &log;
            id: conn_id  &log;
        };

        # Define a hook event. By convention, this is called
        # "log_<stream>".
        global log_foo: event(rec: Info);
    }

    # This event should be handled at a higher priority so that when
    # users modify your stream later and do so at priority 0, their
    # code runs after this.
    event bro_init() &priority=5
        {
        # Create the stream. This also adds a default filter automatically.
        Log::create_stream(Foo::LOG, [$columns=Info, $ev=log_foo]);
        }
You can also add the state to the :bro:type:`connection` record to make
it easily accessible across event handlers:

.. code:: bro

    redef record connection += {
        foo: Info &optional;
    };

Now you can use the :bro:id:`Log::write` method to output log records and
save the logged ``Foo::Info`` record into the connection record:

.. code:: bro

    event connection_established(c: connection)
        {
        local rec: Foo::Info = [$ts=network_time(), $id=c$id];
        c$foo = rec;
        Log::write(Foo::LOG, rec);
        }
See the existing scripts for how to work with such a new connection
field. A simple example is :doc:`/scripts/base/protocols/syslog/main.bro`.

When you are developing scripts that add data to the
:bro:type:`connection` record, care must be given to when and how long
data is stored. Normally, data saved to the connection record will
remain there for the duration of the connection, and from a practical
perspective it's not uncommon to need to delete that data before the
end of the connection.
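As ``foo`` was declared ``&optional``, it can be removed again with
``delete`` once it is no longer needed. A sketch, assuming the record
has already been written by a handler running at a higher priority:

.. code:: bro

    event connection_established(c: connection) &priority=-5
        {
        # The log record has been written; drop the per-connection state.
        if ( c?$foo )
            delete c$foo;
        }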
Other Writers
-------------

Bro supports the following output formats other than ASCII:

.. toctree::
    :maxdepth: 1

    logging-dataseries
    logging-elasticsearch
    logging-input-sqlite