Mirror of https://github.com/zeek/zeek.git, synced 2025-10-15 21:18:20 +00:00
Reorganize documentation index
Moved examples and use cases into a common subdir so those topics will get grouped together in the main sidebar/TOC
parent b1b7027982, commit c46b018fdb
46 changed files with 27 additions and 16 deletions
doc/examples/logs/index.rst (new file, 308 lines)

@@ -0,0 +1,308 @@

.. _bro-logging:

=======
Logging
=======

Once Bro has been deployed in an environment and is monitoring live
traffic, it will, in its default configuration, begin to produce
human-readable ASCII logs. Each log file, produced by Bro's
:ref:`framework-logging`, is populated with organized, mostly
connection-oriented data. Since the standard log files are simple
ASCII data, you can work with them from a command line terminal once
you are familiar with the types of data found in each file. In the
following, we work through the logs' general structure and then
examine some standard ways of working with them.

----------------------
Working with Log Files
----------------------

Generally, all of Bro's log files are produced by a corresponding
script that defines their individual structure. However, because
every log file flows through the Logging Framework, they all share a
set of structural similarities. Without getting into Bro's scripting
here, a bird's-eye view of how a log file is produced goes as
follows. The script's author defines the kinds of data, such as the
originating IP address or the duration of a connection, that will
make up the fields (i.e., columns) of the log file. The author then
decides what network activity should generate a single log file
entry (i.e., one line). For example, this could be a connection
having been completed or an HTTP ``GET`` request being issued by an
originator. When these behaviors are observed during operation, the
data is passed to the Logging Framework, which adds the entry to the
appropriate log file.

As the fields of the log entries can be further customized by the
user, the Logging Framework makes use of a header block to ensure
that each log file remains self-describing. Here are the first few
lines of a ``conn.log``:

.. sourcecode:: console

   $ cat conn.log
   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path conn
   #open 2018-12-10-22-18-00
   #fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig local_resp missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents
   #types time string addr port addr port enum string interval count count string bool bool count string count count count count set[string]
   1300475167.096535 CHhAvVGS1DHFjwGM9 141.142.220.202 5353 224.0.0.251 5353 udp dns - - - S0 - - 0 D 1 73 0 0 -
   1300475167.097012 ClEkJM2Vm5giqnMf4h fe80::217:f2ff:fed7:cf65 5353 ff02::fb 5353 udp dns - - - S0 - - 0 D 1 199 0 0 -
   1300475167.099816 C4J4Th3PJpwUYZZ6gc 141.142.220.50 5353 224.0.0.251 5353 udp dns - - - S0 - - 0 D 1 179 0 0 -
   [...]

As you can see, the header consists of lines prefixed by ``#`` and
includes information such as what separators are being used for
various types of data, what an empty field looks like, and what an
unset field looks like. In this example, the default TAB separator
is being used as the delimiter between fields (``\x09`` is the tab
character in hex). It also lists the comma as the separator for set
data, the string ``(empty)`` as the indicator for an empty field,
and the ``-`` character as the indicator for a field that hasn't
been set. The timestamp for when the file was created is included
under ``#open``. The header then goes on to detail the fields being
listed in the file and the data types of those fields, in
``#fields`` and ``#types``, respectively. These two entries are
often the two most significant points of interest as they detail not
only the field names but also the data types used. When navigating
through the different log files with tools like ``sed``, ``awk``, or
``grep``, having the field definitions readily available saves the
user some mental legwork. The field names are also a key resource
for the :ref:`bro-cut <bro-cut>` utility included with Bro,
described below.

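For instance, a quick way to list a log's field names without
opening the file in an editor is to split its ``#fields`` header
line (a small sketch using only standard shell tools; output
abbreviated):

.. sourcecode:: console

   $ grep '^#fields' conn.log | tr '\t' '\n' | tail -n +2
   ts
   uid
   id.orig_h
   id.orig_p
   [...]
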
Following the header comes the main content. In this example we see
a few connections with their key properties, such as originator and
responder IP addresses (note how Bro transparently handles both IPv4
and IPv6), transport-layer ports, application-layer services (the
``service`` field is filled in as Bro determines a specific protocol
to be in use, independent of the connection's ports), payload size,
and more. See :bro:type:`Conn::Info` for a description of all
fields.

In addition to ``conn.log``, Bro generates many further logs by
default, including:

``dpd.log``
   A summary of protocols encountered on non-standard ports.

``dns.log``
   All DNS activity.

``ftp.log``
   A log of FTP session-level activity.

``files.log``
   Summaries of files transferred over the network. This information
   is aggregated from different protocols, including HTTP, FTP, and
   SMTP.

``http.log``
   A summary of all HTTP requests with their replies.

``known_certs.log``
   SSL certificates seen in use.

``smtp.log``
   A summary of SMTP activity.

``ssl.log``
   A record of SSL sessions, including certificates being used.

``weird.log``
   A log of unexpected protocol-level activity. Whenever Bro's
   protocol analysis encounters a situation it would not expect
   (e.g., an RFC violation), it logs it in this file. Note that in
   practice, real-world networks tend to exhibit a large amount of
   such "crud" that is usually not worth following up on.

As you can see, some log files are specific to a particular
protocol, while others aggregate information across different types
of activity. For a complete list of log files and a description of
their purpose, see :doc:`Log Files </script-reference/log-files>`.

.. _bro-cut:

Using ``bro-cut``
-----------------

The ``bro-cut`` utility can be used in place of other tools to build
terminal commands that remain flexible and accurate independent of
possible changes to the log file itself. It accomplishes this by
parsing the header in each file and allowing the user to refer to
the specific columnar data available by name (in contrast to tools
like ``awk`` that require the user to refer to fields by their
position). For example, the following command extracts just the
given columns from a ``conn.log``:

.. sourcecode:: console

   $ cat conn.log | bro-cut id.orig_h id.orig_p id.resp_h duration
   141.142.220.202 5353 224.0.0.251 -
   fe80::217:f2ff:fed7:cf65 5353 ff02::fb -
   141.142.220.50 5353 224.0.0.251 -
   141.142.220.118 43927 141.142.2.2 0.000435
   141.142.220.118 37676 141.142.2.2 0.000420
   141.142.220.118 40526 141.142.2.2 0.000392
   141.142.220.118 32902 141.142.2.2 0.000317
   141.142.220.118 59816 141.142.2.2 0.000343
   141.142.220.118 59714 141.142.2.2 0.000375
   141.142.220.118 58206 141.142.2.2 0.000339
   [...]

The corresponding ``awk`` command will look like this:

.. sourcecode:: console

   $ awk '/^[^#]/ {print $3, $4, $5, $6, $9}' conn.log
   141.142.220.202 5353 224.0.0.251 5353 -
   fe80::217:f2ff:fed7:cf65 5353 ff02::fb 5353 -
   141.142.220.50 5353 224.0.0.251 5353 -
   141.142.220.118 43927 141.142.2.2 53 0.000435
   141.142.220.118 37676 141.142.2.2 53 0.000420
   141.142.220.118 40526 141.142.2.2 53 0.000392
   141.142.220.118 32902 141.142.2.2 53 0.000317
   141.142.220.118 59816 141.142.2.2 53 0.000343
   141.142.220.118 59714 141.142.2.2 53 0.000375
   141.142.220.118 58206 141.142.2.2 53 0.000339
   [...]

While the output is similar, the advantage of using ``bro-cut`` over
``awk`` is that, while ``awk`` is flexible and powerful, ``bro-cut``
was specifically designed to work with Bro's log files. First, the
``bro-cut`` output includes only the log file entries, while the
``awk`` solution has to skip the header manually. Second, since
``bro-cut`` uses the field descriptors to identify and extract data,
it allows for flexibility independent of the format and contents of
the log file. It's not uncommon for a Bro configuration to add extra
fields to various log files as required by the environment. In this
case, the fields in the ``awk`` command would have to be adjusted to
compensate for the new positions, whereas the ``bro-cut`` command
would not need to change.

.. note::

   The sequence of field names given to ``bro-cut`` determines the
   output order, which means you can also use ``bro-cut`` to reorder
   fields. That can be helpful when piping into, e.g., ``sort``.

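   For example, putting ``duration`` first lets a plain numeric
   ``sort`` order connections by duration (a quick sketch reusing
   the ``conn.log`` fields shown earlier; output omitted):

   .. sourcecode:: console

      $ cat conn.log | bro-cut duration id.orig_h id.resp_h | sort -rn | head -5
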
As you may have noticed, the ``bro-cut`` command above receives its
input from the ``cat`` command through the ``|`` operator. Whereas
tools like ``awk`` allow you to indicate the log file as a command
line option, ``bro-cut`` only takes input through redirection such
as ``|`` and ``<``. There are a couple of ways to direct log file
data into ``bro-cut``, each dependent upon the type of log file
you're processing. A caveat of its use, however, is that all of the
header lines must be present.

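For example, the following two invocations produce the same output;
both forms of feeding data into ``bro-cut`` appear throughout this
document:

.. sourcecode:: console

   $ cat conn.log | bro-cut ts uid
   $ bro-cut ts uid < conn.log
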
.. note::

   ``bro-cut`` provides an option ``-c`` to include a corresponding
   format header in the output, which makes it possible to chain
   multiple ``bro-cut`` instances or perform further post-processing
   that evaluates the header information.

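For instance, keeping the header with ``-c`` lets a second
``bro-cut`` work on the already-reduced output (a sketch; the
particular fields chosen here are arbitrary):

.. sourcecode:: console

   $ bro-cut -c ts id.orig_h id.resp_h < conn.log | bro-cut id.orig_h id.resp_h
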
In its default setup, Bro will rotate log files on an hourly basis,
moving the current log file into a directory named with the format
``YYYY-MM-DD`` and gzip-compressing it under a file name that
includes the log file type and the time range it covers. To process
a compressed log file, simply adjust your command line tools to use
the complementary ``z*`` versions of commands such as ``cat``
(``zcat``) or ``grep`` (``zgrep``).

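For example, a rotated and compressed connection log can be fed to
``bro-cut`` in the same way (the directory and file name below are
illustrative; the exact names will differ on your system):

.. sourcecode:: console

   $ zcat 2018-12-10/conn.18:00:00-19:00:00.log.gz | bro-cut ts id.orig_h id.resp_h
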
Working with Timestamps
-----------------------

``bro-cut`` accepts the flag ``-d`` to convert the epoch time values
in the log files to human-readable format. The following command
includes the human-readable timestamp, the unique identifier, the
HTTP ``Host``, and the HTTP ``URI`` as extracted from the
``http.log`` file:

.. sourcecode:: console

   $ bro-cut -d ts uid host uri < http.log
   2011-03-18T19:06:08+0000 CUM0KZ3MLUfNB0cl11 bits.wikimedia.org /skins-1.5/monobook/main.css
   2011-03-18T19:06:08+0000 CwjjYJ2WqgTbAqiHl6 upload.wikimedia.org /wikipedia/commons/6/63/Wikipedia-logo.png
   2011-03-18T19:06:08+0000 C3eiCBGOLw3VtHfOj upload.wikimedia.org /wikipedia/commons/thumb/b/bb/Wikipedia_wordmark.svg/174px-Wikipedia_wordmark.svg.png
   2011-03-18T19:06:08+0000 Ck51lg1bScffFj34Ri upload.wikimedia.org /wikipedia/commons/b/bd/Bookshelf-40x201_6.png
   2011-03-18T19:06:08+0000 CtxTCR2Yer0FR1tIBg upload.wikimedia.org /wikipedia/commons/thumb/8/8a/Wikinews-logo.png/35px-Wikinews-logo.png
   [...]

Oftentimes log files from multiple sources are stored in UTC time to
allow easy correlation. Converting the timestamp from a log file to
UTC can be accomplished with the ``-u`` option:

.. sourcecode:: console

   $ bro-cut -u ts uid host uri < http.log
   2011-03-18T19:06:08+0000 CUM0KZ3MLUfNB0cl11 bits.wikimedia.org /skins-1.5/monobook/main.css
   2011-03-18T19:06:08+0000 CwjjYJ2WqgTbAqiHl6 upload.wikimedia.org /wikipedia/commons/6/63/Wikipedia-logo.png
   2011-03-18T19:06:08+0000 C3eiCBGOLw3VtHfOj upload.wikimedia.org /wikipedia/commons/thumb/b/bb/Wikipedia_wordmark.svg/174px-Wikipedia_wordmark.svg.png
   2011-03-18T19:06:08+0000 Ck51lg1bScffFj34Ri upload.wikimedia.org /wikipedia/commons/b/bd/Bookshelf-40x201_6.png
   2011-03-18T19:06:08+0000 CtxTCR2Yer0FR1tIBg upload.wikimedia.org /wikipedia/commons/thumb/8/8a/Wikinews-logo.png/35px-Wikinews-logo.png
   [...]

The default time format when using the ``-d`` or ``-u`` flags is the
``strftime`` format string ``%Y-%m-%dT%H:%M:%S%z``, which results in
a string with the year, month, and day of month, followed by the
hour, minutes, seconds, and the timezone offset. The default format
can be altered with the ``-D`` and ``-U`` flags, using standard
``strftime`` syntax. For example, to print the date portion in
day-month-year order, you could use the format string
``%d-%m-%YT%H:%M:%S%z``:

.. sourcecode:: console

   $ bro-cut -D %d-%m-%YT%H:%M:%S%z ts uid host uri < http.log
   18-03-2011T19:06:08+0000 CUM0KZ3MLUfNB0cl11 bits.wikimedia.org /skins-1.5/monobook/main.css
   18-03-2011T19:06:08+0000 CwjjYJ2WqgTbAqiHl6 upload.wikimedia.org /wikipedia/commons/6/63/Wikipedia-logo.png
   18-03-2011T19:06:08+0000 C3eiCBGOLw3VtHfOj upload.wikimedia.org /wikipedia/commons/thumb/b/bb/Wikipedia_wordmark.svg/174px-Wikipedia_wordmark.svg.png
   18-03-2011T19:06:08+0000 Ck51lg1bScffFj34Ri upload.wikimedia.org /wikipedia/commons/b/bd/Bookshelf-40x201_6.png
   18-03-2011T19:06:08+0000 CtxTCR2Yer0FR1tIBg upload.wikimedia.org /wikipedia/commons/thumb/8/8a/Wikinews-logo.png/35px-Wikinews-logo.png
   [...]

See ``man strftime`` for more options for the format string.

Using UIDs
----------

While Bro can do signature-based analysis, its primary focus is on
behavioral detection, which shifts the practice of log review from
"reactionary review" to a process a little more akin to a hunting
trip. A common step in such a review is correlating a session across
multiple log files. As a connection is processed by Bro, a unique
identifier is assigned to each session. This unique identifier is
generally included in any log file entry associated with that
connection and can be used to cross-reference different log files.

A simple example would be to cross-reference a UID seen in a
``conn.log`` file. Here, we're looking for the connection with the
largest number of bytes from the responder, by piping the output of
``cat conn.log`` into ``bro-cut`` to extract the ``uid`` and
``resp_bytes`` fields, then sorting that output on ``resp_bytes``:

.. sourcecode:: console

   $ cat conn.log | bro-cut uid resp_bytes | sort -nrk2 | head -5
   CwjjYJ2WqgTbAqiHl6 734
   CtxTCR2Yer0FR1tIBg 734
   Ck51lg1bScffFj34Ri 734
   CLNN1k2QMum1aexUK7 734
   CykQaM33ztNt0csB9a 733

Taking the UID of the first of the top responses, we can now
cross-reference that with the UIDs in the ``http.log`` file:

.. sourcecode:: console

   $ cat http.log | bro-cut uid id.resp_h method status_code host uri | grep CUM0KZ3MLUfNB0cl11
   CUM0KZ3MLUfNB0cl11 208.80.152.118 GET 304 bits.wikimedia.org /skins-1.5/monobook/main.css

As you can see, Bro identified and logged the HTTP ``GET`` request
within the session. Given that HTTP is a stream protocol, a single
connection can carry multiple requests (``GET``, ``POST``, and so
on), and Bro is able to extract and track that information for you,
giving you an in-depth and structured view into the HTTP traffic on
your network.

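The same UID field also lends itself to quick aggregate checks. For
example, counting ``http.log`` entries per UID shows which
connections carried the most requests (a sketch using only standard
shell tools; output omitted):

.. sourcecode:: console

   $ bro-cut uid < http.log | sort | uniq -c | sort -rn | head -5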