.. _logschema: https://github.com/zeek/logschema

===============================
Zeek Log Formats and Inspection
===============================

Zeek creates a variety of logs when run in its default configuration. This
data can be intimidating for a first-time user. In this section, we will
process a sample packet trace with Zeek, and take a brief look at the sorts
of logs Zeek creates. We will look at logs created in Zeek's traditional TSV
format, how to switch to logging in JSON format, and assorted tooling to
help you work with the logs. Finally, we'll cover Zeek's support for log
schemas that describe what Zeek's logs look like in detail.

Working with a Sample Trace
===========================

For the examples that follow, we will use Zeek on a Linux system to process
network traffic that was captured and stored to disk. We saved this trace
earlier in packet capture (PCAP) format as :file:`tm1t.pcap`. The command-line
packet analyzer :program:`tcpdump`, which is available on most Unix-like
distributions, summarizes the contents of this file.

.. code-block:: console

   zeek@zeek:~/zeek-test$ tcpdump -n -r tm1t.pcap

::

   reading from file tm1t.pcap, link-type EN10MB (Ethernet)
   14:39:59.305988 IP 192.168.4.76.36844 > 192.168.4.1.53: 19671+ A? testmyids.com. (31)
   14:39:59.306059 IP 192.168.4.76.36844 > 192.168.4.1.53: 8555+ AAAA? testmyids.com. (31)
   14:39:59.354577 IP 192.168.4.1.53 > 192.168.4.76.36844: 8555 0/1/0 (94)
   14:39:59.372840 IP 192.168.4.1.53 > 192.168.4.76.36844: 19671 1/0/0 A 31.3.245.133 (47)
   14:39:59.430166 IP 192.168.4.76.46378 > 31.3.245.133.80: Flags [S], seq 3723031366, win 65535, options [mss 1460,sackOK,TS val 3137978796 ecr 0,nop,wscale 11], length 0
   14:39:59.512232 IP 31.3.245.133.80 > 192.168.4.76.46378: Flags [S.], seq 2993782376, ack 3723031367, win 28960, options [mss 1460,sackOK,TS val 346747623 ecr 3137978796,nop,wscale 7], length 0
   14:39:59.512284 IP 192.168.4.76.46378 > 31.3.245.133.80: Flags [.], ack 1, win 32, options [nop,nop,TS val 3137978878 ecr 346747623], length 0
   14:39:59.512593 IP 192.168.4.76.46378 > 31.3.245.133.80: Flags [P.], seq 1:78, ack 1, win 32, options [nop,nop,TS val 3137978878 ecr 346747623], length 77: HTTP: GET / HTTP/1.1
   14:39:59.600488 IP 31.3.245.133.80 > 192.168.4.76.46378: Flags [.], ack 78, win 227, options [nop,nop,TS val 346747711 ecr 3137978878], length 0
   14:39:59.604000 IP 31.3.245.133.80 > 192.168.4.76.46378: Flags [P.], seq 1:296, ack 78, win 227, options [nop,nop,TS val 346747713 ecr 3137978878], length 295: HTTP: HTTP/1.1 200 OK
   14:39:59.604020 IP 192.168.4.76.46378 > 31.3.245.133.80: Flags [.], ack 296, win 33, options [nop,nop,TS val 3137978970 ecr 346747713], length 0
   14:39:59.604493 IP 192.168.4.76.46378 > 31.3.245.133.80: Flags [F.], seq 78, ack 296, win 33, options [nop,nop,TS val 3137978970 ecr 346747713], length 0
   14:39:59.684281 IP 31.3.245.133.80 > 192.168.4.76.46378: Flags [F.], seq 296, ack 79, win 227, options [nop,nop,TS val 346747796 ecr 3137978970], length 0
   14:39:59.684346 IP 192.168.4.76.46378 > 31.3.245.133.80: Flags [.], ack 297, win 33, options [nop,nop,TS val 3137979050 ecr 346747796], length 0

This is a simple exchange involving Domain Name System (DNS) traffic followed
by HyperText Transfer Protocol (HTTP) traffic.

Rather than run Zeek against a live interface, we will ask Zeek to digest this
trace. This approach lets us vary Zeek's run-time operation while keeping the
traffic constant.

First we make two directories to store the log files that Zeek will produce.
Then we move into the “default” directory.

.. code-block:: console

   zeek@zeek:~/zeek-test$ mkdir default
   zeek@zeek:~/zeek-test$ mkdir json
   zeek@zeek:~/zeek-test$ cd default/

Zeek TSV Format Logs
====================

From this location on disk, we tell Zeek to digest the :file:`tm1t.pcap` file.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ zeek -C -r ../tm1t.pcap

The ``-r`` flag tells Zeek where to find the trace of interest.

The ``-C`` flag tells Zeek to ignore checksum errors. Invalid checksums are
common in traces captured on systems that use “checksum offloading,” where the
network card computes checksums only after the point where packets are
captured. They do not affect our analysis.

Zeek completes its task without reporting anything to the command line. This is
standard Unix-like behavior. Using the :program:`ls` command we see what files
Zeek created when processing the trace.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ ls -al

::

   total 28
   drwxrwxr-x 2 zeek zeek 4096 Jun 5 14:48 .
   drwxrwxr-x 4 zeek zeek 4096 Jun 5 14:43 ..
   -rw-rw-r-- 1 zeek zeek 737 Jun 5 14:48 conn.log
   -rw-rw-r-- 1 zeek zeek 778 Jun 5 14:48 dns.log
   -rw-rw-r-- 1 zeek zeek 712 Jun 5 14:48 files.log
   -rw-rw-r-- 1 zeek zeek 883 Jun 5 14:48 http.log
   -rw-rw-r-- 1 zeek zeek 254 Jun 5 14:48 packet_filter.log

Zeek created five files. We will look at the contents of Zeek log data in
detail in later sections. For now, we will take a quick look at each file,
beginning with the :file:`conn.log`.

We use the :program:`cat` command to show the contents of each log.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat conn.log

::

   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path conn
   #open 2020-06-05-14-48-32
   #fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig local_resp missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents ip_proto
   #types time string addr port addr port enum string interval count count string bool bool count string count count count count set[string] count
   1591367999.305988 CazOhH2qDUiJTWMCY 192.168.4.76 36844 192.168.4.1 53 udp dns 0.066852 62 141 SF - - 0 Dd 2 118 2 197 - 17
   1591367999.430166 CLqEx41jYPOdfHF586 192.168.4.76 46378 31.3.245.133 80 tcp http 0.254115 77 295 SF - - 0 ShADadFf 6 397 4 511 - 6
   #close 2020-06-05-14-48-32

Next we look at Zeek's :file:`dns.log`.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat dns.log

::

   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path dns
   #open 2020-06-05-14-48-32
   #fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto trans_id rtt query qclass qclass_name qtype qtype_name rcode rcode_name AA TC RD RA Z answers TTLs rejected
   #types time string addr port addr port enum count interval string count string count string count string bool bool bool bool count vector[string] vector[interval] bool
   1591367999.306059 CazOhH2qDUiJTWMCY 192.168.4.76 36844 192.168.4.1 53 udp 8555 - testmyids.com 1 C_INTERNET 28 AAAA 0 NOERROR F F T F 0 - - F
   1591367999.305988 CazOhH2qDUiJTWMCY 192.168.4.76 36844 192.168.4.1 53 udp 19671 0.066852 testmyids.com 1 C_INTERNET 1 A 0 NOERROR F F T T 0 31.3.245.133 3600.000000 F
   #close 2020-06-05-14-48-32

Next we look at Zeek's :file:`files.log`.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat files.log

::

   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path files
   #open 2020-06-05-14-48-32
   #fields ts fuid uid id.orig_h id.orig_p id.resp_h id.resp_p source depth analyzers mime_type filename duration local_orig is_orig seen_bytes total_bytes missing_bytes overflow_bytes timedout parent_fuid md5 sha1 sha256 extracted extracted_cutoff extracted_size
   #types time string string addr port addr port string count set[string] string string interval bool bool count count count count bool string string string string string bool count
   1591367999.604000 FEEsZS1w0Z0VJIb5x4 CLqEx41jYPOdfHF586 192.168.4.76 46378 31.3.245.133 80 HTTP 0 (empty) text/plain - 0.000000 - F 39 39 0 0 F - - - - - - -
   #close 2020-06-05-14-48-32

Next we look at Zeek's :file:`http.log`.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat http.log

::

   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path http
   #open 2020-06-05-14-48-32
   #fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p trans_depth method host uri referrer version user_agent origin request_body_len response_body_len status_code status_msg info_code info_msg tags username password proxied orig_fuids orig_filenames orig_mime_types resp_fuids resp_filenames resp_mime_types
   #types time string addr port addr port count string string string string string string string count count count string count string set[enum] string string set[string] vector[string] vector[string] vector[string] vector[string] vector[string] vector[string]
   1591367999.512593 CLqEx41jYPOdfHF586 192.168.4.76 46378 31.3.245.133 80 1 GET testmyids.com / - 1.1 curl/7.47.0 - 0 39 200 OK - - (empty) - - - - - - FEEsZS1w0Z0VJIb5x4 - text/plain
   #close 2020-06-05-14-48-32

Finally, we look at Zeek's :file:`packet_filter.log`. This log shows any
filters that Zeek applied when processing the trace.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat packet_filter.log

::

   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path packet_filter
   #open 2020-06-05-14-48-32
   #fields ts node filter init success
   #types time string string bool bool
   1591368512.420771 zeek ip or not ip T T
   #close 2020-06-05-14-48-32

As we can see, each log file begins with a set of header lines starting with
the hash character (``#``). These lines carry metadata about the log itself:
the field separator, the log's path, the names and types of its fields, and
when the file was opened and closed. This is Zeek's standard log format, which
represents data as tab-separated values (TSV).

Interpreting this data as shown requires remembering which “column” applies to
which “value.” For example, in the :file:`dns.log`, the third field is
``id.orig_h``, so when we see data in that field, such as ``192.168.4.76``, we
know that ``192.168.4.76`` is ``id.orig_h``.

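
Rather than counting columns by eye, you can look up a field's position from
the ``#fields`` header. The following is a minimal sketch, not a tool that
ships with Zeek: ``zeek_field_index`` is a hypothetical shell function that
assumes the default tab separator (``#separator \x09``) and prints the
1-based column number of a named field, which you could then pass to tools
such as :program:`awk` or :program:`cut`.

.. code-block:: shell

   # zeek_field_index FILE FIELD
   # Hypothetical helper (not shipped with Zeek): print the 1-based data
   # column of FIELD in a Zeek TSV log, read from the "#fields" header.
   # Assumes the default tab separator; prints nothing if FIELD is absent.
   zeek_field_index() {
       awk -F'\t' -v want="$2" '
           $1 == "#fields" {
               # $1 is the "#fields" tag itself, so field names start at $2;
               # subtract one to get the column number in the data lines.
               for (i = 2; i <= NF; i++)
                   if ($i == want) { print i - 1; exit }
               exit
           }' "$1"
   }

Run against the :file:`dns.log` above, ``zeek_field_index dns.log answers``
would print ``22``.
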
A common task when working with Zeek log files is analyzing specific fields.
Investigators may not need to see all of the fields Zeek produces when solving
a particular problem. The following sections offer a few ways to address this
when processing Zeek logs in text format.

Zeek TSV Format and :program:`awk`
==================================

A very traditional way of interacting with Zeek logs involves using native
Unix-like text processing tools like :program:`awk`. Awk requires specifying
the fields of interest as positions in the log file. Take a second look at the
:file:`dns.log` entry above, and consider the parameters necessary to view only
the source IP address, the query, and the response. These values appear in the
3rd, 10th, and 22nd fields in the Zeek TSV log entries. Therefore, we could
invoke :program:`awk` using the following syntax:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ awk '/^[^#]/ {print $3, $10, $22}' dns.log

::

   192.168.4.76 testmyids.com -
   192.168.4.76 testmyids.com 31.3.245.133

Now we have a much more compact view, with just the fields we want.
Unfortunately, this requires specifying fields by location. If we were to
modify the log output, or if the Zeek project were to change the log output,
any scripts we built using :program:`awk` and field locations would require
modification. For this reason, the Zeek project recommends alternatives like
the following.

Zeek TSV Format and :program:`zeek-cut`
=======================================

The Zeek project provides a tool called :program:`zeek-cut` to make it easier
for analysts to interact with Zeek logs in TSV format. It parses the header in
each file and lets the user refer to columns by field name. This is in
contrast to tools like :program:`awk` that require the user to refer to fields
by their position.

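
To illustrate the idea behind name-based selection, here is a minimal sketch
of a :program:`zeek-cut`-like filter written in :program:`awk`. It is not
:program:`zeek-cut`'s actual implementation and omits its options (such as
``-d``). The function name ``cut_by_name`` is hypothetical, and the sketch
assumes the default tab separator.

.. code-block:: shell

   # cut_by_name FIELD... < LOG
   # Hypothetical zeek-cut-like filter: reads a Zeek TSV log on stdin,
   # learns column positions from the "#fields" header, and prints the
   # requested fields, tab-separated, in the order given.
   cut_by_name() {
       awk -F'\t' -v OFS='\t' -v names="$*" '
           BEGIN { n = split(names, want, " ") }
           $1 == "#fields" {
               # Map each field name to its data column ($2 is the first name).
               for (i = 2; i <= NF; i++) col[$i] = i - 1
               next
           }
           /^#/ { next }    # skip the remaining header and footer lines
           {
               line = ""
               for (j = 1; j <= n; j++) {
                   # Unknown field names produce an empty column.
                   v = (want[j] in col) ? $(col[want[j]]) : ""
                   line = (j == 1) ? v : (line OFS v)
               }
               print line
           }'
   }

For example, ``cut_by_name query answers id.orig_h < dns.log`` selects and
reorders columns by name. Unlike :program:`zeek-cut`, this sketch requires at
least one field name.
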
Consider the :file:`dns.log` generated earlier. If we process it with
:program:`zeek-cut` without any field arguments, this is the result:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat dns.log | zeek-cut

::

   1591367999.306059 CazOhH2qDUiJTWMCY 192.168.4.76 36844 192.168.4.1 53 udp 8555 - testmyids.com 1 C_INTERNET 28 AAAA 0 NOERROR F F T F 0 - - F
   1591367999.305988 CazOhH2qDUiJTWMCY 192.168.4.76 36844 192.168.4.1 53 udp 19671 0.066852 testmyids.com 1 C_INTERNET 1 A 0 NOERROR F F T T 0 31.3.245.133 3600.000000 F

That is the :file:`dns.log`, minus the header lines shown earlier. Note that
:program:`zeek-cut` reads only from standard input, so here we use the
:program:`cat` utility in a pipeline to feed it the file.

If we pass :program:`zeek-cut` the fields we wish to see, the output looks like
this:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat dns.log | zeek-cut id.orig_h query answers

::

   192.168.4.76 testmyids.com -
   192.168.4.76 testmyids.com 31.3.245.133

The sequence of field names given to :program:`zeek-cut` determines the output
order. This means you can also use :program:`zeek-cut` to reorder fields. For
example:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat dns.log | zeek-cut query answers id.orig_h

::

   testmyids.com - 192.168.4.76
   testmyids.com 31.3.245.133 192.168.4.76

This feature can be helpful when piping output into programs like :program:`sort`.

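
For instance, piping selected fields through :program:`sort` and
:program:`uniq` gives a quick frequency count. The sketch below wraps that
pipeline in a hypothetical shell function, ``top_values``, which reads one
value per line on standard input and prints each distinct value with its
count, most frequent first.

.. code-block:: shell

   # top_values: count identical input lines, most frequent first.
   # Hypothetical helper; it only uses the standard sort and uniq tools.
   top_values() {
       sort | uniq -c | sort -rn
   }

   # Example, assuming zeek-cut is installed and dns.log is present:
   #   zeek-cut query < dns.log | top_values
   #   zeek-cut id.orig_h query < dns.log | top_values

Because :program:`zeek-cut` controls which fields appear on each line, the
same helper counts single values or combinations such as source and query.
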
The examples so far feed the log to :program:`zeek-cut` through the
:program:`cat` command and the ``|`` (pipe) operator. Whereas tools like
:program:`awk` accept the log file as a command-line argument,
:program:`zeek-cut` only reads standard input, which you can supply with a
pipe (``|``) or input redirection (``<``).

For example, instead of using :program:`cat` and a pipe, we could obtain the
``id.orig_h query answers`` output shown earlier with this syntax:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ zeek-cut id.orig_h query answers < dns.log

::

   192.168.4.76 testmyids.com -
   192.168.4.76 testmyids.com 31.3.245.133

When Zeek watches a live interface and writes logs to disk in its default
ZeekControl setup (but not with a simple command-line invocation like
``zeek -i eth0``), it rotates log files on an hourly basis. Zeek moves the
current log file into a directory named using the format ``YYYY-MM-DD`` and
compresses it with :program:`gzip`, using a file name that includes the log
type and the time range the file covers.

When processing a compressed log file, use the :program:`zcat` tool instead of
:program:`cat` to read the file. Consider working with the gzip-compressed file
created in the following example. For demonstration purposes, we create a copy
of the :file:`dns.log` file as :file:`dns1.log`, compress it with
:program:`gzip`, and then read it with :program:`zcat` instead of
:program:`cat`.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cp dns.log dns1.log
   zeek@zeek:~/zeek-test/default$ gzip dns1.log
   zeek@zeek:~/zeek-test/default$ zcat dns1.log.gz

::

   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path dns
   #open 2020-06-05-14-48-32
   #fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto trans_id rtt query qclass qclass_name qtype qtype_name rcode rcode_name AA TC RD RA Z answers TTLs rejected
   #types time string addr port addr port enum count interval string count string count string count string bool bool bool bool count vector[string] vector[interval] bool
   1591367999.306059 CazOhH2qDUiJTWMCY 192.168.4.76 36844 192.168.4.1 53 udp 8555 - testmyids.com 1 C_INTERNET 28 AAAA 0 NOERROR F F T F 0 - - F
   1591367999.305988 CazOhH2qDUiJTWMCY 192.168.4.76 36844 192.168.4.1 53 udp 19671 0.066852 testmyids.com 1 C_INTERNET 1 A 0 NOERROR F F T T 0 31.3.245.133 3600.000000 F
   #close 2020-06-05-14-48-32

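
When a query spans both the current log and rotated, compressed ones, it can
help to read them all through a single helper. The sketch below is one way to
do that: ``read_zeek_log`` is a hypothetical function name, and it uses
``gzip -dc`` (equivalent to :program:`zcat` on GNU systems) for files ending in
``.gz`` and :program:`cat` for everything else.

.. code-block:: shell

   # read_zeek_log FILE...: write each file to stdout, decompressing any
   # file whose name ends in .gz. Hypothetical helper.
   read_zeek_log() {
       for f in "$@"; do
           case "$f" in
               *.gz) gzip -dc -- "$f" ;;
               *)    cat -- "$f" ;;
           esac
       done
   }

   # Example, assuming zeek-cut is installed:
   #   read_zeek_log dns.log dns1.log.gz | zeek-cut id.orig_h query answers

Because each log carries its own header, the concatenated stream still
contains a ``#fields`` line before each file's records.
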
:program:`zeek-cut` accepts the flag ``-d`` to convert the epoch time values in
the log files to human-readable format. For example, observe the default
timestamp value:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ zcat dns1.log.gz | zeek-cut ts id.orig_h query answers

::

   1591367999.306059 192.168.4.76 testmyids.com -
   1591367999.305988 192.168.4.76 testmyids.com 31.3.245.133

Now see the effect of using the ``-d`` flag:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat dns.log | zeek-cut -d ts id.orig_h query answers

::

   2020-06-05T14:39:59+0000 192.168.4.76 testmyids.com -
   2020-06-05T14:39:59+0000 192.168.4.76 testmyids.com 31.3.245.133

The ``-d`` flag renders times in the system's local time zone, which on this
example system has a ``+0000`` offset. To convert timestamps to UTC regardless
of the local time zone, use the ``-u`` option instead.

The default time format when using ``-d`` or ``-u`` is the ``strftime``
format string ``%Y-%m-%dT%H:%M:%S%z``, which produces the year, month, and
day of month, followed by the hour, minutes, seconds, and time zone offset.

You can change the format with the ``-D`` and ``-U`` flags (the counterparts
of ``-d`` and ``-u``), which take a format string in standard ``strftime``
syntax. For example, to format the timestamp in the US-typical
“middle-endian” order (month, day, year), you could use the format string
``%m-%d-%YT%H:%M:%S%z``:

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cat dns.log | zeek-cut -D %m-%d-%YT%H:%M:%S%z ts id.orig_h query answers

::

   06-05-2020T14:39:59+0000 192.168.4.76 testmyids.com -
   06-05-2020T14:39:59+0000 192.168.4.76 testmyids.com 31.3.245.133

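
To see what these conversions do, the same ``strftime`` formatting can be
reproduced with :program:`date`. This is an illustrative sketch, not how
:program:`zeek-cut` is implemented: ``zeek_ts`` is a hypothetical function
that reads epoch timestamps on standard input and prints them in UTC, roughly
like ``zeek-cut -U FORMAT`` does for a time column. It assumes GNU
:program:`date`, whose ``-d @SECONDS`` syntax is not available in every
implementation.

.. code-block:: shell

   # zeek_ts FORMAT: read epoch timestamps (one per line) on stdin and print
   # each one in UTC using the strftime FORMAT. Hypothetical helper that
   # assumes GNU date for the "-d @SECONDS" syntax.
   zeek_ts() {
       while IFS= read -r ts; do
           # date expects whole seconds, so drop Zeek's fractional part.
           date -u -d "@${ts%%.*}" +"$1"
       done
   }

   # Example:
   #   printf '1591367999.305988\n' | zeek_ts '%m-%d-%YT%H:%M:%S%z'
   # prints 06-05-2020T14:39:59+0000, matching the zeek-cut output above.
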
Using :program:`awk` and :program:`zeek-cut` has been the traditional way of
interacting with Zeek logs. In the next section we will look at the
possibilities that open up once we enable an alternative output format.

Zeek JSON Format Logs
=====================

During the last decade, the JavaScript Object Notation (JSON) format has become
a standard way to label and store many types of data. Zeek offers support for
this format. In the following example we will re-run the :file:`tm1t.pcap` trace
through Zeek, but request that it output logs in JSON format.

First we change into the json directory to avoid overwriting our existing log
files.

.. code-block:: console

   zeek@zeek:~/zeek-test/default$ cd ../json/

Next we tell Zeek to output logs in JSON format by setting
``LogAscii::use_json=T`` on the command line.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ zeek -C -r ../tm1t.pcap LogAscii::use_json=T

When we look at the directory contents, we see the same five output files.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ ls -al

::

   total 28
   drwxrwxr-x 2 zeek zeek 4096 Jun 5 14:47 .
   drwxrwxr-x 4 zeek zeek 4096 Jun 5 14:43 ..
   -rw-rw-r-- 1 zeek zeek 708 Jun 5 14:47 conn.log
   -rw-rw-r-- 1 zeek zeek 785 Jun 5 14:47 dns.log
   -rw-rw-r-- 1 zeek zeek 325 Jun 5 14:47 files.log
   -rw-rw-r-- 1 zeek zeek 405 Jun 5 14:47 http.log
   -rw-rw-r-- 1 zeek zeek 90 Jun 5 14:47 packet_filter.log

However, if we look at the file contents, the format is much different.
First we look at :file:`packet_filter.log`.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ cat packet_filter.log

::

   {"ts":1591368442.854585,"node":"zeek","filter":"ip or not ip","init":true,"success":true}

Next we look at :file:`conn.log` and :file:`dns.log`:

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ cat conn.log

::

   {"ts":1591367999.305988,"uid":"CMdzit1AMNsmfAIiQc","id.orig_h":"192.168.4.76","id.orig_p":36844,"id.resp_h":"192.168.4.1","id.resp_p":53,"proto":"udp","service":"dns","duration":0.06685185432434082,"orig_bytes":62,"resp_bytes":141,"conn_state":"SF","missed_bytes":0,"history":"Dd","orig_pkts":2,"orig_ip_bytes":118,"resp_pkts":2,"resp_ip_bytes":197,"ip_proto":17}
   {"ts":1591367999.430166,"uid":"C5bLoe2Mvxqhawzqqd","id.orig_h":"192.168.4.76","id.orig_p":46378,"id.resp_h":"31.3.245.133","id.resp_p":80,"proto":"tcp","service":"http","duration":0.25411510467529297,"orig_bytes":77,"resp_bytes":295,"conn_state":"SF","missed_bytes":0,"history":"ShADadFf","orig_pkts":6,"orig_ip_bytes":397,"resp_pkts":4,"resp_ip_bytes":511,"ip_proto":6}

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ cat dns.log

::

   {"ts":1591367999.306059,"uid":"CMdzit1AMNsmfAIiQc","id.orig_h":"192.168.4.76","id.orig_p":36844,"id.resp_h":"192.168.4.1","id.resp_p":53,"proto":"udp","trans_id":8555,"query":"testmyids.com","qclass":1,"qclass_name":"C_INTERNET","qtype":28,"qtype_name":"AAAA","rcode":0,"rcode_name":"NOERROR","AA":false,"TC":false,"RD":true,"RA":false,"Z":0,"rejected":false}
   {"ts":1591367999.305988,"uid":"CMdzit1AMNsmfAIiQc","id.orig_h":"192.168.4.76","id.orig_p":36844,"id.resp_h":"192.168.4.1","id.resp_p":53,"proto":"udp","trans_id":19671,"rtt":0.06685185432434082,"query":"testmyids.com","qclass":1,"qclass_name":"C_INTERNET","qtype":1,"qtype_name":"A","rcode":0,"rcode_name":"NOERROR","AA":false,"TC":false,"RD":true,"RA":true,"Z":0,"answers":["31.3.245.133"],"TTLs":[3600.0],"rejected":false}

Next we look at :file:`files.log`.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ cat files.log

::

   {"ts":1591367999.604,"fuid":"FEEsZS1w0Z0VJIb5x4","uid":"C5bLoe2Mvxqhawzqqd","id.orig_h":"192.168.4.76","id.orig_p":46378,"id.resp_h":"31.3.245.133","id.resp_p":80,"source":"HTTP","depth":0,"analyzers":[],"mime_type":"text/plain","duration":0.0,"is_orig":false,"seen_bytes":39,"total_bytes":39,"missing_bytes":0,"overflow_bytes":0,"timedout":false}

Next we look at the :file:`http.log`.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ cat http.log

::

   {"ts":1591367999.512593,"uid":"C5bLoe2Mvxqhawzqqd","id.orig_h":"192.168.4.76","id.orig_p":46378,"id.resp_h":"31.3.245.133","id.resp_p":80,"trans_depth":1,"method":"GET","host":"testmyids.com","uri":"/","version":"1.1","user_agent":"curl/7.47.0","request_body_len":0,"response_body_len":39,"status_code":200,"status_msg":"OK","tags":[],"resp_fuids":["FEEsZS1w0Z0VJIb5x4"],"resp_mime_types":["text/plain"]}

Comparing the two log styles, we see strengths and weaknesses for each. For
example, the TSV format shows the Zeek types associated with each field, such
as ``string``, ``addr``, ``port``, and so on. The JSON format does not include
that data, and it omits fields that are unset: the first :file:`dns.log` record
above has no ``rtt``, ``answers``, or ``TTLs`` keys, where the TSV version shows
``-``. However, the JSON format associates each field “key” with a “value,”
such as ``"id.orig_p":46378``. While this necessarily increases the amount of
disk space used to store the raw logs, it makes it easier for analysts and
software to interpret the data, as the key is directly associated with the
value that follows. For this reason, many developers and analysts prefer the
JSON output format for Zeek logs. That is the format we will use for the log
analysis sections of the documentation.

Zeek JSON Format and :program:`jq`
==================================

Analysts sometimes choose to inspect JSON-formatted Zeek files using
applications that recognize JSON format, such as :program:`jq`, a command-line
JSON processor by Stephen Dolan (https://stedolan.github.io/jq/). It may
already be installed on your Unix-like system.

In the following example we process the :file:`dns.log` file with the ``.``
filter, which tells :program:`jq` to simply output what it finds in the file.
By default :program:`jq` outputs JSON-formatted data in its “pretty-print”
style, which puts each key:value pair on its own line as shown.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ jq . dns.log

::

   {
     "ts": 1591367999.306059,
     "uid": "CMdzit1AMNsmfAIiQc",
     "id.orig_h": "192.168.4.76",
     "id.orig_p": 36844,
     "id.resp_h": "192.168.4.1",
     "id.resp_p": 53,
     "proto": "udp",
     "trans_id": 8555,
     "query": "testmyids.com",
     "qclass": 1,
     "qclass_name": "C_INTERNET",
     "qtype": 28,
     "qtype_name": "AAAA",
     "rcode": 0,
     "rcode_name": "NOERROR",
     "AA": false,
     "TC": false,
     "RD": true,
     "RA": false,
     "Z": 0,
     "rejected": false
   }
   {
     "ts": 1591367999.305988,
     "uid": "CMdzit1AMNsmfAIiQc",
     "id.orig_h": "192.168.4.76",
     "id.orig_p": 36844,
     "id.resp_h": "192.168.4.1",
     "id.resp_p": 53,
     "proto": "udp",
     "trans_id": 19671,
     "rtt": 0.06685185432434082,
     "query": "testmyids.com",
     "qclass": 1,
     "qclass_name": "C_INTERNET",
     "qtype": 1,
     "qtype_name": "A",
     "rcode": 0,
     "rcode_name": "NOERROR",
     "AA": false,
     "TC": false,
     "RD": true,
     "RA": true,
     "Z": 0,
     "answers": [
       "31.3.245.133"
     ],
     "TTLs": [
       3600
     ],
     "rejected": false
   }

We can tell :program:`jq` to output what it sees in “compact” format using the
``-c`` switch.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ jq . -c dns.log

::

   {"ts":1591367999.306059,"uid":"CMdzit1AMNsmfAIiQc","id.orig_h":"192.168.4.76","id.orig_p":36844,"id.resp_h":"192.168.4.1","id.resp_p":53,"proto":"udp","trans_id":8555,"query":"testmyids.com","qclass":1,"qclass_name":"C_INTERNET","qtype":28,"qtype_name":"AAAA","rcode":0,"rcode_name":"NOERROR","AA":false,"TC":false,"RD":true,"RA":false,"Z":0,"rejected":false}
   {"ts":1591367999.305988,"uid":"CMdzit1AMNsmfAIiQc","id.orig_h":"192.168.4.76","id.orig_p":36844,"id.resp_h":"192.168.4.1","id.resp_p":53,"proto":"udp","trans_id":19671,"rtt":0.06685185432434082,"query":"testmyids.com","qclass":1,"qclass_name":"C_INTERNET","qtype":1,"qtype_name":"A","rcode":0,"rcode_name":"NOERROR","AA":false,"TC":false,"RD":true,"RA":true,"Z":0,"answers":["31.3.245.133"],"TTLs":[3600],"rejected":false}

The power of :program:`jq` becomes evident when we decide we only want to see
specific values. For example, the following tells :program:`jq` to look at the
:file:`dns.log` and report the source IP of systems doing DNS queries, followed
by the query, and any answer to the query.

.. code-block:: console

   zeek@zeek:~/zeek-test/json$ jq -c '[."id.orig_h", ."query", ."answers"]' dns.log

::

   ["192.168.4.76","testmyids.com",null]
   ["192.168.4.76","testmyids.com",["31.3.245.133"]]

For a more comprehensive description of the capabilities of :program:`jq`,
see the `jq manual <https://stedolan.github.io/jq/manual/>`_.

With this basic understanding of how to interact with Zeek logs, we can now
turn to specific logs and interpret their values.

Log Schemas
===========

It's important to note that the exact set and shape of Zeek's logs is highly
site-dependent. While every Zeek version ships with a set of logs enabled by
default, it also includes optional ones that you're welcome to enable. (Feel
free to peruse the :ref:`full set<log-files>`.) In addition, many of Zeek's
`add-on packages <https://packages.zeek.org/>`_ introduce logs of their own, or
enrich existing ones with additional metadata. And finally, Zeek's
:ref:`logging framework <framework-logging>` lets you apply your own log
customizations with a bit of scripting.

Zeek's `logschema <https://github.com/zeek/logschema>`_ package helps you
understand your Zeek logs. It produces log schemas that detail your
installation's set of logs and their fields. For each field, the schemas
provide rich metadata including name, type, and docstrings. They can also
explain the source of a field, such as the specific script or the name of the
Zeek package that added it. Log schemas are also a great way to understand how
and whether your logs change when you upgrade to a newer version of Zeek.

To produce schemas, you need to tell Zeek which schema exporters to load.
An easy way to do this is to simply start Zeek with your installed packages
and an exporter of your choice. To get started, try the following:

.. code-block:: console

   $ zkg install logschema
   $ zeek logschema/export/jsonschema packages

Your local directory will now contain a JSON Schema description for each of your
installation's logs.

.. code-block:: console

   $ cat zeek-conn-log.schema.json | jq
   {
     "$schema": "https://json-schema.org/draft/2020-12/schema",
     "title": "Schema for Zeek conn.log",
     "description": "JSON Schema for Zeek conn.log",
     "type": "object",
     "properties": {
       "ts": {
         "description": "This is the time of the first packet.",
         "type": "number",
         "examples": [
           "1737691432.132607"
         ],
         "x-zeek": {
           "type": "time",
           "record_type": "Conn::Info",
           "is_optional": false,
           "script": "base/protocols/conn/main.zeek"
         }
       },
       ...
   }

The logschema package supports a range of schema formats including JSON Schema
and CSV, and works with Zeek 5.2 and newer. Take a look at the package's
`documentation <https://github.com/zeek/logschema>`_ for details.