.. _mime-stats:

====================
MIME Type Statistics
====================

Files are constantly transmitted over HTTP on most networks. Each
file belongs to a specific category (e.g., executable, text, image)
identified by its `Multipurpose Internet Mail Extensions (MIME)
<http://en.wikipedia.org/wiki/MIME>`_ type. Although MIME was originally
developed to identify the type of non-text attachments in email, web
browsers also use it to identify the type of transmitted files and
present them accordingly.

In this tutorial, we will demonstrate how to use the Sumstats Framework
to collect statistical information based on MIME types; specifically,
the total number of occurrences, the total size in bytes, and the number
of unique hosts transmitting files over HTTP for each type. For
instructions on extracting and creating a local copy of these files,
visit :ref:`this tutorial <http-monitor>`.

-----------------------------
MIME Statistics with Sumstats
-----------------------------

When working with the :ref:`Summary Statistics Framework
<sumstats-framework>`, you need to define three different pieces:
(i) observations, where an event is observed and its values are fed
into the framework; (ii) reducers, where observations are collected
and measured; and (iii) the SumStat itself, where the main
functionality is implemented.
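
The three pieces fit together roughly as in the following minimal
sketch. It is not taken from ``mimestats.bro``: it simply counts
established connections, and the stream name ``conn established`` and
the SumStat name ``counting connections`` are arbitrary labels chosen
for illustration.

.. sourcecode:: bro

   @load base/frameworks/sumstats

   # (i) Observation: each established connection contributes the value 1.
   # The key is left empty, so all observations fall into one global bucket.
   event connection_established(c: connection)
       {
       SumStats::observe("conn established",
                         SumStats::Key(),
                         SumStats::Observation($num=1));
       }

   event bro_init()
       {
       # (ii) Reducer: attach to the stream and sum the observed values.
       local r1 = SumStats::Reducer($stream="conn established",
                                    $apply=set(SumStats::SUM));

       # (iii) SumStat: once per minute, hand the reduced result to a
       # callback. The sum is a double, hence the %f format specifier.
       SumStats::create([$name="counting connections",
                         $epoch=1min,
                         $reducers=set(r1),
                         $epoch_result(ts: time, key: SumStats::Key,
                                       result: SumStats::Result) =
                             {
                             print fmt("Connections established: %.0f",
                                       result["conn established"]$sum);
                             }]);
       }

The MIME example in this tutorial follows the same shape, but keys its
observations by MIME type and uses two reducers instead of one.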

We start by defining our observation along with a record to store
all statistical values and an observation interval. We are conducting our
observation on the :bro:see:`HTTP::log_http` event and are interested
in the MIME type, the size of the file (``response_body_len``), and the
originator host (``orig_h``). We use the MIME type as our key and create
observations for the other two values.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 6-29
   :lineno-start: 6

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 54-64
   :lineno-start: 54

Next, we create the reducers. The first accumulates file sizes, and
the second makes sure each host ID is stored only once. Below is
the relevant part of the :bro:see:`bro_init` handler.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 34-37
   :lineno-start: 34

In our final step, we create the SumStat, which is tied to the
observation interval. Each time the interval expires, we populate the
record (defined above) with all the relevant data and write it to a log.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 38-51
   :lineno-start: 38

After putting the three pieces together, we end up with the following
final code for our script.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:

.. sourcecode:: console

   $ bro -r http/bro.org.pcap mimestats.bro
   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path mime_metrics
   #open 2018-12-14-16-25-06
   #fields ts ts_delta mtype uniq_hosts hits bytes
   #types time interval string count count count
   1389719059.311698 300.000000 image/png 1 9 82176
   1389719059.311698 300.000000 image/gif 1 1 172
   1389719059.311698 300.000000 image/x-icon 1 2 2300
   1389719059.311698 300.000000 text/html 1 2 42231
   1389719059.311698 300.000000 text/plain 1 15 128001
   1389719059.311698 300.000000 image/jpeg 1 1 186859
   1389719059.311698 300.000000 application/pgp-signature 1 1 836
   #close 2018-12-14-16-25-06

.. note::

   The redefinition of :bro:see:`Site::local_nets` is done inside
   this script only to make it a self-contained example. It is typically
   redefined elsewhere, such as in a site-specific script.
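
For reference, such a redefinition is a single ``redef`` statement. In
the sketch below, the subnets are placeholders chosen for illustration,
not values taken from this tutorial.

.. sourcecode:: bro

   # Treat these networks as local; replace them with your own ranges.
   # Both subnets are illustrative assumptions.
   redef Site::local_nets += { 10.0.0.0/8, 192.168.0.0/16 };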