.. _mime-stats:

====================
MIME Type Statistics
====================

Files are constantly transmitted over HTTP on regular networks. These
files belong to a specific category (e.g., executable, text, image)
identified by a `Multipurpose Internet Mail Extensions (MIME)
<http://en.wikipedia.org/wiki/MIME>`_ type. Although MIME was originally
developed to identify the type of non-text attachments in email, web
browsers also use it to identify the type of files transmitted and
present them accordingly.

In this tutorial, we demonstrate how to use the SumStats Framework
to collect statistical information based on MIME types; specifically,
the total number of occurrences, the total size in bytes, and the number
of unique hosts transmitting files over HTTP for each type. For
instructions on extracting and creating a local copy of these files,
see :ref:`this tutorial <http-monitor>`.

------------------------------------------------
MIME Statistics with SumStats
------------------------------------------------

When working with the :ref:`Summary Statistics Framework
<sumstats-framework>`, you need to define three different pieces: (i)
Observations, where the event is observed and fed into the framework;
(ii) Reducers, where observations are collected and measured; and (iii)
SumStats, where the main functionality is implemented.

We start by defining our observation along with a record to store
all statistical values and an observation interval. We conduct our
observation on the :bro:see:`HTTP::log_http` event and are interested
in the MIME type, the size of the file (``response_body_len``), and the
originator host (``orig_h``). We use the MIME type as our key and create
observers for the other two values.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 6-29
   :lineno-start: 6

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 54-64
   :lineno-start: 54

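The observation step can also be pictured outside of Bro. The following is a minimal Python sketch, not the SumStats API: the ``observe`` helper, the stream names ``mime.bytes`` and ``mime.hits``, the ``mime_type`` field, and the sample records are all hypothetical. It only illustrates how two observation streams can be keyed by MIME type.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for observation streams:
# stream name -> key (MIME type) -> list of observed values.
streams = defaultdict(lambda: defaultdict(list))


def observe(stream, key, value):
    """Record one observation for a key on a named stream."""
    streams[stream][key].append(value)


def on_http_log(rec):
    """Feed one HTTP log record into both streams, keyed by MIME type."""
    mime_type = rec.get("mime_type")
    if mime_type is None:
        return  # no MIME type was identified, so there is nothing to key on
    observe("mime.bytes", mime_type, rec["response_body_len"])
    observe("mime.hits", mime_type, rec["orig_h"])


# Made-up sample records, for illustration only.
for rec in [
    {"mime_type": "image/png", "response_body_len": 100, "orig_h": "10.0.0.1"},
    {"mime_type": "image/png", "response_body_len": 50, "orig_h": "10.0.0.1"},
    {"mime_type": "text/html", "response_body_len": 10, "orig_h": "10.0.0.2"},
    {"response_body_len": 7, "orig_h": "10.0.0.3"},
]:
    on_http_log(rec)
```

The record without a MIME type is skipped, so only two keys end up in each stream.
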
Next, we create the reducers. The first accumulates file sizes
and the second makes sure we only store a host ID once. Below is
the partial code from a :bro:see:`bro_init` handler.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 34-37
   :lineno-start: 34

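As a rough illustration of what the two reducers compute, here is a small Python sketch. The function names ``reduce_sum`` and ``reduce_unique`` and the sample values are hypothetical; they model the idea of summing sizes and counting each host once, not the framework's implementation.

```python
def reduce_sum(values):
    """Accumulate numeric observations into a total (file sizes here)."""
    return sum(values)


def reduce_unique(values):
    """Count distinct observations, so a repeated host is stored only once."""
    return len(set(values))


# Made-up observations for a single MIME type.
sizes = [100, 50, 25]
hosts = ["10.0.0.1", "10.0.0.1", "10.0.0.2"]

total_bytes = reduce_sum(sizes)
unique_hosts = reduce_unique(hosts)
```
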
In our final step, we create the SumStats, where we check for the
end of the observation interval. Once the interval expires, we populate
the record (defined above) with all the relevant data and write it to a
log.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:
   :lines: 38-51
   :lineno-start: 38

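The end-of-interval step can be sketched in Python as well. This is an illustrative model under stated assumptions: the ``Info`` dataclass mirrors the log columns shown in the output of this tutorial (``ts``, ``ts_delta``, ``mtype``, ``uniq_hosts``, ``hits``, ``bytes``), while the ``epoch_result`` function, its arguments, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Info:
    ts: float        # time the record was written
    ts_delta: float  # length of the observation interval in seconds
    mtype: str       # MIME type used as the key
    uniq_hosts: int  # number of distinct originator hosts
    hits: int        # number of observations for this MIME type
    bytes: int       # total size of all observed files


def epoch_result(ts, interval, observations):
    """Build one record per MIME type once the interval has expired.

    `observations` maps a MIME type to a list of (size, host) pairs.
    """
    records = []
    for mtype, obs in observations.items():
        sizes = [size for size, _ in obs]
        hosts = [host for _, host in obs]
        records.append(
            Info(ts, interval, mtype, len(set(hosts)), len(obs), sum(sizes))
        )
    return records


# Made-up observations for one interval.
log = epoch_result(
    300.0,
    300.0,
    {"image/png": [(100, "10.0.0.1"), (50, "10.0.0.1")]},
)
```
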
After putting the three pieces together, we end up with the following
final code for our script.

.. literalinclude:: mimestats.bro
   :caption:
   :language: bro
   :linenos:

.. sourcecode:: console

   $ bro -r http/bro.org.pcap mimestats.bro
   #separator \x09
   #set_separator ,
   #empty_field (empty)
   #unset_field -
   #path mime_metrics
   #open 2018-12-14-16-25-06
   #fields ts ts_delta mtype uniq_hosts hits bytes
   #types time interval string count count count
   1389719059.311698 300.000000 image/png 1 9 82176
   1389719059.311698 300.000000 image/gif 1 1 172
   1389719059.311698 300.000000 image/x-icon 1 2 2300
   1389719059.311698 300.000000 text/html 1 2 42231
   1389719059.311698 300.000000 text/plain 1 15 128001
   1389719059.311698 300.000000 image/jpeg 1 1 186859
   1389719059.311698 300.000000 application/pgp-signature 1 1 836
   #close 2018-12-14-16-25-06

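The log shown above declares a tab separator (``#separator \x09``) and lists its column names in the ``#fields`` header. A minimal Python sketch for reading such a log might look as follows. The ``parse_bro_log`` helper is hypothetical, and it assumes a tab separator rather than reading it from the header. The two sample rows are taken from the output above.

```python
def parse_bro_log(text):
    """Parse a tab-separated log with a '#fields' header into a list of dicts."""
    fields = []
    rows = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # The first token is the '#fields' tag itself; the rest are names.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


sample = "\n".join([
    "#fields\tts\tts_delta\tmtype\tuniq_hosts\thits\tbytes",
    "1389719059.311698\t300.000000\timage/png\t1\t9\t82176",
    "1389719059.311698\t300.000000\timage/gif\t1\t1\t172",
])

rows = parse_bro_log(sample)
total = sum(int(row["bytes"]) for row in rows)
```

All values are kept as strings, so numeric columns such as ``bytes`` need an explicit conversion before they are summed.
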
.. note::

   The redefinition of :bro:see:`Site::local_nets` is only done inside
   this script to make it a self-contained example. It's typically
   redefined somewhere else.