Restructuring the main documentation index.

I'm merging in the remaining pieces from the former doc directory and
restructuring things into sub-directories.
This commit is contained in:
Robin Sommer 2013-04-01 17:30:12 -07:00
parent 12e4dd8066
commit 25bf563e1c
41 changed files with 7679 additions and 100 deletions


@@ -31,7 +31,7 @@ add_custom_target(broxygen
     ${DOC_SOURCE_WORKDIR}/scripts
     # append to the master index of all policy scripts
     COMMAND cat ${MASTER_POLICY_INDEX} >>
-            ${DOC_SOURCE_WORKDIR}/scripts/index.rst
+            ${DOC_SOURCE_WORKDIR}/scripts/scripts.rst
     # append to the master index of all policy packages
     COMMAND cat ${MASTER_PACKAGE_INDEX} >>
             ${DOC_SOURCE_WORKDIR}/scripts/packages.rst

doc/cluster/index.rst Normal file

@ -0,0 +1,86 @@
========================
Setting up a Bro Cluster
========================
Intro
------
Bro is not multithreaded, so once the limitations of a single processor core are reached, the only option currently is to spread the workload across many cores, or even many physical computers. The cluster deployment scenario for Bro is the current solution for building these larger systems. The accompanying tools and scripts provide the structure to easily manage many Bro processes that examine packets and do correlation activities but act as a singular, cohesive entity.
Architecture
---------------
The figure below illustrates the main components of a Bro cluster.
.. image:: /images/deployment.png
Tap
***
This is a mechanism that splits the packet stream in order to make a copy
available for inspection. Examples include the monitoring port on a switch and
an optical splitter for fiber networks.
Frontend
********
This is a discrete hardware device or on-host technique that will split your traffic into many streams or flows. The Bro binary does not do this job. There are numerous ways to accomplish this task, some of which are described below in `Frontend Options`_.
Manager
*******
This is a Bro process which has two primary jobs. It receives log messages and notices from the rest of the nodes in the cluster using the Bro communications protocol. The result is that you end up with a single set of logs instead of many discrete logs that you would have to combine later with post-processing. The manager also de-duplicates notices; it is able to do so because it acts as the choke point for notices and for how notices are processed into actions such as emailing, paging, or blocking.
The manager process is started first by BroControl; it only opens its designated port and waits for connections, and it doesn't initiate any connections to the rest of the cluster. Once the workers are started and connect to the manager, logs and notices start arriving at the manager process from the workers.
Proxy
*****
This is a Bro process which manages synchronized state. Variables can be synchronized across connected Bro processes automatically, and proxies help the workers by alleviating the need for all of the workers to connect directly to each other.
Examples of synchronized state from the scripts that ship with Bro are things such as the full list of “known” hosts and services, i.e., hosts or services which have been detected as performing full TCP handshakes, or for which an analyzed protocol has been found on a connection. If worker A detects host 1.2.3.4 as an active host, it would be beneficial for worker B to know that as well, so worker A shares that information as an insertion to a set <link to set documentation would be good here> which travels to the cluster's proxy, and the proxy then sends that same set insertion to worker B. The result is that worker A and worker B have shared knowledge about the hosts and services that are active on the network being monitored.
The proxy model extends to multiple proxies as well if necessary for performance reasons; this only adds one additional step for the Bro processes. Each proxy connects to another proxy in a ring, and the workers are shared between them as evenly as possible. When a proxy receives some new bit of state, it shares it with its neighboring proxy, which passes it around the ring of proxies and down to all of the workers. From a practical standpoint, there are no established rules of thumb yet for the number of proxies needed for a given number of workers. It is best to start with a single proxy and add more if communication performance problems appear.
Bro processes acting as proxies don't tend to be very demanding of CPU or memory, and users frequently run proxy processes on the same physical host as the manager.
Worker
******
This is the Bro process that sniffs network traffic and does protocol analysis on the reassembled traffic streams. Most of the work of an active cluster takes place on the workers, and as such, the workers typically represent the bulk of the Bro processes running in a cluster. The fastest memory and CPU core speed you can afford are best here, since all of the protocol parsing and most of the analysis takes place on the workers. There are no particular requirements for the disks in workers, since almost all logging is done remotely to the manager and very little is normally written to disk.
The rule of thumb we have followed recently is to allocate approximately 1 core for every 80 Mbps of traffic being analyzed; however, this estimate is highly specific to the traffic mix. It has generally worked for mixed traffic with many users and servers. For example, if your traffic peaks around 2 Gbps (combined) and you want to handle traffic at peak load, you may want to have 26 cores available (2048 / 80 == 25.6). If the 80 Mbps estimate works for your traffic, this could be handled by 3 physical hosts dedicated to being workers, each one containing dual 6-core processors.
Once a flow-based load balancer is in place, this model is also very easy to scale, so it's reasonable to start with an estimate of the hardware you will need to fully analyze your traffic. If it turns out that you need more, it's relatively easy to increase the size of the cluster in most cases.
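As a back-of-the-envelope illustration of this sizing rule, here is a small sketch in Python (the 80 Mbps-per-core figure and the 12-cores-per-host assumption are taken from the example above; adjust both to your own traffic mix and hardware)::

import math

MBPS_PER_CORE = 80    # rule-of-thumb capacity per worker core (traffic-mix specific)
CORES_PER_HOST = 12   # e.g., a worker host with dual 6-core processors

def size_cluster(peak_mbps):
    """Estimate the worker cores and hosts needed for a peak traffic rate."""
    cores = int(math.ceil(float(peak_mbps) / MBPS_PER_CORE))
    hosts = int(math.ceil(cores / float(CORES_PER_HOST)))
    return cores, hosts

print(size_cluster(2048))   # -> (26, 3), matching the example above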
Frontend Options
----------------
There are many options for setting up a frontend flow distributor and in many cases it may even be beneficial to do multiple stages of flow distribution on the network and on the host.
Discrete hardware flow balancers
********************************
cPacket
^^^^^^^
If you are monitoring one or more 10G physical interfaces, the recommended solution is to use either a cFlow or cVu device from cPacket because they are currently being used very successfully at a number of sites. These devices will perform layer-2 load balancing by rewriting the destination ethernet MAC address to cause each packet associated with a particular flow to have the same destination MAC. The packets can then be passed directly to a monitoring host where each worker has a BPF filter to limit its visibility to only that stream of flows or onward to a commodity switch to split the traffic out to multiple 1G interfaces for the workers. This can ultimately greatly reduce costs since workers can use relatively inexpensive 1G interfaces.
OpenFlow Switches
^^^^^^^^^^^^^^^^^
We are currently exploring the use of OpenFlow based switches to do flow based load balancing directly on the switch which can greatly reduce frontend costs for many users. This document will be updated when we have more information.
On host flow balancing
**********************
PF_RING
^^^^^^^
The PF_RING software for Linux has a “clustering” feature which will do flow-based load balancing across a number of processes that are sniffing the same interface. This allows you to easily take advantage of multiple cores in a single physical host because Bro's main event loop is single-threaded and can't natively utilize all of the cores. More information about Bro with PF_RING can be found here: (someone want to write a quick Bro/PF_RING tutorial to link to here? document installing kernel module, libpcap wrapper, building Bro with the --with-pcap configure option)
Netmap
^^^^^^
FreeBSD has an in-progress project named Netmap which will enable flow based load balancing as well. When it becomes viable for real world use, this document will be updated.
Click! Software Router
^^^^^^^^^^^^^^^^^^^^^^
Click! can be used for flow-based load balancing with a simple configuration. (link to an example for the config). This solution is not recommended on Linux due to Bro's PF_RING support, and only as a last resort on other operating systems, since it causes a lot of overhead due to context switching back and forth between kernel and userland several times per packet.


@ -0,0 +1,68 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.34-3
======
BinPAC
======
.. rst-class:: opening
BinPAC is a high-level language for describing protocol parsers that
generates C++ code. It is currently maintained and distributed with the
Bro Network Security Monitor distribution; however, the generated parsers
may be used with other programs besides Bro.
Download
--------
You can find the latest BinPAC release for download at
http://www.bro.org/download.
BinPAC's git repository is located at `git://git.bro.org/binpac.git
<git://git.bro.org/binpac.git>`__. You can browse the repository
`here <http://git.bro.org/binpac.git>`__.
This document describes BinPAC |version|. See the ``CHANGES``
file for version history.
Prerequisites
-------------
BinPAC relies on the following libraries and tools, which need to be
installed before you begin:
* Flex (Fast Lexical Analyzer)
Flex is already installed on most systems, so with luck you can
skip having to install it yourself.
* Bison (GNU Parser Generator)
Bison is also already installed on many systems.
* CMake 2.6.3 or greater
CMake is a cross-platform, open-source build system, typically
not installed by default. See http://www.cmake.org for more
information regarding CMake and the installation steps below for
how to use it to build this distribution. CMake generates native
Makefiles that depend on GNU Make by default.
Installation
------------
To build and install into ``/usr/local``::
./configure
cd build
make
make install
This will perform an out-of-source build into the build directory using
the default build options and then install the binpac binary into
``/usr/local/bin``.
You can specify a different installation directory with::
./configure --prefix=<dir>
Run ``./configure --help`` for more options.


@ -0,0 +1,70 @@
.. -*- mode: rst; -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.26-5
======================
Bro Auxiliary Programs
======================
.. contents::
:Version: |version|
Handy auxiliary programs related to the use of the Bro Network Security
Monitor (http://www.bro.org).
Note that some files that were formerly distributed with Bro as part
of the aux/ tree are now maintained separately. See
http://www.bro.org/download for their download locations.
adtrace
=======
Makefile and source for the adtrace utility. This program is used
in conjunction with the localnetMAC.pl perl script to compute the
network addresses that compose the internal and external nets that Bro
is monitoring. When run by itself, this program just reads a pcap
(tcpdump) file and writes out the src MAC, dst MAC, src IP, and dst
IP for each packet seen in the file. This output is processed by
the localnetMAC.pl script during 'make install'.
devel-tools
===========
A set of scripts used commonly for Bro development.
extract-conn-by-uid
Extracts a connection from a trace file based
on its UID found in Bro's conn.log
gen-mozilla-ca-list.rb
Generates a list of Mozilla SSL root certificates in
a format readable by Bro.
update-changes
A script to maintain the CHANGES and VERSION files.
git-show-fastpath
Show commits to the fastpath branch not yet merged into master.
cpu-bench-with-trace
Run a number of Bro benchmarks on a trace file.
nftools
=======
Utilities for dealing with Bro's custom file format for storing
NetFlow records. nfcollector reads NetFlow data from a socket and
writes it in Bro's format. ftwire2bro reads NetFlow "wire" format
(e.g., as generated by a 'flow-export' directive) and writes it in
Bro's format.
rst
===
Makefile and source for the rst utility. "rst" can be invoked by
a Bro script to terminate an established TCP connection by forging
RST tear-down packets. See terminate_connection() in conn.bro.


@ -0,0 +1,231 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.54
============================
Python Bindings for Broccoli
============================
.. rst-class:: opening
This Python module provides bindings for Broccoli, Bro's client
communication library. In general, the bindings provide the same
functionality as Broccoli's C API.
.. contents::
Download
--------
You can find the latest Broccoli-Python release for download at
http://www.bro.org/download.
Broccoli-Python's git repository is located at `git://git.bro.org/broccoli-python.git
<git://git.bro.org/broccoli-python.git>`__. You can browse the repository
`here <http://git.bro.org/broccoli-python.git>`__.
This document describes Broccoli-Python |version|. See the ``CHANGES``
file for version history.
Installation
------------
Installation of the Python module is pretty straightforward. After
Broccoli itself has been installed, it follows the standard installation
process for Python modules::
python setup.py install
Try the following to test the installation. If you do not see any
error message, everything should be fine::
python -c "import broccoli"
Usage
-----
The following examples demonstrate how to send and receive Bro
events in Python.
The main challenge when using Broccoli from Python is dealing with
the data types of Bro event parameters as there is no one-to-one
mapping between Bro's types and Python's types. The Python module
automatically maps between those types which both systems provide
(such as strings) and provides a set of wrapper classes for Bro
types which do not have a direct Python equivalent (such as IP
addresses).
Connecting to Bro
~~~~~~~~~~~~~~~~~
The following code sets up a connection from Python to a remote Bro
instance (or another Broccoli) and provides a connection handle for
further communication::
from broccoli import *
bc = Connection("127.0.0.1:47758")
An ``IOError`` will be raised if the connection cannot be established.
Sending Events
~~~~~~~~~~~~~~
Once you have a connection handle ``bc`` set up as shown above, you can
start sending events::
bc.send("foo", 5, "attack!")
This sends an event called ``foo`` with two parameters, ``5`` and
``attack!``. Broccoli operates asynchronously, i.e., events scheduled
with ``send()`` are not always sent out immediately but might be
queued for later transmission. To ensure that all events get out
(and incoming events are processed, see below), you need to call
``bc.processInput()`` regularly.
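As an illustration, a minimal send loop could look like this (the one-second interval and the event parameters are arbitrary choices for this sketch, not part of the API)::

# Note: ``from broccoli import *`` exports a ``time`` wrapper class,
# so we import only ``sleep`` from the standard library to avoid shadowing.
from time import sleep
from broccoli import *

bc = Connection("127.0.0.1:47758")

while True:
    bc.send("foo", 5, "attack!")   # queue an event for transmission
    bc.processInput()              # flush queued events, dispatch received ones
    sleep(1)                       # arbitrary polling interval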
Data Types
~~~~~~~~~~
In the example above, the types of the event parameters are
automatically derived from the corresponding Python types: the first
parameter (``5``) has the Bro type ``int`` and the second one
(``attack!``) has Bro type ``string``.
For types which do not have a Python equivalent, the ``broccoli``
module provides wrapper classes which have the same names as the
corresponding Bro types. For example, to send an event called ``bar``
with one ``addr`` argument and one ``count`` argument, you can write::
bc.send("bar", addr("192.168.1.1"), count(42))
The following table summarizes the available atomic types and their
usage.
======== =========== =============================
Bro Type Python Type Example
======== =========== =============================
addr                 ``addr("192.168.1.1")``
bool     bool        ``True``
count                ``count(42)``
double   float       ``3.14``
enum                 Type currently not supported
int      int         ``5``
interval             ``interval(60)``
net                  Type currently not supported
port                 ``port("80/tcp")``
string   string      ``"attack!"``
subnet               ``subnet("192.168.1.0/24")``
time                 ``time(1111111111.0)``
======== =========== =============================
The ``broccoli`` module also supports sending Bro records as event
parameters. To send a record, you first define a record type. For
example, a Bro record type::
type my_record: record {
a: int;
b: addr;
c: subnet;
};
turns into Python as::
my_record = record_type("a", "b", "c")
As the example shows, Python only needs to know the attribute names
but not their types. The types are derived automatically in the same
way as discussed above for atomic event parameters.
Now you can instantiate a record instance of the newly defined type
and send it out::
rec = record(my_record)
rec.a = 5
rec.b = addr("192.168.1.1")
rec.c = subnet("192.168.1.0/24")
bc.send("my_event", rec)
.. note:: The Python module does not support nested records at this time.
Receiving Events
~~~~~~~~~~~~~~~~
To receive events, you define a callback function having the same
name as the event and mark it with the ``event`` decorator::
@event
def foo(arg1, arg2):
print arg1, arg2
Once you start calling ``bc.processInput()`` regularly (see above),
each received ``foo`` event will trigger the callback function.
By default, the event's arguments are always passed in with built-in
Python types. For Bro types which do not have a direct Python
equivalent (see table above), a substitute built-in type is used
which corresponds to the type the wrapper class' constructor expects
(see the examples in the table). For example, Bro type ``addr`` is
passed in as a string and Bro type ``time`` is passed in as a float.
Alternatively, you can define a *typed* prototype for the event. If you
do so, arguments will first be type-checked and then passed to the
callback with the specified type (which means instances of the
wrapper classes for non-Python types). Example::
@event(count, addr)
def bar(arg1, arg2):
print arg1, arg2
Here, ``arg1`` will be an instance of the ``count`` wrapper class and
``arg2`` will be an instance of the ``addr`` wrapper class.
Prototyping works similarly with built-in Python types::
@event(int, string)
def foo(arg1, arg2):
print arg1, arg2
In general, the prototype specifies the types in which the callback
wants to receive the arguments. This actually provides support for
simple type casts, as some types support conversion into something
different. If for instance the event source sends an event with a
single port argument, ``@event(port)`` will pass the port as an
instance of the ``port`` wrapper class; ``@event(string)`` will pass it
as a string (e.g., ``"80/tcp"``); and ``@event(int)`` will pass it as an
integer without protocol information (e.g., just ``80``). If an
argument cannot be converted into the specified type, a ``TypeError``
will be raised.
To receive an event with a record parameter, the record type first
needs to be defined, as described above. Then the type can be used
with the ``@event`` decorator in the same way as atomic types::
my_record = record_type("a", "b", "c")
@event(my_record)
def my_event(rec):
print rec.a, rec.b, rec.c
Helper Functions
----------------
The ``broccoli`` module provides one helper function: ``current_time()``
returns the current time as a float which, if necessary, can be
wrapped into a ``time`` parameter (i.e., ``time(current_time())``).
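For example, to send an event carrying the current timestamp (the event name ``heartbeat`` is made up for this sketch)::

from broccoli import *

bc = Connection("127.0.0.1:47758")
bc.send("heartbeat", time(current_time()))  # wrap the float in a Bro ``time``
bc.processInput()                           # make sure the event goes out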
Examples
--------
There are some example scripts in the ``tests/`` subdirectory of the
``broccoli-python`` repository
`here <http://git.bro.org/broccoli-python.git/tree/HEAD:/tests>`_:
- ``broping.py`` is a (simplified) Python version of Broccoli's test program
``broping``. Start Bro with ``broping.bro``.
- ``broping-record.py`` is a Python version of Broccoli's ``broping``
for records. Start Bro with ``broping-record.bro``.
- ``test.py`` is a very ugly but comprehensive regression test and part of
the communication test-suite. Start Bro with ``test.bro``.


@ -0,0 +1,67 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 1.54
===============================================
Ruby Bindings for Broccoli
===============================================
.. rst-class:: opening
This is the broccoli-ruby extension for Ruby which provides access
to the Broccoli API. Broccoli is a library for
communicating with the Bro Intrusion Detection System.
Download
========
You can find the latest Broccoli-Ruby release for download at
http://www.bro.org/download.
Broccoli-Ruby's git repository is located at `git://git.bro.org/broccoli-ruby.git
<git://git.bro.org/broccoli-ruby.git>`__. You can browse the repository
`here <http://git.bro.org/broccoli-ruby.git>`__.
This document describes Broccoli-Ruby |version|. See the ``CHANGES``
file for version history.
Installation
============
To install the extension:
1. Make sure that the ``broccoli-config`` binary is in your path.
(``export PATH=/usr/local/bro/bin:$PATH``)
2. Run ``sudo ruby setup.rb``.
To install the extension as a gem (suggested):
1. Install `rubygems <http://rubygems.org>`_.
2. Make sure that the ``broccoli-config`` binary is in your path.
(``export PATH=/usr/local/bro/bin:$PATH``)
3. Run ``sudo gem install rbroccoli``.
Usage
=====
There aren't really any useful docs yet. Your best bet currently is
to read through the examples.
One thing I should mention, however, is that I haven't done any
optimization yet. You may find that code that sends or receives
extremely large numbers of events won't run fast enough and will
begin to fall behind the Bro server. The dns_requests.rb example is
a good performance test if your Bro server is sitting on a network with many
DNS lookups.
Contact
=======
If you have a question/comment/patch, see the Bro `contact page
<http://www.bro.org/contact/index.html>`_.


@ -0,0 +1,141 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 1.92-9
===============================================
Broccoli: The Bro Client Communications Library
===============================================
.. rst-class:: opening
Broccoli is the "Bro client communications library". It allows you
to create client sensors for the Bro intrusion detection system.
Broccoli can speak a good subset of the Bro communication protocol,
in particular, it can receive Bro IDs, send and receive Bro events,
and send and receive event requests to/from peering Bros. You can
currently create and receive values of pure types like integers,
counters, timestamps, IP addresses, port numbers, booleans, and
strings.
Download
--------
You can find the latest Broccoli release for download at
http://www.bro.org/download.
Broccoli's git repository is located at
`git://git.bro.org/broccoli <git://git.bro.org/broccoli>`_. You
can browse the repository `here <http://git.bro.org/broccoli>`_.
This document describes Broccoli |version|. See the ``CHANGES``
file for version history.
Installation
------------
The Broccoli library has been tested on Linux, the BSDs, and Solaris.
A Windows build has not currently been tried but is part of our future
plans. If you succeed in building Broccoli on other platforms, let us
know!
Prerequisites
-------------
Broccoli relies on the following libraries and tools, which need to be
installed before you begin:
Flex (Fast Lexical Analyzer)
Flex is already installed on most systems, so with luck you
can skip having to install it yourself.
Bison (GNU Parser Generator)
This comes with many systems, but if you get errors compiling
parse.y, you will need to install it.
OpenSSL headers and libraries
For encrypted communication. These are likely installed,
though some platforms may require installation of a 'devel'
package for the headers.
CMake 2.6.3 or greater
CMake is a cross-platform, open-source build system, typically
not installed by default. See http://www.cmake.org for more
information regarding CMake and the installation steps below
for how to use it to build this distribution. CMake generates
native Makefiles that depend on GNU Make by default.
Broccoli can also make use of some optional libraries if they are found at
installation time:
Libpcap headers and libraries
Network traffic capture library
Installation
------------
To build and install into ``/usr/local``::
./configure
make
make install
This will perform an out-of-source build into the build directory using the
default build options and then install libraries into ``/usr/local/lib``.
You can specify a different installation directory with::
./configure --prefix=<dir>
Or control the python bindings install destination more precisely with::
./configure --python-install-dir=<dir>
Run ``./configure --help`` for more options.
Further notable configure options:
``--enable-debug``
This one enables lots of debugging output. Be sure to disable
this when using the library in a production environment! The
output could easily end up in undesired places when the stdout
of the program you've instrumented is used in other ways.
``--with-configfile=FILE``
Broccoli can read key/value pairs from a config file. By default
it is located in the ``etc`` directory of the installation root
(exception: when using ``--prefix=/usr``, ``/etc`` is used
instead of ``/usr/etc``). The default config file name is
``broccoli.conf``. Using ``--with-configfile``, you can override the
location and name of the config file.
To use the library in other programs & configure scripts, use the
``broccoli-config`` script. It gives you the necessary configuration flags
and linker flags for your system, see ``--cflags`` and ``--libs``.
The API is contained in ``broccoli.h`` and pretty well documented. A few
usage examples can be found in the ``test`` directory; in particular, the
``broping`` tool can be used to test event transmission and reception. Have
a look at the policy file ``broping.bro`` for the events that need to be
defined at the peering Bro. Try ``broping -h`` for a look at the available
options.
Broccoli knows two kinds of version numbers: the release version number
(as in "broccoli-x.y.tar.gz", or as shipped with Bro) and the shared
library API version number (as in libbroccoli.so.3.0.0). The former
relates to changes in the tree, the latter to compatibility changes in
the API.
Comments, feedback and patches are appreciated; please check the `Bro
website <http://www.bro.org/community>`_.
Documentation
-------------
Please see the `Broccoli User Manual <./broccoli-manual.html>`_ and
the `Broccoli API Reference <../../broccoli-api/index.html>`_.

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -0,0 +1,843 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.4-14
============================================
BTest - A Simple Driver for Basic Unit Tests
============================================
.. rst-class:: opening
``btest`` is a simple framework for writing unit tests. Freely
borrowing some ideas from other packages, its main objective is to
provide an easy-to-use, straightforward driver for a suite of
shell-based tests. Each test consists of a set of command lines that
will be executed, and success is determined based on their exit
codes. ``btest`` comes with some additional tools that can be used
within such tests to compare output against a previously established
baseline.
.. contents::
Download
========
You can find the latest BTest release for download at
http://www.bro.org/download.
BTest's git repository is located at `git://git.bro.org/btest.git
<git://git.bro.org/btest.git>`__. You can browse the repository
`here <http://git.bro.org/btest.git>`__.
This document describes BTest |version|. See the ``CHANGES``
file for version history.
Installation
============
Installation is simple and standard::
tar xzvf btest-*.tar.gz
cd btest-*
python setup.py install
This will install a few scripts: ``btest`` is the main driver program,
and there are a number of further helper scripts that we discuss below
(including ``btest-diff``, which is a tool for comparing output to a
previously established baseline).
Writing a Simple Test
=====================
In the most simple case, ``btest`` simply executes a set of command
lines, each of which must be prefixed with ``@TEST-EXEC:``
::
> cat examples/t1
@TEST-EXEC: echo "Foo" | grep -q Foo
@TEST-EXEC: test -d .
> btest examples/t1
examples.t1 ... ok
The test passes as both command lines return success. If one of them
didn't, that would be reported::
> cat examples/t2
@TEST-EXEC: echo "Foo" | grep -q Foo
@TEST-EXEC: test -d DOESNOTEXIST
> btest examples/t2
examples.t2 ... failed
Usually you will just run all tests found in a directory::
> btest examples
examples.t1 ... ok
examples.t2 ... failed
1 test failed
Why do we need the ``@TEST-EXEC:`` prefixes? Because the file
containing the test can simultaneously act as *its input*. Let's
say we want to verify a shell script::
> cat examples/t3.sh
# @TEST-EXEC: sh %INPUT
ls /etc | grep -q passwd
> btest examples/t3.sh
examples.t3 ... ok
Here, ``btest`` is executing (something similar to) ``sh
examples/t3.sh``, and then checks the return value as usual. The
example also shows that the ``@TEST-EXEC`` prefix can appear
anywhere, in particular inside the comment section of another
language.
Now, let's say we want to check the output of a program, making sure
that it matches what we expect. For that, we first add a command
line to the test that produces the output we want to check, and then
run ``btest-diff`` to make sure it matches a previously recorded
baseline. ``btest-diff`` is itself just a script that returns
success if the output is as expected, and failure otherwise. In the
following example, we use an awk script as a fancy way to print all
file names starting with a dot in the user's home directory. We
write that list into a file called ``dots`` and then check whether
its content matches what we know from last time::
> cat examples/t4.awk
# @TEST-EXEC: ls -a $HOME | awk -f %INPUT >dots
# @TEST-EXEC: btest-diff dots
/^\.+/ { print $1 }
Note that each test gets its own little sandbox directory when run,
so by creating a file like ``dots``, you aren't cluttering up
anything.
The first time we run this test, we need to record a baseline::
> btest -U examples/t4.awk
Now, ``btest-diff`` has remembered what the ``dots`` file should
look like::
> btest examples/t4.awk
examples.t4 ... ok
> touch ~/.NEWDOTFILE
> btest examples/t4.awk
examples.t4 ... failed
1 test failed
If we want to see what exactly the unexpected change is that was
introduced to ``dots``, there's a *diff* mode for that::
> btest -d examples/t4.awk
examples.t4 ... failed
% 'btest-diff dots' failed unexpectedly (exit code 1)
% cat .diag
== File ===============================
[... current dots file ...]
== Diff ===============================
--- /Users/robin/work/binpacpp/btest/Baseline/examples.t4/dots
2010-10-28 20:11:11.000000000 -0700
+++ dots 2010-10-28 20:12:30.000000000 -0700
@@ -4,6 +4,7 @@
.CFUserTextEncoding
.DS_Store
.MacOSX
+.NEWDOTFILE
.Rhistory
.Trash
.Xauthority
=======================================
% cat .stderr
[... if any of the commands had printed something to stderr, that would follow here ...]
Once we delete the new file, we are fine again::
> rm ~/.NEWDOTFILE
> btest -d examples/t4.awk
examples.t4 ... ok
That's already the main functionality that the ``btest`` package
provides. In the following, we describe a number of further options
extending/modifying this basic approach.
Reference
=========
Command Line Usage
------------------
``btest`` must be started with a list of tests and/or directories
given on the command line. In the latter case, the default is to
recursively scan the directories and assume all files found to be
tests to perform. It is however possible to exclude certain files by
specifying a suitable `configuration file`_.
``btest`` returns exit code 0 if all tests have successfully passed,
and 1 otherwise.
``btest`` accepts the following options:
-a ALTERNATIVE, --alternative=ALTERNATIVE
Activates an alternative_ configuration defined in the
configuration file. This option can be given multiple times to
run tests with several alternatives. If ``ALTERNATIVE`` is ``-``,
that refers to running with the standard setup; this can be used
to run tests both with and without alternatives by giving both.
-b, --brief
Does not output *anything* for tests which pass. If all tests
pass, there will not be any output at all.
-c CONFIG, --config=CONFIG
Specifies an alternative `configuration file`_ to use. If not
specified, the default is to use a file called ``btest.cfg``
if found in the current directory.
-d, --diagnostics
Reports diagnostics for all failed tests. The diagnostics
include the command line that failed, its output to standard
error, and potential additional information recorded by the
command line for diagnostic purposes (see `@TEST-EXEC`_
below). In the case of ``btest-diff``, the latter is the
``diff`` between baseline and actual output.
-D, --diagnostics-all
Reports diagnostics for all tests, including those which pass.
-f DIAGFILE, --file-diagnostics=DIAGFILE
Writes diagnostics for all failed tests into the given file.
If the file already exists, it will be overwritten.
-g GROUPS, --group=GROUPS
Runs only tests assigned to the given test groups, see
`@TEST-GROUP`_. Multiple groups can be given as a
comma-separated list. Specifying ``-`` as a group name selects
all tests that do not belong to any group.
-j [THREADS], --jobs[=THREADS]
Runs up to the given number of tests in parallel. If no number
is given, BTest substitutes the number of available CPU cores
as reported by the OS.
By default, BTest assumes that all tests can be executed
concurrently without further constraints. One can however
ensure serialization of subsets by assigning them to the same
serialization set, see `@TEST-SERIALIZE`_.
-q, --quiet
Suppress information output other than about failed tests.
If all tests pass, there will not be any output at all.
-r, --rerun
Runs only tests that failed last time. After each execution
(except when updating baselines), BTest generates a state file
that records the tests that have failed. Using this option on
the next run then reads that file back in and limits execution
to those tests found in there.
-t, --tmp-keep
Does not delete any temporary files created for running the
tests (including their outputs). By default, the temporary
files for a test will be located in ``.tmp/<test>/``, where
``<test>`` is the relative path of the test file with all slashes
replaced with dots and the file extension removed (e.g., the files
for ``example/t3.sh`` will be in ``.tmp/example.t3``).
-U, --update-baseline
Records a new baseline for all ``btest-diff`` commands found
in any of the specified tests. To do this, all tests are run
as normal except that when ``btest-diff`` is executed, it
does not compute a diff but instead considers the given file
to be authoritative and records it as the version to compare
with in future runs.
-u, --update-interactive
Each time a ``btest-diff`` command fails in any tests that are
run, btest will stop and ask whether or not the user wants to
record a new baseline.
-v, --verbose
Shows all test command lines as they are executed.
-w, --wait
Interactively waits for ``<enter>`` after showing diagnostics
for a test.
-x FILE, --xml=FILE
Records test results in JUnit XML format to the given file.
If the file exists already, it is overwritten.
.. _configuration file:
Configuration
-------------
Specifics of ``btest``'s execution can be tuned with a configuration
file, which by default is ``btest.cfg`` if that's found in the
current directory. It can alternatively be specified with the
``--config`` command line option. The configuration file is
"INI-style", and an example comes with the distribution, see
``btest.cfg.example``. A configuration file has one main section,
``btest``, that defines most options; as well as an optional section
for defining `environment variables`_ and further optional sections
for defining alternatives_.
Note that all paths specified in the configuration file are relative
to ``btest``'s *base directory*. The base directory is either the
one where the configuration file is located if such is given/found,
or the current working directory if not. When setting values for
configuration options, the absolute path to the base directory is
available by using the macro ``%(testbase)s`` (the weird syntax is
due to Python's ``ConfigParser`` module).
Furthermore, all values can use standard "backtick-syntax" to
include the output of external commands (e.g., ``xyz=`echo test` ``).
Note that the backtick expansion is performed after any ``%(..)``
have already been replaced (including within the backticks).
Options
~~~~~~~
The following options can be set in the ``btest`` section of the
configuration file:
``TestDirs``
A space-separated list of directories to search for tests. If
defined, one doesn't need to specify any tests on the command
line.
``TmpDir``
A directory where to create temporary files when running tests.
By default, this is set to ``%(testbase)s/.tmp``.
``BaselineDir``
A directory where to store the baseline files for ``btest-diff``.
By default, this is set to ``%(testbase)s/Baseline``.
``IgnoreDirs``
A space-separated list of relative directory names to ignore
when scanning test directories recursively. Default is empty.
``IgnoreFiles``
A space-separated list of filename globs matching files to
ignore when scanning given test directories recursively.
Default is empty.
``StateFile``
The name of the state file to record the names of failing tests. Default is
``.btest.failed.dat``.
``Finalizer``
An executable that will be executed each time any test has
successfully run. It runs in the same directory as the test itself
and receives the name of the test as its parameter. The return
value indicates whether the test should indeed be considered
successful. By default, there's no finalizer set.
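Putting several of these options together, a minimal ``btest.cfg`` might look like the following sketch (the directory and glob values are illustrative, not defaults beyond those stated above)::

[btest]
TestDirs    = doc examples
TmpDir      = %(testbase)s/.tmp
BaselineDir = %(testbase)s/Baseline
IgnoreDirs  = .svn CVS
IgnoreFiles = *.tmp *~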
.. _environment variables:
Environment Variables
~~~~~~~~~~~~~~~~~~~~~
A special section ``environment`` defines environment variables that
will be propagated to all tests::
[environment]
CFLAGS=-O3
PATH=%(testbase)s/bin:%(default_path)s
Note how ``PATH`` can be adjusted to include local scripts: the
example above prefixes it with a local ``bin/`` directory inside the
base directory, using the predefined ``default_path`` macro to refer
to the ``PATH`` as it is set by default.
Furthermore, by setting ``PATH`` to include the ``btest``
distribution directory, one could skip the installation of the
``btest`` package.
.. _alternative:
Alternatives
~~~~~~~~~~~~
BTest can run a set of tests with different settings than it would
normally use by specifying an *alternative* configuration. Currently,
three things can be adjusted:
- Further environment variables can be set that will then be
available to all the commands that a test executes.
- *Filters* can modify an input file before a test uses it.
- *Substitutions* can modify command lines executed as part of a
test.
We discuss the three separately in the following. All of them are
defined by adding sections ``[<type>-<name>]`` where ``<type>``
corresponds to the type of adjustment being made and ``<name>`` is the
name of the alternative. Once at least one section is defined for a
name, that alternative can be enabled by BTest's ``--alternative``
flag.
Environment Variables
^^^^^^^^^^^^^^^^^^^^^
An alternative can add further environment variables by defining an
``[environment-<name>]`` section::
[environment-myalternative]
CFLAGS=-O3
Running ``btest`` with ``--alternative=myalternative`` will now make
the ``CFLAGS`` environment variable available to all commands
executed.
.. _filters:
Filters
^^^^^^^
Filters are a transparent way to adapt the input to a specific test
command before it is executed. A filter is defined by adding a section
``[filter-<name>]`` to the configuration file. This section must have
exactly one entry, and the name of that entry is interpreted as the
name of a command whose input is to be filtered. The value of that
entry is the name of a filter script that will be run with two
arguments representing input and output files, respectively. Example::
[filter-myalternative]
cat=%(testbase)s/bin/filter-cat
Once the filter is activated by running ``btest`` with
``--alternative=myalternative``, every time a ``@TEST-EXEC: cat
%INPUT`` is found, ``btest`` will first execute (something similar to)
``%(testbase)s/bin/filter-cat %INPUT out.tmp``, and then subsequently
``cat out.tmp`` (i.e., the original command but with the filtered
output). In the simplest case, the filter could be a no-op in the
form ``cp $1 $2``.
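As another sketch, here is a hypothetical filter script in Python that removes comment lines from the input before the test command sees it (``btest`` invokes the filter with the input and output file names as its two arguments)::

#!/usr/bin/env python
# A hypothetical filter script: copy <input> to <output>, dropping
# comment lines so the filtered command never sees them.
import sys

with open(sys.argv[1]) as inp, open(sys.argv[2], "w") as out:
    for line in inp:
        if not line.lstrip().startswith("#"):
            out.write(line)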
.. note::
There are a few limitations to the filter concept currently:
* Filters are *always* fed with ``%INPUT`` as their first
argument. We should add a way to filter other files as well.
* Filtered commands are only recognized if they are directly
starting the command line. For example, ``@TEST-EXEC: ls | cat
>outout`` would not trigger the example filter above.
* Filters are only executed for ``@TEST-EXEC``, not for
``@TEST-EXEC-FAIL``.
.. _substitution:
Substitutions
^^^^^^^^^^^^^^
Substitutions are similar to filters, yet they do not adapt the input
but the command line being executed. A substitution is defined by
adding a section ``[substitution-<name>]`` to the configuration file.
For each entry in this section, the entry's name specifies the
command that is to be replaced with something else given as its value.
Example::
[substitution-myalternative]
gcc=gcc -O2
Once the substitution is activated by running ``btest`` with
``--alternative=myalternative``, every time a ``@TEST-EXEC`` executes
``gcc``, that is replaced with ``gcc -O2``. The replacement is simple
string substitution so it works not only with commands but anything
found on the command line; it however only replaces full words, not
subparts of words.
Writing Tests
-------------
``btest`` scans a test file for lines containing keywords that
trigger certain functionality. Currently, the following keywords are
supported:
.. _@TEST-EXEC:
``@TEST-EXEC: <cmdline>``
Executes the given command line and aborts the test if it
returns an error code other than zero. The ``<cmdline>`` is
passed to the shell, so it can be a pipeline, use redirection,
and reference environment variables, which will be expanded.
When running a test, the current working directory for all
command lines will be set to a temporary sandbox (and will be
deleted later).
There are two macros that can be used in ``<cmdline>``:
``%INPUT`` will be replaced with the full pathname of the file defining
the test; and ``%DIR`` will be replaced with the directory where
the test file is located. The latter can be used to reference
further files also located there.
In addition to environment variables defined in the
configuration file, there are further ones that are passed into
the commands:
``TEST_DIAGNOSTICS``
A file where further diagnostic information can be saved
in case a command fails. ``--diagnostics`` will show
this file. (This is also where ``btest-diff`` stores its
diff.)
``TEST_MODE``
This is normally set to ``TEST``, but will be ``UPDATE``
if ``btest`` is run with ``--update-baseline``, or
``UPDATE_INTERACTIVE`` if run with ``--update-interactive``.
``TEST_BASELINE``
The name of a directory where the command can save permanent
information across ``btest`` runs. (This is where
``btest-diff`` stores its baseline in ``UPDATE`` mode.)
``TEST_NAME``
The name of the currently executing test.
``TEST_VERBOSE``
The path of a file where the test can record further
information about its execution that will be included with
btest's ``--verbose`` output. This is for further tracking
the execution of commands and should generally generate
output that follows a line-based structure.
.. note::
If a command returns the special exit code 100, the test is
considered failed, however subsequent test commands are still
run. ``btest-diff`` uses this special exit code to indicate that
no baseline has yet been established.
If a command returns the special exit code 200, the test is
considered failed and all further test executions are aborted.
``@TEST-EXEC-FAIL: <cmdline>``
Like ``@TEST-EXEC``, except that this expects the command to
*fail*, i.e., the test is aborted when the return code is zero.
``@TEST-REQUIRES: <cmdline>``
Defines a condition that must be met for the test to be executed.
The given command line will be run before any of the actual test
commands, and it must return success for the test to continue. If
it does not return success, the rest of the test will be skipped
but doing so will not be considered a failure of the test. This allows
writing conditional tests that may not always make sense to run, depending
on whether external constraints are satisfied or not (say, whether
a particular library is available). Multiple requirements may be
specified and then all must be met for the test to continue.
``@TEST-ALTERNATIVE: <alternative>``
Runs this test only for the given
alternative (see alternative_). If ``<alternative>`` is
``default``, the test executes when BTest runs with no alternative
given (which however is the default anyways).
``@TEST-NOT-ALTERNATIVE: <alternative>``
Ignores this test for the
given alternative (see alternative_). If ``<alternative>`` is
``default``, the test is ignored if BTest runs with no alternative
given.
``@TEST-COPY-FILE: <file>``
Copy the given file into the test's directory before the test is
run. If ``<file>`` is a relative path, it's interpreted relative
to the BTest's base directory. Environment variables in ``<file>``
will be replaced if enclosed in ``${..}``. This command can be
given multiple times.
``@TEST-START-NEXT``
This is a short-cut for defining multiple test inputs in the
same file, all executing with the same command lines. When
``@TEST-START-NEXT`` is encountered, the test file is initially
considered to end at that point, and all ``@TEST-EXEC-*`` are
run with an ``%INPUT`` truncated accordingly. Afterwards, a
*new* ``%INPUT`` is created with everything *following* the
``@TEST-START-NEXT`` marker, and the *same* commands are run
again (further ``@TEST-EXEC-*`` will be ignored). The effect is
that a single file can actually define two tests, and the
``btest`` output will enumerate them::
> cat examples/t5.sh
# @TEST-EXEC: cat %INPUT | wc -c >output
# @TEST-EXEC: btest-diff output
This is the first test input in this file.
# @TEST-START-NEXT
... and the second.
> ./btest -D examples/t5.sh
examples.t5 ... ok
% cat .diag
== File ===============================
119
[...]
examples.t5-2 ... ok
% cat .diag
== File ===============================
22
[...]
Multiple ``@TEST-START-NEXT`` can be used to create more than
two tests per file.
``@TEST-START-FILE <file>``
This is used to include an additional input file for a test
right inside the test file. All lines following the keyword will
be written into the given file (and removed from the test's
``%INPUT``) until a terminating ``@TEST-END-FILE`` is found.
Example::
> cat examples/t6.sh
# @TEST-EXEC: awk -f %INPUT <foo.dat >output
# @TEST-EXEC: btest-diff output
{ lines += 1; }
END { print lines; }
@TEST-START-FILE foo.dat
1
2
3
@TEST-END-FILE
> btest -D examples/t6.sh
examples.t6 ... ok
% cat .diag
== File ===============================
3
Multiple such files can be defined within a single test.
Note that this is only one way to use further input files.
Another is to store a file in the same directory as the test
itself, making sure it's ignored via ``IgnoreFiles``, and then
refer to it via ``%DIR/<name>``.
.. _@TEST-GROUP:
``@TEST-GROUP: <group>``
Assigns the test to a group of name ``<group>``. By using option
``-g`` one can limit execution to all tests that belong to a given
group (or a set of groups).
.. _@TEST-SERIALIZE:
``@TEST-SERIALIZE: <set>``
When using option ``-j`` to parallelize execution, all tests that
specify the same serialization set are guaranteed to run
sequentially. ``<set>`` is an arbitrary user-chosen string.
Canonifying Diffs
=================
``btest-diff`` has the capability to filter its input through an
additional script before it compares the current version with the
baseline. This can be useful if certain elements in an output are
*expected* to change (e.g., timestamps). The filter can then
remove/replace these with something consistent. To enable such
canonification, set the environment variable
``TEST_DIFF_CANONIFIER`` to a script reading the original version
from stdin and writing the canonified version to stdout. Note that
both baseline and current output are passed through the filter
before their differences are computed.
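Such a canonifier is just a stdin-to-stdout filter. Here is a sketch in Python that masks anything looking like a UNIX timestamp (the regular expression is illustrative; adapt it to your output format)::

#!/usr/bin/env python
# A hypothetical TEST_DIFF_CANONIFIER: read the original output on
# stdin, write the canonified version to stdout.
import re
import sys

for line in sys.stdin:
    sys.stdout.write(re.sub(r"\d{9,10}\.\d+", "<timestamp>", line))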
Running Processes in the Background
===================================
Sometimes processes need to be spawned in the background for a test,
in particular if multiple processes need to cooperate in some fashion.
``btest`` comes with two helper scripts to make life easier in such a
situation:
``btest-bg-run <tag> <cmdline>``
This is a script that runs ``<cmdline>`` in the background, i.e.,
it's like using ``cmdline &`` in a shell script. Test execution
continues immediately with the next command. Note that the spawned
command is *not* run in the current directory, but instead in a
newly created sub-directory called ``<tag>``. This allows
spawning multiple instances of the same process without needing to
worry about conflicting outputs. If you want to access a command's
output later, like with ``btest-diff``, use ``<tag>/foo.log`` to
access it.
``btest-bg-wait [-k] <timeout>``
This script waits for all processes previously spawned via
``btest-bg-run`` to finish. If any of them exits with a non-zero
return code, ``btest-bg-wait`` does so as well, indicating a
failed test. ``<timeout>`` is mandatory and gives the maximum
number of seconds to wait for any of the processes to terminate.
If any process hasn't done so when the timeout expires, it will be
killed and the test is considered to be failed as long as ``-k``
is not given. If ``-k`` is given, pending processes are still
killed but the test continues normally, i.e., non-termination is
not considered a failure in this case. This script also collects
the processes' stdout and stderr outputs for diagnostics output.
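As a sketch of how the two scripts work together, a test file might spawn a background process and later diff its output (the ``sleeper`` tag and the Python body are made up for illustration)::

# @TEST-EXEC: btest-bg-run sleeper python %INPUT
# @TEST-EXEC: btest-bg-wait 10
# @TEST-EXEC: btest-diff sleeper/out.log

# This file doubles as the spawned program. It runs inside the newly
# created sub-directory "sleeper", so the file it writes is accessible
# to later commands as sleeper/out.log.
with open("out.log", "w") as f:
    f.write("background work done\n")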
Integration with Sphinx
=======================
``btest`` comes with a new directive for the documentation framework
`Sphinx <http://sphinx.pocoo.org>`_. The directive allows you to write a
test directly inside a Sphinx document and then to include output
from the test's commands in the generated documentation. The same
tests can also run externally and will catch if any changes to the
included content occur. The following walks through setting this up.
Configuration
-------------
First, you need to tell Sphinx a base directory for the ``btest``
configuration as well as a directory inside it where the tests extracted
from the Sphinx documentation will be stored. Typically, you'd just
create a new subdirectory ``tests`` in the Sphinx project for the
``btest`` setup and then store the tests in there in, e.g.,
``doc/``::
cd <sphinx-root>
mkdir tests
mkdir tests/doc
Then add the following to your Sphinx ``conf.py``::
extensions += ["btest-sphinx"]
btest_base="tests" # Relative to Sphinx-root.
btest_tests="doc" # Relative to btest_base.
Next, add a finalizer to ``btest.cfg``::
[btest]
...
Finalizer=btest-diff-rst
Finally, create a ``btest.cfg`` in ``tests/`` as usual and add
``doc/`` to the ``TestDirs`` option.
Including a Test into a Sphinx Document
---------------------------------------
The ``btest`` extension provides a new directive to include a test
inside a Sphinx document::
.. btest:: <test-name>
<test content>
Here, ``<test-name>`` is a custom name for the test; it will be
stored in ``btest_tests`` under that name. ``<test content>`` is just
a standard test as you would normally put into one of the
``TestDirs``. Example::
.. btest:: just-a-test
@TEST-EXEC: expr 2 + 2
When you now run Sphinx, it will (1) store the test content into
``tests/doc/just-a-test`` (assuming the above path layout), and (2)
execute the test by running ``btest`` on it. You can then run
``btest`` manually in ``tests/`` as well and it will execute the test
just as it would in a standard setup. If a test fails when Sphinx runs
it, there will be a corresponding error, and the diagnostic output will
be included in the document.
By default, nothing else will be included into the generated
documentation, i.e., the above test will just turn into an empty text
block. However, ``btest`` comes with a set of scripts that you can use
to specify content to be included. As a simple example,
``btest-rst-cmd <cmdline>`` will execute a command and (if it
succeeds) include both the command line and the standard output into
the documentation. Example::
.. btest:: another-test
@TEST-EXEC: btest-rst-cmd echo Hello, world!
When running Sphinx, this will render as:
.. code::
# echo Hello, world!
Hello, world!
When running ``btest`` manually in ``tests/``, the ``Finalizer`` we
added to ``btest.cfg`` (see above) compares the generated reST code
with a previously established baseline, just like ``btest-diff`` does
with files. To establish the initial baseline, run ``btest -U``, like
you would with ``btest-diff``.
Scripts
-------
The following Sphinx support scripts come with ``btest``:
``btest-rst-cmd [options] <cmdline>``
By default, this executes ``<cmdline>`` and includes both the
command line itself and its standard output into the generated
documentation. See above for an example.
This script provides the following options:
-c ALTERNATIVE_CMDLINE
Show ``ALTERNATIVE_CMDLINE`` in the generated
documentation instead of the one actually executed. (It
still runs the ``<cmdline>`` given outside the option.)
-d
Do not actually execute ``<cmdline>``; just format it for
the generated documentation and include no further output.
-f FILTER_CMD
Pipe the command line's output through ``FILTER_CMD``
before including. If ``-r`` is given, it filters the
file's content instead of stdout.
-o
Do not include the executed command into the generated
documentation, just its output.
-r FILE
Insert ``FILE`` into output instead of stdout.
``btest-rst-include <file>``
Includes ``<file>`` inside a code block.
``btest-rst-pipe <cmdline>``
Executes ``<cmdline>``, includes its standard output inside a code
block. Note that this script does not include the command line
itself into the code block, just the output.
.. note::
All these scripts can be run directly from the command line to show
the reST code they generate.
.. note::
``btest-rst-cmd`` can do everything the other scripts provide if
you give it the right options. In fact, the other scripts are
provided just for convenience and leverage ``btest-rst-cmd``
internally.
License
=======
btest is open-source under a BSD license.


@ -0,0 +1,107 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.18
===============================================
capstats - A tool to get some NIC statistics.
===============================================
.. rst-class:: opening
capstats is a small tool to collect statistics on the
current load of a network interface, using either `libpcap
<http://www.tcpdump.org>`_ or the native DAG API for `Endace
<http://www.endace.com>`_ cards. It reports statistics per time interval
and/or for the tool's total run-time.
Download
--------
You can find the latest capstats release for download at
http://www.bro.org/download.
Capstats's git repository is located at `git://git.bro.org/capstats.git
<git://git.bro.org/capstats.git>`__. You can browse the repository
`here <http://git.bro.org/capstats.git>`__.
This document describes capstats |version|. See the ``CHANGES``
file for version history.
Output
------
Here's an example of output in one-second intervals until
``CTRL-C`` is hit:
.. console::
>capstats -i nve0 -I 1
1186620936.890567 pkts=12747 kpps=12.6 kbytes=10807 mbps=87.5 nic_pkts=12822 nic_drops=0 u=960 t=11705 i=58 o=24 nonip=0
1186620937.901490 pkts=13558 kpps=13.4 kbytes=11329 mbps=91.8 nic_pkts=13613 nic_drops=0 u=1795 t=24339 i=119 o=52 nonip=0
1186620938.912399 pkts=14771 kpps=14.6 kbytes=13659 mbps=110.7 nic_pkts=14781 nic_drops=0 u=2626 t=38154 i=185 o=111 nonip=0
1186620939.012446 pkts=1332 kpps=13.3 kbytes=1129 mbps=92.6 nic_pkts=1367 nic_drops=0 u=2715 t=39387 i=194 o=112 nonip=0
=== Total
1186620939.012483 pkts=42408 kpps=13.5 kbytes=36925 mbps=96.5 nic_pkts=1 nic_drops=0 u=2715 t=39387 i=194 o=112 nonip=0
Each line starts with a timestamp and the other fields are:
:pkts:
Absolute number of packets seen by ``capstats`` during interval.
:kpps:
Number of packets per second, in thousands.
:kbytes:
Absolute number of KBytes during interval.
:mbps:
Mbits/sec.
:nic_pkts:
Number of packets as reported by ``libpcap``'s ``pcap_stats()`` (may not match ``pkts``).
:nic_drops:
Number of packet drops as reported by ``libpcap``'s ``pcap_stats()``.
:u:
Number of UDP packets.
:t:
Number of TCP packets.
:i:
Number of ICMP packets.
:o:
Number of other IP packets.
:nonip:
Number of non-IP packets.
Options
-------
A list of all options::
capstats [Options] -i interface
-i| --interface <interface> Listen on interface
-d| --dag Use native DAG API
-f| --filter <filter> BPF filter
-I| --interval <secs> Stats logging interval
-l| --syslog Use syslog rather than print to stderr
-n| --number <count> Stop after outputting <number> intervals
-N| --select Use select() for live pcap (for testing only)
-p| --payload <n> Verifies that packets' payloads consist
entirely of bytes of the given value.
-q| --quiet <count> Suppress output, exit code indicates >= count
packets received.
-S| --size <size> Verify packets to have given <size>
-s| --snaplen <size> Use pcap snaplen <size>
-v| --version Print version and exit
-w| --write <filename> Write packets to file
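As a sketch of how the ``-q`` option described above might be used, a
monitoring script could check whether an interface is seeing traffic at
all (this assumes a zero exit status signals that at least ``<count>``
packets arrived):

.. console::

    > capstats -i eth0 -I 5 -n 1 -q 100 && echo "interface is receiving traffic"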
Installation
------------
``capstats`` has been tested on Linux, FreeBSD, and MacOS. Please see
the ``INSTALL`` file for installation instructions.
28 doc/components/index.rst Normal file
@@ -0,0 +1,28 @@
=====================
Additional Components
=====================
The following are snapshots of documentation for components that come
with this version of Bro (|version|). Since they can also be used
independently, see the `download page
<http://bro-ids.org/download/index.html>`_ for documentation of any
current, independent component releases.
.. toctree::
:maxdepth: 1
BinPAC - A protocol parser generator <binpac/README>
Broccoli - The Bro Client Communication Library (README) <broccoli/README>
Broccoli - User Manual <broccoli/broccoli-manual>
Broccoli Python Bindings <broccoli-python/README>
Broccoli Ruby Bindings <broccoli-ruby/README>
BroControl - Interactive Bro management shell <broctl/README>
Bro-Aux - Small auxiliary tools for Bro <bro-aux/README>
BTest - A unit testing framework <btest/README>
Capstats - Command-line packet statistic tool <capstats/README>
PySubnetTree - Python module for CIDR lookups <pysubnettree/README>
trace-summary - Script for generating break-downs of network traffic <trace-summary/README>
The `Broccoli API Reference <broccoli-api/index.html>`_ may also be of
interest.
@@ -0,0 +1,98 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.19-9
===============================================
PySubnetTree - A Python Module for CIDR Lookups
===============================================
.. rst-class:: opening
The PySubnetTree package provides a Python data structure
``SubnetTree`` which maps subnets given in `CIDR
<http://tools.ietf.org/html/rfc4632>`_ notation (incl.
corresponding IPv6 versions) to Python objects. Lookups are
performed by longest-prefix matching.
Download
--------
You can find the latest PySubnetTree release for download at
http://www.bro.org/download.
PySubnetTree's git repository is located at `git://git.bro.org/pysubnettree.git
<git://git.bro.org/pysubnettree.git>`__. You can browse the repository
`here <http://git.bro.org/pysubnettree.git>`__.
This document describes PySubnetTree |version|. See the ``CHANGES``
file for version history.
Example
-------
A simple example which associates CIDR prefixes with strings::
>>> import SubnetTree
>>> t = SubnetTree.SubnetTree()
>>> t["10.1.0.0/16"] = "Network 1"
>>> t["10.1.42.0/24"] = "Network 1, Subnet 42"
>>> t["10.2.0.0/16"] = "Network 2"
>>> print t["10.1.42.1"]
Network 1, Subnet 42
>>> print t["10.1.43.1"]
Network 1
>>> print "10.1.42.1" in t
True
>>> print "10.1.43.1" in t
True
>>> print "10.20.1.1" in t
False
>>> print t["10.20.1.1"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "SubnetTree.py", line 67, in __getitem__
def __getitem__(*args): return _SubnetTree.SubnetTree___getitem__(*args)
KeyError: '10.20.1.1'
By default, CIDR prefixes and IP addresses are given as strings.
Alternatively, a ``SubnetTree`` object can be switched into *binary
mode*, in which single addresses are passed in the form of packed
binary strings as, e.g., returned by `socket.inet_aton
<http://docs.python.org/lib/module-socket.html#l2h-3657>`_::
>>> t.get_binary_lookup_mode()
False
>>> t.set_binary_lookup_mode(True)
>>> t.get_binary_lookup_mode()
True
>>> import socket
>>> print t[socket.inet_aton("10.1.42.1")]
Network 1, Subnet 42
A ``SubnetTree`` also provides the methods ``insert(prefix, object=None)`` for
insertion of prefixes (``object`` can be skipped to use the tree like a set),
and ``remove(prefix)`` for removing entries (``remove`` performs an *exact*
match rather than longest-prefix).
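A short sketch of this set-style usage on a fresh tree in the default
string mode::

    >>> t2 = SubnetTree.SubnetTree()
    >>> t2.insert("10.3.0.0/16")
    >>> print "10.3.1.1" in t2
    True
    >>> t2.remove("10.3.0.0/16")
    >>> print "10.3.1.1" in t2
    False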
Internally, the CIDR prefixes of a ``SubnetTree`` are managed by a
Patricia tree data structure and lookups are therefore efficient
even with a large number of prefixes.
PySubnetTree comes with a BSD license.
Prerequisites
-------------
This package requires Python 2.4 or newer.
Installation
------------
Installation is pretty simple::
> python setup.py install
@@ -0,0 +1,154 @@
.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.8
====================================================
trace-summary - Generating network traffic summaries
====================================================
.. rst-class:: opening
``trace-summary`` is a Python script that generates break-downs of
network traffic, including lists of the top hosts, protocols,
ports, etc. Optionally, it can generate output separately for
incoming vs. outgoing traffic, per subnet, and per time-interval.
Download
--------
You can find the latest trace-summary release for download at
http://www.bro.org/download.
trace-summary's git repository is located at `git://git.bro.org/trace-summary.git
<git://git.bro.org/trace-summary.git>`__. You can browse the repository
`here <http://git.bro.org/trace-summary.git>`__.
This document describes trace-summary |version|. See the ``CHANGES``
file for version history.
Overview
--------
The ``trace-summary`` script reads both packet traces in `libpcap
<http://www.tcpdump.org>`_ format and connection logs produced by the
`Bro <http://www.bro.org>`_ network intrusion detection system
(for the latter, it supports both 1.x and 2.x output formats).
Here are two example outputs in the most basic form (note that IP
addresses are 'anonymized'). The first is from a packet trace and the
second from a Bro connection log::
=== Total === 2005-01-06-14-23-33 - 2005-01-06-15-23-43
- Bytes 918.3m - Payload 846.3m - Pkts 1.8m - Frags 0.9% - MBit/s 1.9 -
Ports | Sources | Destinations | Protocols |
80 33.8% | 131.243.89.214 8.5% | 131.243.89.214 7.7% | 6 76.0% |
22 16.7% | 128.3.2.102 6.2% | 128.3.2.102 5.4% | 17 23.3% |
11001 12.4% | 204.116.120.26 4.8% | 131.243.89.4 4.8% | 1 0.5% |
2049 10.7% | 128.3.161.32 3.6% | 131.243.88.227 3.6% | |
1023 10.6% | 131.243.89.4 3.5% | 204.116.120.26 3.4% | |
993 8.2% | 128.3.164.194 2.7% | 131.243.89.64 3.1% | |
1049 8.1% | 128.3.164.15 2.4% | 128.3.164.229 2.9% | |
524 6.6% | 128.55.82.146 2.4% | 131.243.89.155 2.5% | |
33305 4.5% | 131.243.88.227 2.3% | 128.3.161.32 2.3% | |
1085 3.7% | 131.243.89.155 2.3% | 128.55.82.146 2.1% | |
=== Total === 2005-01-06-14-23-33 - 2005-01-06-15-23-42
- Connections 43.4k - Payload 398.4m -
Ports | Sources | Destinations | Services | Protocols | States |
80 21.7% | 207.240.215.71 3.0% | 239.255.255.253 8.0% | other 51.0% | 17 55.8% | S0 46.2% |
427 13.0% | 131.243.91.71 2.2% | 131.243.91.255 4.0% | http 21.7% | 6 36.4% | SF 30.1% |
443 3.8% | 128.3.161.76 1.7% | 131.243.89.138 2.1% | i-echo 7.3% | 1 7.7% | OTH 7.8% |
138 3.7% | 131.243.90.138 1.6% | 255.255.255.255 1.7% | https 3.8% | | RSTO 5.8% |
515 2.4% | 131.243.88.159 1.6% | 128.3.97.204 1.5% | nb-dgm 3.7% | | SHR 4.4% |
11001 2.3% | 131.243.88.202 1.4% | 131.243.88.107 1.1% | printer 2.4% | | REJ 3.0% |
53 1.9% | 131.243.89.250 1.4% | 117.72.94.10 1.1% | dns 1.9% | | S1 1.0% |
161 1.6% | 131.243.89.80 1.3% | 131.243.88.64 1.1% | snmp 1.6% | | RSTR 0.9% |
137 1.4% | 131.243.90.52 1.3% | 131.243.88.159 1.1% | nb-ns 1.4% | | SH 0.3% |
2222 1.1% | 128.3.161.252 1.2% | 131.243.91.92 1.1% | ntp 1.0% | | RSTRH 0.2% |
Prerequisites
-------------
* This script requires Python 2.4 or newer.
* The `pysubnettree
<http://www.bro.org/documentation/pysubnettree.html>`_ Python
module.
* Eddie Kohler's `ipsumdump <http://www.cs.ucla.edu/~kohler/ipsumdump>`_
if using ``trace-summary`` with packet traces (versus Bro connection logs).
Installation
------------
Simply copy the script into some directory which is in your ``PATH``.
Usage
-----
The general usage is::
trace-summary [options] [input-file]
By default, it assumes the ``input-file`` to be a ``libpcap`` trace
file. If it is a Bro connection log, use ``-c``. If ``input-file`` is
not given, the script reads from stdin. It writes its output to
stdout.
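For example, to summarize a Bro connection log with percentages counted
in bytes, one would combine the ``-c`` and ``-b`` options described
below::

    trace-summary -c -b conn.log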
Options
~~~~~~~
The most important options are summarized
below. Run ``trace-summary --help`` to see the full list including
some more esoteric ones.
:-c:
Input is a Bro connection log instead of a ``libpcap`` trace
file.
:-b:
Counts all percentages in bytes rather than number of
packets/connections.
:-E <file>:
Gives a file which contains a list of networks to ignore for the
analysis. The file must contain one network per line, where each
network is of the CIDR form ``a.b.c.d/mask`` (including the
corresponding syntax for IPv6 prefixes, e.g., ``1:2:3:4::/64``).
Empty lines and lines starting with a "#" are ignored.
:-i <duration>:
Creates totals for each time interval of the given length
(default is seconds; add "``m``" for minutes and "``h``" for
hours). Use ``-v`` if you also want to see the breakdowns for
each interval.
:-l <file>:
Generates separate summaries for incoming and outgoing traffic.
``<file>`` is a file which contains a list of networks to be
considered local. Format as for ``-E``.
:-n <n>:
Shows the top <n> entries in each break-down. The default is 10.
:-r:
Resolves hostnames in the output.
:-s <n>:
Gives the sample factor if the input has been sampled.
:-S <n>:
Samples input with the given factor; less accurate, but faster and
saves memory.
:-m:
Skips memory-expensive statistics.
:-v:
Generates full break-downs for each time interval. Requires
``-i``.
16 doc/frameworks/index.rst Normal file
@@ -0,0 +1,16 @@
==========
Frameworks
==========
.. toctree::
:maxdepth: 1
notice
logging
input
intel
cluster
signatures
geoip
408 doc/frameworks/input.rst Normal file
@@ -0,0 +1,408 @@
===============
Input Framework
===============
.. rst-class:: opening
Bro now features a flexible input framework that allows users
to import data into Bro. Data is either read into Bro tables or
converted to events which can then be handled by scripts.
This document gives an overview of how to use the input framework
with some examples. For more complex scenarios it is
worthwhile to take a look at the unit tests in
``testing/btest/scripts/base/frameworks/input/``.
.. contents::
Reading Data into Tables
========================
Probably the most interesting use-case of the input framework is to
read data into a Bro table.
By default, the input framework reads the data in the same format
as it is written by the logging framework in Bro - a tab-separated
ASCII file.
We will show the ways to read files into Bro with a simple example.
For this example we assume that we want to import data from a blacklist
that contains server IP addresses as well as the timestamp and the reason
for the block.
An example input file could look like this:
::
#fields ip timestamp reason
192.168.17.1 1333252748 Malware host
192.168.27.2 1330235733 Botnet server
192.168.250.3 1333145108 Virus detected
To read a file into a Bro table, two record types have to be defined.
One contains the types and names of the columns that should constitute the
table keys and the second contains the types and names of the columns that
should constitute the table values.
In our case, we want to be able to look up IPs. Hence, our key record
only contains the server IP. All other elements should be stored as
the table content.
The two records are defined as:
.. code:: bro
type Idx: record {
ip: addr;
};
type Val: record {
timestamp: time;
reason: string;
};
Note that the names of the fields in the record definitions have to correspond
to the column names listed in the '#fields' line of the log file, in this
case 'ip', 'timestamp', and 'reason'.
The log file is read into the table with a simple call of the ``add_table``
function:
.. code:: bro
global blacklist: table[addr] of Val = table();
Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist]);
Input::remove("blacklist");
With these three lines we first create an empty table that should contain the
blacklist data and then instruct the input framework to open an input stream
named ``blacklist`` to read the data into the table. The third line removes the
input stream again, because we do not need it any more after the data has been
read.
Because some data files can potentially be rather big, the input framework
works asynchronously. A new thread is created for each new input stream.
This thread opens the input data file, converts the data into a Bro format and
sends it back to the main Bro thread.
Because of this, the data is not immediately accessible. Depending on the
size of the data source it might take from a few milliseconds up to a few
seconds until all data is present in the table. Please note that this means
that when Bro is running without an input source or on very short captured
files, it might terminate before the data is present in the system (because
Bro already handled all packets before the import thread finished).
Subsequent calls to an input source are queued until the previous action has
been completed. Because of this, it is, for example, possible to call
``add_table`` and ``remove`` in two subsequent lines: the ``remove`` action
will remain queued until the first read has been completed.
Once the input framework finishes reading from a data source, it fires
the ``end_of_data`` event. When this event has been received, all data
from the input file is available in the table.
.. code:: bro
event Input::end_of_data(name: string, source: string) {
# now all data is in the table
print blacklist;
}
The table can also be used while the data is still being read; it
just might not contain all lines of the input file before the event has
fired. After it has been populated it can be used like any other Bro
table and blacklist entries can easily be tested:
.. code:: bro
if ( 192.168.18.12 in blacklist )
# take action
Re-reading and streaming data
-----------------------------
For many data sources, like for many blacklists, the source data is continually
changing. For these cases, the Bro input framework supports several ways to
deal with changing data files.
The first, very basic method is an explicit refresh of an input stream. When
an input stream is open, the function ``force_update`` can be called. This
will trigger a complete refresh of the table; any changed elements from the
file will be updated. After the update is finished the ``end_of_data``
event will be raised.
In our example the call would look like:
.. code:: bro
Input::force_update("blacklist");
The input framework also supports two automatic refresh modes. The first mode
continually checks if a file has been changed. If the file has been changed, it
is re-read and the data in the Bro table is updated to reflect the current
state. Each time a change has been detected and all the new data has been
read into the table, the ``end_of_data`` event is raised.
The second mode is a streaming mode. This mode assumes that the source data
file is an append-only file to which new data is continually appended. Bro
continually checks for new data at the end of the file and will add the new
data to the table. If newer lines in the file have the same index as previous
lines, they will overwrite the values in the output table. Because of the
nature of streaming reads (data is continually added to the table),
the ``end_of_data`` event is never raised when using streaming reads.
The reading mode can be selected by setting the ``mode`` option of the
add_table call. Valid values are ``MANUAL`` (the default), ``REREAD``
and ``STREAM``.
Hence, when adding ``$mode=Input::REREAD`` to the previous example, the
blacklist table will always reflect the state of the blacklist input file.
.. code:: bro
Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD]);
Receiving change events
-----------------------
When re-reading files, it might be interesting to know exactly which lines in
the source files have changed.
For this reason, the input framework can raise an event each time a data
item is added to, removed from or changed in a table.
The event definition looks like this:
.. code:: bro
event entry(description: Input::TableDescription, tpe: Input::Event, left: Idx, right: Val) {
# act on values
}
The event has to be specified in ``$ev`` in the ``add_table`` call:
.. code:: bro
Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD, $ev=entry]);
The ``description`` field of the event contains the arguments that were
originally supplied to the add_table call. Hence, the name of the stream can,
for example, be accessed with ``description$name``. ``tpe`` is an enum
containing the type of the change that occurred.
If a line that was not previously present in the table has been added,
then ``tpe`` will contain ``Input::EVENT_NEW``. In this case ``left`` contains
the index of the added table entry and ``right`` contains the values of the
added entry.
If a table entry that already was present is altered during the re-reading or
streaming read of a file, ``tpe`` will contain ``Input::EVENT_CHANGED``. In
this case ``left`` contains the index of the changed table entry and ``right``
contains the values of the entry before the change. The reason for this is
that the table has already been updated when the event is raised, so the
new value can be obtained by looking it up in the table. Hence it is
possible to compare the old and the new values.
If a table element is removed because it was no longer present during a
re-read, then ``tpe`` will contain ``Input::EVENT_REMOVED``. In this case ``left``
contains the index and ``right`` the values of the removed element.
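Putting the pieces together, a minimal change handler could look like
the following sketch (the messages printed are just illustrative):

.. code:: bro

    event entry(description: Input::TableDescription, tpe: Input::Event, left: Idx, right: Val) {
        if ( tpe == Input::EVENT_NEW )
            print fmt("new blacklist entry %s (%s)", left$ip, right$reason);
        else if ( tpe == Input::EVENT_CHANGED )
            print fmt("blacklist entry %s changed", left$ip);
        else if ( tpe == Input::EVENT_REMOVED )
            print fmt("blacklist entry %s removed", left$ip);
    }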
Filtering data during import
----------------------------
The input framework also allows a user to filter the data during the import.
To this end, predicate functions are used. A predicate function is called
before a new element is added/changed/removed from a table. The predicate
can either accept or veto the change by returning true for an accepted
change and false for a rejected change. Furthermore, it can alter the data
before it is written to the table.
The following example predicate will reject adding entries to the table
if they were generated more than a month ago. It will accept all changes
and all removals of values that are already present in the table.
.. code:: bro
Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD,
$pred(typ: Input::Event, left: Idx, right: Val) = {
if ( typ != Input::EVENT_NEW ) {
return T;
}
return ( ( current_time() - right$timestamp ) < (30 day) );
}]);
To change elements while they are being imported, the predicate function can
manipulate ``left`` and ``right``. Note that predicate functions are called
before the change is committed to the table. Hence, when a table element is
changed (``tpe`` is ``Input::EVENT_CHANGED``), ``left`` and ``right``
contain the new values, but the destination (``blacklist`` in our example)
still contains the old values. This allows predicate functions to examine
the changes between the old and the new version before deciding if they
should be allowed.
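For example, the following sketch normalizes the free-text ``reason``
field to lowercase before each entry is committed to the table
(``to_lower`` is a standard Bro string function):

.. code:: bro

    Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD,
        $pred(typ: Input::Event, left: Idx, right: Val) = {
            # Alter the value in place; the modified version is what
            # ends up in the table.
            right$reason = to_lower(right$reason);
            return T;
        }]);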
Different readers
-----------------
The input framework supports different kinds of readers for different kinds
of source data files. The default reader reads ASCII files
formatted in the Bro log file format (tab-separated values). At the moment,
Bro comes with two other readers. The ``RAW`` reader reads a file that is
split by a specified record separator (usually newline). The contents are
returned line-by-line as strings; it can, for example, be used to read
configuration files and the like and is probably
only useful in the event mode and not for reading data to tables.
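As a sketch of the ``RAW`` reader in event mode, assuming it is selected
via ``Input::READER_RAW`` and that each line arrives in a single string
field:

.. code:: bro

    type OneLine: record {
        s: string;
    };

    event config_line(description: Input::EventDescription, tpe: Input::Event, s: string) {
        print fmt("read configuration line: %s", s);
    }

    event bro_init() {
        Input::add_event([$source="config.file", $name="config", $reader=Input::READER_RAW, $fields=OneLine, $ev=config_line]);
    }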
Another included reader is the ``BENCHMARK`` reader, which is being used
to optimize the speed of the input framework. It can generate arbitrary
amounts of semi-random data in all Bro data types supported by the input
framework.
In the future, the input framework will get support for new data sources
like, for example, different databases.
Add_table options
-----------------
This section lists all possible options that can be used for the add_table
function and gives a short explanation of their use. Most of the options
have already been discussed in the previous sections.
The possible fields that can be set for a table stream are:
``source``
A mandatory string identifying the source of the data.
For the ASCII reader this is the filename.
``name``
A mandatory name for the filter that can later be used
to manipulate it further.
``idx``
Record type that defines the index of the table.
``val``
Record type that defines the values of the table.
``reader``
The reader used for this stream. Default is ``READER_ASCII``.
``mode``
The mode in which the stream is opened. Possible values are
``MANUAL``, ``REREAD`` and ``STREAM``. Default is ``MANUAL``.
``MANUAL`` means that the file is not updated after it has
been read. Changes to the file will not be reflected in the
data Bro knows. ``REREAD`` means that the whole file is read
again each time a change is found. This should be used for
files that are mapped to a table where individual lines can
change. ``STREAM`` means that the data from the file is
streamed. Events / table entries will be generated as new
data is appended to the file.
``destination``
The destination table.
``ev``
Optional event that is raised when values are added to,
changed in, or deleted from the table. Events are passed the
stream description record as the first argument, the type of
change (an ``Input::Event`` value) as the second argument, the
index record as the third argument, and the values as the
fourth argument.
``pred``
Optional predicate that can prevent entries from being added
to the table and events from being sent.
``want_record``
Boolean value that defines whether the event wants to receive the
fields inside of a single record value or individually
(default). This can be used if ``val`` is a record
containing only one type. In this case, if ``want_record`` is
set to false, the table will contain elements of the type
contained in ``val``.
Reading Data to Events
======================
The second supported mode of the input framework is reading data into Bro
events, instead of into a table, using event streams.
Event streams work very similarly to table streams that were already
discussed in much detail. To read the blacklist of the previous example
into an event stream, the following Bro code could be used:
.. code:: bro
type Val: record {
ip: addr;
timestamp: time;
reason: string;
};
event blacklistentry(description: Input::EventDescription, tpe: Input::Event, ip: addr, timestamp: time, reason: string) {
# work with event data
}
event bro_init() {
Input::add_event([$source="blacklist.file", $name="blacklist", $fields=Val, $ev=blacklistentry]);
}
The main difference in the declaration of the event stream is that an event
stream needs no separate index and value declarations; instead, all source
data types are provided in a single record definition.
Apart from this, event streams work exactly the same as table streams and
support most of the options that are also supported for table streams.
The options that can be set when creating an event stream with
``add_event`` are:
``source``
A mandatory string identifying the source of the data.
For the ASCII reader this is the filename.
``name``
A mandatory name for the stream that can later be used
to remove it.
``fields``
Name of a record type containing the fields, which should be
retrieved from the input stream.
``ev``
The event which is fired after a line has been read from the
input source. The event is passed an ``Input::EventDescription``
record as the first argument and an ``Input::Event`` value as
the second, followed by the data, either inside of a record
(if ``want_record`` is set) or as individual fields. The
``Input::Event`` value indicates whether the received line is
``NEW``, has been ``CHANGED`` or ``DELETED``. Since the ASCII
reader cannot track this information for event streams, the
value is always ``NEW`` at the moment.
``mode``
The mode in which the stream is opened. Possible values are
``MANUAL``, ``REREAD`` and ``STREAM``. Default is ``MANUAL``.
``MANUAL`` means that the file is not updated after it has
been read. Changes to the file will not be reflected in the
data Bro knows. ``REREAD`` means that the whole file is read
again each time a change is found. This should be used for
files that are mapped to a table where individual lines can
change. ``STREAM`` means that the data from the file is
streamed. Events / table entries will be generated as new
data is appended to the file.
``reader``
The reader used for this stream. Default is ``READER_ASCII``.
``want_record``
Boolean value that defines whether the event wants to receive the
fields inside of a single record value or individually
(default). If this is set to true, the event will receive a
single record of the type provided in ``fields``.
@@ -1,5 +1,7 @@
======================
Intelligence Framework
======================

Intro
-----
@@ -0,0 +1,186 @@
=============================
Binary Output with DataSeries
=============================
.. rst-class:: opening
Bro's default ASCII log format is not exactly the most efficient
way for storing and searching large volumes of data. As an
alternative, Bro comes with experimental support for `DataSeries
<http://www.hpl.hp.com/techreports/2009/HPL-2009-323.html>`_
output, an efficient binary format for recording structured bulk
data. DataSeries is developed and maintained at HP Labs.
.. contents::
Installing DataSeries
---------------------
To use DataSeries, its libraries must be available at compile-time,
along with the supporting *Lintel* package. Generally, both are
distributed on `HP Labs' web site
<http://tesla.hpl.hp.com/opensource/>`_. Currently, however, you need
to use recent development versions for both packages, which you can
download from github like this::
git clone http://github.com/dataseries/Lintel
git clone http://github.com/dataseries/DataSeries
To build and install the two into ``<prefix>``, do::
( cd Lintel && mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX=<prefix> .. && make && make install )
( cd DataSeries && mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX=<prefix> .. && make && make install )
Please refer to the packages' documentation for more information about
the installation process. In particular, there's more information on
required and optional `dependencies for Lintel
<https://raw.github.com/dataseries/Lintel/master/doc/dependencies.txt>`_
and `dependencies for DataSeries
<https://raw.github.com/dataseries/DataSeries/master/doc/dependencies.txt>`_.
For users on RedHat-style systems, you'll need the following::
yum install libxml2-devel boost-devel
Compiling Bro with DataSeries Support
-------------------------------------
Once you have installed DataSeries, Bro's ``configure`` should pick it
up automatically as long as it finds it in a standard system location.
Alternatively, you can specify the DataSeries installation prefix
manually with ``--with-dataseries=<prefix>``. Keep an eye on
``configure``'s summary output: if it looks like the following, Bro
found DataSeries and will compile in the support::
# ./configure --with-dataseries=/usr/local
[...]
====================| Bro Build Summary |=====================
[...]
DataSeries: true
[...]
================================================================
Activating DataSeries
---------------------
The direct way to use DataSeries is to switch *all* log files over to
the binary format. To do that, just add ``redef
Log::default_writer=Log::WRITER_DATASERIES;`` to your ``local.bro``.
For testing, you can also just pass that on the command line::
bro -r trace.pcap Log::default_writer=Log::WRITER_DATASERIES
With that, Bro will now write all its output into DataSeries files
``*.ds``. You can inspect these using DataSeries's set of command line
tools, which its installation process installs into ``<prefix>/bin``.
For example, to convert a file back into an ASCII representation::
$ ds2txt conn.ds
[... We skip a bunch of metadata here ...]
ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes
1300475167.096535 CRCC5OdDlXe 141.142.220.202 5353 224.0.0.251 5353 udp dns 0.000000 0 0 S0 F 0 D 1 73 0 0
1300475167.097012 o7XBsfvo3U1 fe80::217:f2ff:fed7:cf65 5353 ff02::fb 5353 udp 0.000000 0 0 S0 F 0 D 1 199 0 0
1300475167.099816 pXPi1kPMgxb 141.142.220.50 5353 224.0.0.251 5353 udp 0.000000 0 0 S0 F 0 D 1 179 0 0
1300475168.853899 R7sOc16woCj 141.142.220.118 43927 141.142.2.2 53 udp dns 0.000435 38 89 SF F 0 Dd 1 66 1 117
1300475168.854378 Z6dfHVmt0X7 141.142.220.118 37676 141.142.2.2 53 udp dns 0.000420 52 99 SF F 0 Dd 1 80 1 127
1300475168.854837 k6T92WxgNAh 141.142.220.118 40526 141.142.2.2 53 udp dns 0.000392 38 183 SF F 0 Dd 1 66 1 211
[...]
(``--skip-all`` suppresses the metadata.)
Note that the ASCII conversion is *not* equivalent to Bro's default
output format.
You can also switch only individual files over to DataSeries by adding
code like this to your ``local.bro``:
.. code:: bro
event bro_init()
{
local f = Log::get_filter(Conn::LOG, "default"); # Get default filter for connection log.
f$writer = Log::WRITER_DATASERIES; # Change writer type.
Log::add_filter(Conn::LOG, f); # Replace filter with adapted version.
}
Bro's DataSeries writer comes with a few tuning options, see
:doc:`scripts/base/frameworks/logging/writers/dataseries`.
Working with DataSeries
=======================
Here are a few examples of using DataSeries command line tools to work
with the output files.
* Printing CSV::
$ ds2txt --csv conn.ds
ts,uid,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,service,duration,orig_bytes,resp_bytes,conn_state,local_orig,missed_bytes,history,orig_pkts,orig_ip_bytes,resp_pkts,resp_ip_bytes
1258790493.773208,ZTtgbHvf4s3,192.168.1.104,137,192.168.1.255,137,udp,dns,3.748891,350,0,S0,F,0,D,7,546,0,0
1258790451.402091,pOY6Rw7lhUd,192.168.1.106,138,192.168.1.255,138,udp,,0.000000,0,0,S0,F,0,D,1,229,0,0
1258790493.787448,pn5IiEslca9,192.168.1.104,138,192.168.1.255,138,udp,,2.243339,348,0,S0,F,0,D,2,404,0,0
1258790615.268111,D9slyIu3hFj,192.168.1.106,137,192.168.1.255,137,udp,dns,3.764626,350,0,S0,F,0,D,7,546,0,0
[...]
Add ``--separator=X`` to set a different separator.
* Extracting a subset of columns::
$ ds2txt --select '*' ts,id.resp_h,id.resp_p --skip-all conn.ds
1258790493.773208 192.168.1.255 137
1258790451.402091 192.168.1.255 138
1258790493.787448 192.168.1.255 138
1258790615.268111 192.168.1.255 137
1258790615.289842 192.168.1.255 138
[...]
* Filtering rows::
$ ds2txt --where '*' 'duration > 5 && id.resp_p > 1024' --skip-all conn.ds
1258790631.532888 V8mV5WLITu5 192.168.1.105 55890 239.255.255.250 1900 udp 15.004568 798 0 S0 F 0 D 6 966 0 0
1258792413.439596 tMcWVWQptvd 192.168.1.105 55890 239.255.255.250 1900 udp 15.004581 798 0 S0 F 0 D 6 966 0 0
1258794195.346127 cQwQMRdBrKa 192.168.1.105 55890 239.255.255.250 1900 udp 15.005071 798 0 S0 F 0 D 6 966 0 0
1258795977.253200 i8TEjhWd2W8 192.168.1.105 55890 239.255.255.250 1900 udp 15.004824 798 0 S0 F 0 D 6 966 0 0
1258797759.160217 MsLsBA8Ia49 192.168.1.105 55890 239.255.255.250 1900 udp 15.005078 798 0 S0 F 0 D 6 966 0 0
1258799541.068452 TsOxRWJRGwf 192.168.1.105 55890 239.255.255.250 1900 udp 15.004082 798 0 S0 F 0 D 6 966 0 0
[...]
* Calculating some statistics:
Mean/stddev/min/max over a column::
$ dsstatgroupby '*' basic duration from conn.ds
# Begin DSStatGroupByModule
# processed 2159 rows, where clause eliminated 0 rows
# count(*), mean(duration), stddev, min, max
2159, 42.7938, 1858.34, 0, 86370
[...]
Quantiles of total connection volume::
$ dsstatgroupby '*' quantile 'orig_bytes + resp_bytes' from conn.ds
[...]
2159 data points, mean 24616 +- 343295 [0,1.26615e+07]
quantiles about every 216 data points:
10%: 0, 124, 317, 348, 350, 350, 601, 798, 1469
tails: 90%: 1469, 95%: 7302, 99%: 242629, 99.5%: 1226262
[...]
The ``man`` pages for these tools show further options, and their
``-h`` option gives some more information (though either can
unfortunately be a bit cryptic).
Deficiencies
------------
Due to limitations of the DataSeries format, one cannot inspect its
files before they have been fully written. In other words, when using
DataSeries, it's currently not possible to inspect the live log
files inside the spool directory before they are rotated to their
final location. It seems that this could be fixed with some effort,
and we will work with the DataSeries development team on that if the
format gains traction among Bro users.
Likewise, we're considering writing custom command line tools for
interacting with DataSeries files, making that a bit more convenient
than what the standard utilities provide.
@@ -0,0 +1,89 @@
=========================================
Indexed Logging Output with ElasticSearch
=========================================
.. rst-class:: opening
Bro's default ASCII log format is not exactly the most efficient
way for searching large volumes of data. ElasticSearch
is a data storage technology designed to handle large volumes of data.
It's also a search engine built on top of Apache's Lucene
project. It scales very well, both for distributed indexing and
distributed searching.
.. contents::
Warning
-------
This writer plugin is still in testing and is not yet recommended for
production use! The approach to how logs are handled in the plugin is "fire
and forget" at this time, there is no error handling if the server fails to
respond successfully to the insertion request.
Installing ElasticSearch
------------------------
Download the latest version from: <http://www.elasticsearch.org/download/>.
Once extracted, start ElasticSearch with::
# ./bin/elasticsearch
For more detailed information, refer to the ElasticSearch installation
documentation: http://www.elasticsearch.org/guide/reference/setup/installation.html
Compiling Bro with ElasticSearch Support
----------------------------------------
First, ensure that you have libcurl installed, then run ``configure``::
# ./configure
[...]
====================| Bro Build Summary |=====================
[...]
cURL: true
[...]
ElasticSearch: true
[...]
================================================================
Activating ElasticSearch
------------------------
The easiest way to enable ElasticSearch output is to load the
``tuning/logs-to-elasticsearch`` script. If you are using BroControl,
the following line in ``local.bro`` will enable it:
.. code:: bro
@load tuning/logs-to-elasticsearch
With that, Bro will now write most of its logs into ElasticSearch in addition
to maintaining the ASCII logs as it would do by default. That script has
some tunable options for choosing which logs to send to ElasticSearch; refer
to the autogenerated script documentation for those options.
There is an interface named Brownian being written specifically to
integrate with the data that Bro outputs into ElasticSearch. It can be
found here::
https://github.com/grigorescu/Brownian
Tuning
------
A common problem encountered with ElasticSearch is too many files being held
open. The ElasticSearch website has some suggestions on how to increase the
open file limit.
- http://www.elasticsearch.org/tutorials/2011/04/06/too-many-open-files.html
TODO
----
Lots.
- Perform multicast discovery for server.
- Better error detection.
- Better defaults (don't index loaded-plugins, for instance).
387 doc/frameworks/logging.rst Normal file
@@ -0,0 +1,387 @@
=================
Logging Framework
=================
.. rst-class:: opening
Bro comes with a flexible key-value based logging interface that
allows fine-grained control of what gets logged and how it is
logged. This document describes how logging can be customized and
extended.
.. contents::
Terminology
===========
Bro's logging interface is built around three main abstractions:
Log streams
A stream corresponds to a single log. It defines the set of
fields that a log consists of, with their names and types.
Examples are the ``conn`` stream for recording connection summaries,
and the ``http`` stream for recording HTTP activity.
Filters
Each stream has a set of filters attached to it that determine
what information gets written out. By default, each stream has
one default filter that just logs everything directly to disk
with an automatically generated file name. However, further
filters can be added to record only a subset, split a stream
into different outputs, or even to duplicate the log to
multiple outputs. If all filters are removed from a stream,
all output is disabled.
Writers
A writer defines the actual output format for the information
being logged. The default writer produces tab-separated ASCII
files; additional writers, for binary output and database-style
indexing, are described under `Other Writers`_ below.
Basics
======
The data fields that a stream records are defined by a record type
specified when it is created. Let's look at the script generating Bro's
connection summaries as an example,
:doc:`scripts/base/protocols/conn/main`. It defines a record
:bro:type:`Conn::Info` that lists all the fields that go into
``conn.log``, each marked with a ``&log`` attribute indicating that it
is part of the information written out. To write a log record, the
script then passes an instance of :bro:type:`Conn::Info` to the logging
framework's :bro:id:`Log::write` function.
By default, each stream automatically gets a filter named ``default``
that generates the normal output by recording all record fields into a
single output file.
In the following, we summarize ways in which the logging can be
customized. We continue using the connection summaries as our example
to work with.
Filtering
---------
To create a new output file for an existing stream, you can add a
new filter. A filter can, e.g., restrict the set of fields being
logged:
.. code:: bro
event bro_init()
{
# Add a new filter to the Conn::LOG stream that logs only
# timestamp and originator address.
local filter: Log::Filter = [$name="orig-only", $path="origs", $include=set("ts", "id.orig_h")];
Log::add_filter(Conn::LOG, filter);
}
Note the fields that are set for the filter:
``name``
A mandatory name for the filter that can later be used
to manipulate it further.
``path``
The filename for the output file, without any extension (which
may be automatically added by the writer). Default path values
are generated by taking the stream's ID and munging it slightly.
:bro:enum:`Conn::LOG` is converted into ``conn``,
:bro:enum:`PacketFilter::LOG` is converted into
``packet_filter``, and :bro:enum:`Notice::POLICY_LOG` is
converted into ``notice_policy``.
``include``
A set limiting the fields to the ones given. The names
correspond to those in the :bro:type:`Conn::Info` record, with
sub-records unrolled by concatenating fields (separated with
dots).
Using the code above, you will now get a new log file ``origs.log``
that looks like this::
#separator \x09
#path origs
#fields ts id.orig_h
#types time addr
1128727430.350788 141.42.64.125
1128727435.450898 141.42.64.125
If you want to make this the only log file for the stream, you can
remove the default filter (which, conveniently, has the name
``default``):
.. code:: bro
event bro_init()
{
# Remove the filter called "default".
Log::remove_filter(Conn::LOG, "default");
}
An alternate approach to "turning off" a log is to completely disable
the stream:
.. code:: bro
event bro_init()
{
Log::disable_stream(Conn::LOG);
}
If you want to skip only some fields but keep the rest, there is a
corresponding ``exclude`` filter attribute that you can use instead of
``include`` to list only the ones you are not interested in.
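For example, the following sketch adjusts the default filter so that the
``history`` column no longer appears in ``conn.log``:

.. code:: bro

    event bro_init()
        {
        local f = Log::get_filter(Conn::LOG, "default");
        f$exclude = set("history");
        Log::remove_filter(Conn::LOG, "default");
        Log::add_filter(Conn::LOG, f);
        }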
A filter can also determine output paths *dynamically* based on the
record being logged. That allows, e.g., recording local and remote
connections in separate files. To do this, you define a function
that returns the desired path:
.. code:: bro
function split_log(id: Log::ID, path: string, rec: Conn::Info) : string
{
# Return "conn-local" if originator is a local IP, otherwise "conn-remote".
local lr = Site::is_local_addr(rec$id$orig_h) ? "local" : "remote";
return fmt("%s-%s", path, lr);
}
event bro_init()
{
local filter: Log::Filter = [$name="conn-split", $path_func=split_log, $include=set("ts", "id.orig_h")];
Log::add_filter(Conn::LOG, filter);
}
Running this will now produce two files, ``conn-local.log`` and
``conn-remote.log``, with the corresponding entries (the path function
receives the stream's default path, here ``conn``, as its ``path``
argument). One could extend this further, for example to log information
by subnets or even by IP address. Be careful, however, as it is easy to
create many files very quickly.
.. sidebar:: A More Generic Path Function
The ``split_log`` method has one drawback: it can be used
only with the :bro:enum:`Conn::LOG` stream as the record type is hardcoded
into its argument list. However, Bro allows a more generic
variant:
.. code:: bro
function split_log(id: Log::ID, path: string, rec: record { id: conn_id; } ) : string
{
return Site::is_local_addr(rec$id$orig_h) ? "local" : "remote";
}
This function can be used with all log streams that have records
containing an ``id: conn_id`` field.
While so far we have seen how to customize the columns being logged,
you can also control which records are written out by providing a
predicate that will be called for each log record:
.. code:: bro
function http_only(rec: Conn::Info) : bool
{
# Record only connections with successfully analyzed HTTP traffic
return rec$service == "http";
}
event bro_init()
{
local filter: Log::Filter = [$name="http-only", $path="conn-http", $pred=http_only];
Log::add_filter(Conn::LOG, filter);
}
This will result in a log file ``conn-http.log`` that contains only
traffic detected and analyzed as HTTP traffic.
Extending
---------
You can add further fields to a log stream by extending the record
type that defines its content. Let's say we want to add a boolean
field ``is_private`` to :bro:type:`Conn::Info` that indicates whether the
originator IP address is part of the :rfc:`1918` space:
.. code:: bro
# Add a field to the connection log record.
redef record Conn::Info += {
## Indicate if the originator of the connection is part of the
## "private" address space defined in RFC1918.
is_private: bool &default=F &log;
};
Now we need to set the field. A connection's summary is generated at
the time its state is removed from memory. We can add another handler
at that time that sets our field correctly:
.. code:: bro
event connection_state_remove(c: connection)
{
if ( c$id$orig_h in Site::private_address_space )
c$conn$is_private = T;
}
Now ``conn.log`` will show a new field ``is_private`` of type
``bool``.
Notes:
- For extending logs this way, one needs a bit of knowledge about how
the script that creates the log stream is organizing its state
keeping. Most of the standard Bro scripts attach their log state to
the :bro:type:`connection` record where it can then be accessed, just
as the ``c$conn`` above. For example, the HTTP analysis adds a field
``http`` of type :bro:type:`HTTP::Info` to the :bro:type:`connection`
record. See the script reference for more information.
- When extending records as shown above, the new fields must always be
declared either with a ``&default`` value or as ``&optional``.
Furthermore, you need to add the ``&log`` attribute or otherwise the
field won't appear in the output.
Hooking into the Logging
------------------------
Sometimes it is helpful to do additional analysis of the information
being logged. For these cases, a stream can specify an event that will
be generated every time a log record is written to it. All of Bro's
default log streams define such an event. For example, the connection
log stream raises the event :bro:id:`Conn::log_conn`. You
could use that for example for flagging when a connection to a
specific destination exceeds a certain duration:
.. code:: bro
redef enum Notice::Type += {
## Indicates that a connection remained established longer
## than 5 minutes.
Long_Conn_Found
};
event Conn::log_conn(rec: Conn::Info)
{
if ( rec$duration > 5mins )
NOTICE([$note=Long_Conn_Found,
$msg=fmt("unusually long conn to %s", rec$id$resp_h),
$id=rec$id]);
}
Often, these events can be an alternative to post-processing Bro logs
externally with Perl scripts. Much of what such an external script
would do later offline, one may instead do directly inside of Bro in
real-time.
Rotation
--------
By default, no log rotation occurs, but it's globally controllable for all
filters by redefining the :bro:id:`Log::default_rotation_interval` option:
.. code:: bro
redef Log::default_rotation_interval = 1 hr;
Or specifically for certain :bro:type:`Log::Filter` instances by setting
their ``interv`` field. Here's an example of changing just the
:bro:enum:`Conn::LOG` stream's default filter rotation interval:
.. code:: bro
event bro_init()
{
local f = Log::get_filter(Conn::LOG, "default");
f$interv = 1 min;
Log::remove_filter(Conn::LOG, "default");
Log::add_filter(Conn::LOG, f);
}
ASCII Writer Configuration
--------------------------
The ASCII writer has a number of options for customizing the format of
its output, see :doc:`scripts/base/frameworks/logging/writers/ascii`.
Adding Streams
==============
It's easy to create a new log stream for custom scripts. Here's an
example for the ``Foo`` module:
.. code:: bro
module Foo;
export {
# Create an ID for our new stream. By convention, this is
# called "LOG".
redef enum Log::ID += { LOG };
# Define the fields. By convention, the type is called "Info".
type Info: record {
ts: time &log;
id: conn_id &log;
};
# Define a hook event. By convention, this is called
# "log_<stream>".
global log_foo: event(rec: Info);
}
# This event should be handled at a higher priority so that when
# users modify your stream later and they do it at priority 0,
# their code runs after this.
event bro_init() &priority=5
{
# Create the stream. This also adds a default filter automatically.
Log::create_stream(Foo::LOG, [$columns=Info, $ev=log_foo]);
}
You can also add the state to the :bro:type:`connection` record to make
it easily accessible across event handlers:
.. code:: bro
redef record connection += {
foo: Info &optional;
};
Now you can use the :bro:id:`Log::write` method to output log records and
save the logged ``Foo::Info`` record into the connection record:
.. code:: bro
event connection_established(c: connection)
{
local rec: Foo::Info = [$ts=network_time(), $id=c$id];
c$foo = rec;
Log::write(Foo::LOG, rec);
}
See the existing scripts for how to work with such a new connection
field. A simple example is :doc:`scripts/base/protocols/syslog/main`.
When you are developing scripts that add data to the :bro:type:`connection`
record, care must be given to when and how long data is stored.
Normally data saved to the connection record will remain there for the
duration of the connection and from a practical perspective it's not
uncommon to need to delete that data before the end of the connection.
Other Writers
-------------
Bro supports the following output formats other than ASCII:
.. toctree::
:maxdepth: 1
logging-dataseries
logging-elasticsearch
357 doc/frameworks/notice.rst Normal file
@@ -0,0 +1,357 @@
================
Notice Framework
================
.. rst-class:: opening
One of the easiest ways to customize Bro is writing a local notice
policy. Bro can detect a large number of potentially interesting
situations, and the notice policy specifies which of them the user wants to be
acted upon in some manner. In particular, the notice policy can specify
actions to be taken, such as sending an email or compiling regular
alarm emails. This page gives an introduction to writing such a notice
policy.
.. contents::
Overview
--------
Let's start with a little bit of background on Bro's philosophy on reporting
things. Bro ships with a large number of policy scripts which perform a wide
variety of analyses. Most of these scripts monitor for activity which might be
of interest for the user. However, none of these scripts determines the
importance of what it finds itself. Instead, the scripts only flag situations
as *potentially* interesting, leaving it to the local configuration to define
which of them are in fact actionable. This decoupling of detection and
reporting allows Bro to address the different needs that sites have.
Definitions of what constitutes an attack or even a compromise differ quite a
bit between environments, and activity deemed malicious at one site might be
fully acceptable at another.
Whenever one of Bro's analysis scripts sees something potentially
interesting it flags the situation by calling the :bro:see:`NOTICE`
function and giving it a single :bro:see:`Notice::Info` record. A Notice
has a :bro:see:`Notice::Type`, which reflects the kind of activity that
has been seen, and it is usually also augmented with further context
about the situation.
More information about raising notices can be found in the `Raising Notices`_
section.
Once a notice is raised, it can have any number of actions applied to it by
writing :bro:see:`Notice::policy` hooks, which are described in the `Notice Policy`_
section below. Such actions can be to send a mail to the configured
address(es) or to simply ignore the notice. Currently, the following actions
are defined:
.. list-table::
:widths: 20 80
:header-rows: 1
* - Action
- Description
* - Notice::ACTION_LOG
- Write the notice to the :bro:see:`Notice::LOG` logging stream.
* - Notice::ACTION_ALARM
- Log into the :bro:see:`Notice::ALARM_LOG` stream which will rotate
hourly and email the contents to the email address or addresses
defined in the :bro:see:`Notice::mail_dest` variable.
* - Notice::ACTION_EMAIL
- Send the notice in an email to the email address or addresses given in
the :bro:see:`Notice::mail_dest` variable.
* - Notice::ACTION_PAGE
- Send an email to the email address or addresses given in the
:bro:see:`Notice::mail_page_dest` variable.
How these notice actions are applied to notices is discussed in the
`Notice Policy`_ and `Notice Policy Shortcuts`_ sections.
Processing Notices
------------------
Notice Policy
*************
The hook :bro:see:`Notice::policy` provides the mechanism for applying
actions and generally modifying the notice before it's sent onward to
the action plugins. Hooks can be thought of as multi-bodied functions
and using them looks very similar to handling events. The difference
is that they don't go through the event queue like events. Users should
directly make modifications to the :bro:see:`Notice::Info` record
given as the argument to the hook.
Here's a simple example which tells Bro to send an email for all notices of
type :bro:see:`SSH::Login` if the server is 10.0.0.1:
.. code:: bro
hook Notice::policy(n: Notice::Info)
{
if ( n$note == SSH::Login && n$id$resp_h == 10.0.0.1 )
add n$actions[Notice::ACTION_EMAIL];
}
.. note::
Keep in mind that the semantics of the SSH::Login notice are
such that it is only raised when Bro heuristically detects a successful
login. No apparently failed logins will raise this notice.
Hooks can also have priorities applied to order their execution like events
with a default priority of 0. Greater values are executed first. Setting
a hook body to run before default hook bodies might look like this:
.. code:: bro
hook Notice::policy(n: Notice::Info) &priority=5
{
if ( n$note == SSH::Login && n$id$resp_h == 10.0.0.1 )
add n$actions[Notice::ACTION_EMAIL];
}
Hooks can also abort later hook bodies with the ``break`` keyword. This
is primarily useful if one wants to completely preempt processing by
lower priority :bro:see:`Notice::policy` hooks.
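For example, a high-priority hook body can use ``break`` to veto all
further policy processing, and hence any actions that later bodies would
add, for a particular notice type (a sketch):

.. code:: bro

    hook Notice::policy(n: Notice::Info) &priority=10
        {
        if ( n$note == SSH::Login )
            break;
        }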
Notice Policy Shortcuts
***********************
Although the notice framework provides a great deal of flexibility and
configurability, there are many times when the full expressiveness isn't
needed and actually becomes a hindrance to achieving results. The framework provides
a default :bro:see:`Notice::policy` hook body as a way of giving users the
shortcuts to easily apply many common actions to notices.
These are implemented as sets and tables indexed with a
:bro:see:`Notice::Type` enum value. The following table shows and describes
all of the variables available for shortcut configuration of the notice
framework.
.. list-table::
:widths: 32 40
:header-rows: 1
* - Variable name
- Description
* - :bro:see:`Notice::ignored_types`
- Adding a :bro:see:`Notice::Type` to this set results in the notice
being ignored. It won't have any other action applied to it, not even
:bro:see:`Notice::ACTION_LOG`.
* - :bro:see:`Notice::emailed_types`
- Adding a :bro:see:`Notice::Type` to this set results in
:bro:see:`Notice::ACTION_EMAIL` being applied to the notices of
that type.
* - :bro:see:`Notice::alarmed_types`
- Adding a :bro:see:`Notice::Type` to this set results in
:bro:see:`Notice::ACTION_ALARM` being applied to the notices of
that type.
* - :bro:see:`Notice::not_suppressed_types`
- Adding a :bro:see:`Notice::Type` to this set results in that notice
no longer undergoing the normal notice suppression that would
take place. Be careful when using this in production; it could
result in a dramatic increase in the number of notices being
processed.
* - :bro:see:`Notice::type_suppression_intervals`
- This is a table indexed on :bro:see:`Notice::Type` and yielding an
interval. It can be used as an easy way to extend the default
suppression interval for an entire :bro:see:`Notice::Type`
without having to create a whole :bro:see:`Notice::policy` entry
and setting the ``$suppress_for`` field.
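As a sketch of how these shortcuts are used (the notice types here are
just examples):

.. code:: bro

    # Never act on this notice type, not even logging it.
    redef Notice::ignored_types += { SSL::Invalid_Server_Cert };

    # Email all heuristically detected successful SSH logins.
    redef Notice::emailed_types += { SSH::Login };

    # Suppress duplicate SSH::Login notices for a full day.
    redef Notice::type_suppression_intervals += { [SSH::Login] = 1 day };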
Raising Notices
---------------
A script should raise a notice for any occurrence that a user may want
to be notified about or take action on. For example, whenever the base
SSH analysis scripts sees an SSH session where it is heuristically
guessed to be a successful login, it raises a Notice of the type
:bro:see:`SSH::Login`. The code in the base SSH analysis script looks
like this:
.. code:: bro
NOTICE([$note=SSH::Login,
$msg="Heuristically detected successful SSH login.",
$conn=c]);
:bro:see:`NOTICE` is a normal function in the global namespace which
wraps a function within the ``Notice`` namespace. It takes a single
argument of the :bro:see:`Notice::Info` record type. The most common
fields used when raising notices are described in the following table:
.. list-table::
:widths: 32 40
:header-rows: 1
* - Field name
- Description
* - ``$note``
- This field is required and is an enum value which represents the
notice type.
* - ``$msg``
- This is a human readable message which is meant to provide more
information about this particular instance of the notice type.
* - ``$sub``
- This is a sub-message meant for human readability but will
frequently also be used to contain data meant to be matched with the
``Notice::policy``.
* - ``$conn``
- If a connection record is available when the notice is being raised
and the notice represents some attribute of the connection, then the
connection record can be given here. Other fields such as ``$id`` and
``$src`` will automatically be populated from this value.
* - ``$id``
- If a conn_id record is available when the notice is being raised and
the notice represents some attribute of the connection, then the
conn_id record can be given here. Other fields such as ``$src`` will
automatically be populated from this value.
* - ``$src``
- If the notice represents an attribute of a single host then it's
possible that only this field should be filled out to represent the
host that is being "noticed".
* - ``$n``
- This carries a numeric value associated with the notice, if the
notice has one. It's most frequently used for numeric tests in the
``Notice::policy`` for making policy decisions.
* - ``$identifier``
- This represents a unique identifier for this notice. This field is
described in more detail in the `Automated Suppression`_ section.
* - ``$suppress_for``
- This field can be set if there is a natural suppression interval for
the notice that may be different than the default value. The
value set to this field can also be modified by a user's
:bro:see:`Notice::policy` so the value is not set permanently
and unchangeably.
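For instance, a hypothetical scan detector might raise a host-centric
notice along these lines. The ``Scan::Address_Scan`` type and the
``scanner`` and ``num_targets`` variables are illustrative, not shipped
with Bro:
.. code:: bro
    NOTICE([$note=Scan::Address_Scan,
            $src=scanner,
            $n=num_targets,
            $msg=fmt("%s scanned %d hosts", scanner, num_targets),
            $identifier=cat(scanner)]);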
When writing Bro scripts which raise notices, some thought should be given to
what the notice represents and what data should be provided to give a consumer
of the notice the best information about it. If the notice is
representative of many connections and is an attribute of a host (e.g. a
scanning host) it probably makes most sense to fill out the ``$src`` field and
not give a connection or conn_id. If a notice is representative of a
connection attribute (e.g. an apparent SSH login) then it makes sense to fill
out either ``$conn`` or ``$id`` based on the data that is available when the
notice is raised. Taking care to insert only the data that fully represents
the occurrence will make later analysis easier. For example, if complete
connection information is attached to a notice about an expiring SSL server
certificate, the logs become confusing because the connection that the
certificate was detected on is a side topic to the fact that an expired
certificate was detected. It's possible in many cases that two or more
separate notices may need to be generated. As an example, one could be for
the detection of the expired SSL certificate and another could be raised if
the client decided to go ahead with the connection despite the expired
certificate.
Automated Suppression
---------------------
The notice framework supports suppression for notices if the author of the
script that is generating the notice has indicated to the notice framework how
to identify notices that are intrinsically the same. Identification of these
"intrinsically duplicate" notices is implemented with an optional field in
:bro:see:`Notice::Info` records named ``$identifier`` which is a simple string.
If the ``$identifier`` and ``$type`` fields are the same for two notices, the
notice framework actually considers them to be the same thing and can use that
information to suppress duplicates for a configurable period of time.
.. note::
If the ``$identifier`` is left out of a notice, no notice suppression
takes place due to the framework's inability to identify duplicates. This
can be completely legitimate usage if no notices of that type could ever
be considered duplicates.
The ``$identifier`` field is typically built from several pieces of
data related to the notice that, when combined, represent a unique
instance of that notice. Here is an example of the script
:doc:`scripts/policy/protocols/ssl/validate-certs` raising a notice
for session negotiations where the certificate or certificate chain did
not validate successfully against the available certificate authority
certificates.
.. code:: bro
NOTICE([$note=SSL::Invalid_Server_Cert,
$msg=fmt("SSL certificate validation failed with (%s)", c$ssl$validation_status),
$sub=c$ssl$subject,
$conn=c,
$identifier=cat(c$id$resp_h,c$id$resp_p,c$ssl$validation_status,c$ssl$cert_hash)]);
In the above example you can see that the ``$identifier`` field contains a
string that is built from the responder IP address and port, the validation
status message, and the MD5 sum of the server certificate. Those fields in
particular are chosen because different SSL certificates could be seen on any
port of a host, certificates could fail validation for different reasons, and
multiple server certificates could be used on that combination of IP address
and port with the ``server_name`` SSL extension (explaining the addition of
the MD5 sum of the certificate). The result is that if a certificate fails
validation and all four pieces of data match (IP address, port, validation
status, and certificate hash) that particular notice won't be raised again for
the default suppression period.
Setting the ``$identifier`` field is left to those raising notices because
it's assumed that the script author who is raising the notice understands the
full problem set and edge cases of the notice which may not be readily
apparent to users. If users don't want the suppression to take place or simply
want a different interval, they can set a notice's suppression
interval to ``0secs`` or delete the value from the ``$identifier`` field in
a :bro:see:`Notice::policy` hook.
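For example, here is a minimal sketch of a hook that disables suppression
for one notice type by zeroing its suppression interval:
.. code:: bro
    hook Notice::policy(n: Notice::Info)
        {
        if ( n$note == SSL::Invalid_Server_Cert )
            n$suppress_for = 0secs;
        }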
Extending Notice Framework
--------------------------
There are currently a couple of mechanisms for extending the notice
framework and adding new capabilities.
Extending Notice Emails
***********************
If there is extra information that you would like to add to emails, you
can do so by writing :bro:see:`Notice::policy` hooks.
The :bro:see:`Notice::Info` record contains a field named
``$email_body_sections`` whose contents will be included verbatim when
email is sent. An example of including some information from an HTTP
request is included below.
.. code:: bro
hook Notice::policy(n: Notice::Info)
{
if ( n?$conn && n$conn?$http && n$conn$http?$host )
n$email_body_sections[|n$email_body_sections|] = fmt("HTTP host header: %s", n$conn$http$host);
}
Cluster Considerations
----------------------
As a user/developer of Bro, the main cluster concern with the notice framework
is understanding what runs where. When a notice is generated on a worker, the
worker checks whether the notice should be suppressed based on information
maintained locally in the worker process. If it's not being
suppressed, the worker forwards the notice directly to the manager and does no
more local processing. The manager then runs the :bro:see:`Notice::policy`
hook and executes all of the actions determined to be run.
@ -0,0 +1,394 @@
===================
Signature Framework
===================
.. rst-class:: opening
Bro relies primarily on its extensive scripting language for
defining and analyzing detection policies. In addition, however,
Bro also provides an independent *signature language* for doing
low-level, Snort-style pattern matching. While signatures are
*not* Bro's preferred detection tool, they sometimes come in handy
and are closer to what many people are familiar with from using
other NIDS. This page gives a brief overview on Bro's signatures
and covers some of their technical subtleties.
.. contents::
:depth: 2
Basics
======
Let's look at an example signature first:
.. code:: bro-sig
signature my-first-sig {
ip-proto == tcp
dst-port == 80
payload /.*root/
event "Found root!"
}
This signature asks Bro to match the regular expression ``.*root`` on
all TCP connections going to port 80. When the signature triggers, Bro
will raise an event :bro:id:`signature_match` of the form:
.. code:: bro
event signature_match(state: signature_state, msg: string, data: string)
Here, ``state`` contains more information on the connection that
triggered the match, ``msg`` is the string specified by the
signature's event statement (``Found root!``), and data is the last
piece of payload which triggered the pattern match.
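If you want to react to matches directly, you can handle that event
yourself. Here is a minimal sketch, assuming the ``sig_id`` and ``conn``
fields of :bro:type:`signature_state`:
.. code:: bro
    event signature_match(state: signature_state, msg: string, data: string)
        {
        print fmt("signature %s matched for %s: %s",
                  state$sig_id, state$conn$id$orig_h, msg);
        }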
To turn such :bro:id:`signature_match` events into actual alarms, you can
load Bro's :doc:`/scripts/base/frameworks/signatures/main` script.
This script contains a default event handler that raises
:bro:enum:`Signatures::Sensitive_Signature` :doc:`Notices <notice>`
(as well as others; see the beginning of the script).
As signatures are independent of Bro's policy scripts, they are put into
their own file(s). There are three ways to specify which files contain
signatures: by using the ``-s`` flag when you invoke Bro, by
extending the Bro variable :bro:id:`signature_files` using the ``+=``
operator, or by using the ``@load-sigs`` directive inside a Bro script.
If a signature file is given without a full path, it is searched for
along the normal ``BROPATH``. Additionally, the ``@load-sigs``
directive can be used to load signature files in a path relative to the
Bro script in which it's placed, e.g. ``@load-sigs ./mysigs.sig`` will
expect that signature file in the same directory as the Bro script. The
default extension of the file name is ``.sig``, and Bro appends that
automatically when necessary.
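For example, assuming a hypothetical signature file ``mysigs.sig``, either
of the following makes Bro load it:
.. code:: bro
    # Relative to the directory of the current script; the .sig
    # extension is appended automatically.
    @load-sigs ./mysigs

    # Alternatively, search for the file along BROPATH.
    redef signature_files += "mysigs.sig";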
Signature language
==================
Let's look at the format of a signature more closely. Each individual
signature has the format ``signature <id> { <attributes> }``. ``<id>``
is a unique label for the signature. There are two types of
attributes: *conditions* and *actions*. The conditions define when the
signature matches, while the actions declare what to do in the case of
a match. Conditions can be further divided into four types: *header*,
*content*, *dependency*, and *context*. We discuss these all in more
detail in the following.
Conditions
----------
Header Conditions
~~~~~~~~~~~~~~~~~
Header conditions limit the applicability of the signature to a subset
of traffic that contains matching packet headers. This type of matching
is performed only for the first packet of a connection.
There are pre-defined header conditions for some of the most used
header fields. All of them generally have the format ``<keyword> <cmp>
<value-list>``, where ``<keyword>`` names the header field; ``cmp`` is
one of ``==``, ``!=``, ``<``, ``<=``, ``>``, ``>=``; and
``<value-list>`` is a list of comma-separated values to compare
against. The following keywords are defined:
``src-ip``/``dst-ip <cmp> <address-list>``
Source and destination address, respectively. Addresses can be given
as IPv4 or IPv6 addresses or CIDR masks. For IPv6 addresses/masks
the colon-hexadecimal representation of the address must be enclosed
in square brackets (e.g. ``[fe80::1]`` or ``[fe80::0]/16``).
``src-port``/``dst-port <cmp> <int-list>``
Source and destination port, respectively.
``ip-proto <cmp> tcp|udp|icmp|icmp6|ip|ip6``
IPv4 header's Protocol field or the Next Header field of the final
IPv6 header (i.e. either Next Header field in the fixed IPv6 header
if no extension headers are present or that field from the last
extension header in the chain). Note that the IP-in-IP forms of
tunneling are automatically decapsulated by default and signatures
apply to only the inner-most packet, so specifying ``ip`` or ``ip6``
is a no-op.
For lists of multiple values, they are sequentially compared against
the corresponding header field. If at least one of the comparisons
evaluates to true, the whole header condition matches (exception: with
``!=``, the header condition only matches if all values differ).
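As a sketch, here is a made-up signature combining several of these
pre-defined header conditions, each with a value-list:
.. code:: bro-sig
    signature sketch-header-conds {
        ip-proto == tcp
        dst-ip == 192.168.0.0/16, 10.0.0.0/8
        dst-port == 80, 8080
        payload /.*login/
        event "Login string seen on a watched port"
    }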
In addition to these pre-defined header keywords, a general header
condition can be defined as
.. code:: bro-sig
header <proto>[<offset>:<size>] [& <integer>] <cmp> <value-list>
This compares the value found at the given position of the packet header
with a list of values. ``offset`` defines the position of the value
within the header of the protocol defined by ``proto`` (which can be
``ip``, ``ip6``, ``tcp``, ``udp``, ``icmp`` or ``icmp6``). ``size`` is
either 1, 2, or 4 and specifies the width of the value in bytes.
If the optional ``& <integer>`` is given, the packet's value is
first masked with the integer before it is compared to the value-list.
``cmp`` is one of ``==``, ``!=``, ``<``, ``<=``, ``>``, ``>=``.
``value-list`` is a list of comma-separated integers similar to those
described above. The integers within the list may be followed by an
additional ``/ mask`` where ``mask`` is a value from 0 to 32. This
corresponds to the CIDR notation for netmasks and is translated into a
corresponding bitmask applied to the packet's value prior to the
comparison (similar to the optional ``& integer``). IPv6 address values
are not allowed in the value-list, though you can still inspect any 1,
2, or 4 byte section of an IPv6 header using this keyword.
Putting it all together, this is an example condition that is
equivalent to ``dst-ip == 1.2.3.4/16, 5.6.7.8/24``:
.. code:: bro-sig
header ip[16:4] == 1.2.3.4/16, 5.6.7.8/24
Note that the analogous example for IPv6 isn't currently possible since
4 bytes is the max width of a value that can be compared.
Content Conditions
~~~~~~~~~~~~~~~~~~
Content conditions are defined by regular expressions. We
differentiate two kinds of content conditions: first, the expression
may be declared with the ``payload`` statement, in which case it is
matched against the raw payload of a connection (for reassembled TCP
streams) or of each packet (for ICMP, UDP, and non-reassembled TCP).
Second, it may be prefixed with an analyzer-specific label, in which
case the expression is matched against the data as extracted by the
corresponding analyzer.
A ``payload`` condition has the form:
.. code:: bro-sig
payload /<regular expression>/
Currently, the following analyzer-specific content conditions are
defined (note that the corresponding analyzer has to be activated by
loading its policy script):
``http-request /<regular expression>/``
The regular expression is matched against decoded URIs of HTTP
requests. Obsolete alias: ``http``.
``http-request-header /<regular expression>/``
The regular expression is matched against client-side HTTP headers.
``http-request-body /<regular expression>/``
The regular expression is matched against client-side bodies of
HTTP requests.
``http-reply-header /<regular expression>/``
The regular expression is matched against server-side HTTP headers.
``http-reply-body /<regular expression>/``
The regular expression is matched against server-side bodies of
HTTP replies.
``ftp /<regular expression>/``
The regular expression is matched against the command line input
of FTP sessions.
``finger /<regular expression>/``
The regular expression is matched against finger requests.
For example, ``http-request /.*(etc/(passwd|shadow))/`` matches any URI
containing either ``etc/passwd`` or ``etc/shadow``. To filter on request
types, e.g. ``GET``, use ``payload /GET /``.
Note that HTTP pipelining (that is, multiple HTTP transactions in a
single TCP connection) has some side effects on signature matches. If
multiple conditions are specified within a single signature, this
signature matches if all conditions are met by any HTTP transaction
(not necessarily always the same!) in a pipelined connection.
Dependency Conditions
~~~~~~~~~~~~~~~~~~~~~
To define dependencies between signatures, there are two conditions:
``requires-signature [!] <id>``
Defines the current signature to match only if the signature given
by ``id`` matches for the same connection. Using ``!`` negates the
condition: The current signature only matches if ``id`` does not
match for the same connection (using this defers the match
decision until the connection terminates).
``requires-reverse-signature [!] <id>``
Similar to ``requires-signature``, but ``id`` has to match for the
opposite direction of the same connection, compared to the current
signature. This allows modeling the notion of requests and
replies.
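For illustration, the following made-up pair models such a request/reply
relationship: the second signature only matches if the first has matched
in the opposite direction of the same connection:
.. code:: bro-sig
    signature ftp-root-attempt {
        ip-proto == tcp
        dst-port == 21
        payload /.*USER root/
        event "FTP root login attempted"
    }

    signature ftp-root-maybe-ok {
        ip-proto == tcp
        src-port == 21
        payload /.*230/
        requires-reverse-signature ftp-root-attempt
        event "FTP root login possibly succeeded"
    }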
Context Conditions
~~~~~~~~~~~~~~~~~~
Context conditions pass the match decision on to other components of
Bro. They are only evaluated if all other conditions have already
matched. The following context conditions are defined:
``eval <policy-function>``
The given policy function is called and has to return a boolean
confirming the match. If false is returned, no signature match is
going to be triggered. The function has to be of type ``function
cond(state: signature_state, data: string): bool``. Here,
``data`` may contain the most recent content chunk available at
the time the signature was matched. If no such chunk is available,
``data`` will be the empty string. See :bro:type:`signature_state`
for its definition.
``payload-size <cmp> <integer>``
Compares the integer to the size of the payload of a packet. For
reassembled TCP streams, the integer is compared to the size of
the first in-order payload chunk. Note that the latter is not very
well defined.
``same-ip``
Evaluates to true if the source address of the IP packets equals
its destination address.
``tcp-state <state-list>``
Imposes restrictions on the current TCP state of the connection.
``state-list`` is a comma-separated list of the keywords
``established`` (the three-way handshake has already been
performed), ``originator`` (the current data is sent by the
originator of the connection), and ``responder`` (the current data
is sent by the responder of the connection).
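Combining a few of these, here is a sketch that fires only on data sent by
the originator after the handshake, and only for small payloads:
.. code:: bro-sig
    signature sketch-context-conds {
        ip-proto == tcp
        dst-port == 23
        tcp-state established,originator
        payload-size < 512
        payload /.*root/
        event "root seen from originator after handshake"
    }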
Actions
-------
Actions define what to do if a signature matches. Currently, there are
two actions defined:
``event <string>``
Raises a :bro:id:`signature_match` event. The event handler has the
following type:
.. code:: bro
event signature_match(state: signature_state, msg: string, data: string)
The given string is passed in as ``msg``, and ``data`` is the current
part of the payload that has eventually led to the signature
match (this may be empty for signatures without content
conditions).
``enable <string>``
Enables the protocol analyzer ``<string>`` for the matching
connection (``"http"``, ``"ftp"``, etc.). This is used by Bro's
dynamic protocol detection to activate analyzers on the fly.
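As a sketch modeled on the DPD signatures in
``base/frameworks/dpd/dpd.sig``, an ``enable`` action typically pairs a
protocol-recognizing payload pattern with the analyzer to activate:
.. code:: bro-sig
    signature sketch-dpd-http {
        ip-proto == tcp
        payload /^[[:space:]]*(GET|HEAD|POST)[[:space:]]*/
        tcp-state originator
        enable "http"
    }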
Things to keep in mind when writing signatures
==============================================
* Each signature is reported at most once for every connection;
further matches of the same signature are ignored.
* The content conditions perform pattern matching on elements
extracted from an application protocol dialogue. For example, ``http
/.*passwd/`` scans URLs requested within HTTP sessions. The thing to
keep in mind here is that these conditions only perform any matching
when the corresponding application analyzer is actually *active* for
a connection. Note that by default, analyzers are not enabled if the
corresponding Bro script has not been loaded. A good way to
double-check whether an analyzer "sees" a connection is checking its
log file for corresponding entries. If you cannot find the
connection in the analyzer's log, very likely the signature engine
has also not seen any application data.
* As the name indicates, the ``payload`` keyword matches on packet
*payload* only. You cannot use it to match on packet headers; use
the header conditions for that.
* For TCP connections, header conditions are only evaluated for the
*first packet from each endpoint*. If a header condition does not
match the initial packets, the signature will not trigger. Bro
optimizes for the most common application here, which is header
conditions selecting the connections to be examined more closely
with payload statements.
* For UDP and ICMP flows, the payload matching is done on a per-packet
basis; i.e., any content crossing packet boundaries will not be
found. For TCP connections, the matching semantics depend on whether
Bro is *reassembling* the connection (i.e., putting all of a
connection's packets in sequence). By default, Bro is reassembling
the first 1K of every TCP connection, which means that within this
window, matches will be found without regard to packet order or
boundaries (i.e., *stream-wise matching*).
* For performance reasons, by default Bro *stops matching* on a
connection after seeing 1K of payload; see the section on options
below for how to change this behaviour. The default was chosen with
Bro's main user of signatures in mind: dynamic protocol detection
works well even when examining just connection heads.
* Regular expressions are implicitly anchored, i.e., they work as if
prefixed with the ``^`` operator. For reassembled TCP connections,
they are anchored at the first byte of the payload *stream*. For all
other connections, they are anchored at the first payload byte of
each packet. To match at arbitrary positions, you can prefix the
regular expression with ``.*``, as done in the examples above.
* To match on non-ASCII characters, Bro's regular expressions support
the ``\x<hex>`` operator. CRs/LFs are not treated specially by the
signature engine and can be matched with ``\r`` and ``\n``,
respectively. Generally, Bro follows `flex's regular expression
syntax
<http://flex.sourceforge.net/manual/Patterns.html>`_.
See the DPD signatures in ``base/frameworks/dpd/dpd.sig`` for some examples
of fairly complex payload patterns.
* The data argument of the :bro:id:`signature_match` handler might not carry
the full text matched by the regular expression. Bro performs the
matching incrementally as packets come in; when the signature
eventually fires, it can only pass on the most recent chunk of data.
Options
=======
The following options control details of Bro's matching process:
``dpd_reassemble_first_packets: bool`` (default: ``T``)
If true, Bro reassembles the beginning of every TCP connection (of
up to ``dpd_buffer_size`` bytes, see below), to facilitate
reliable matching across packet boundaries. If false, only
connections are reassembled for which an application-layer
analyzer gets activated (e.g., by Bro's dynamic protocol
detection).
``dpd_match_only_beginning: bool`` (default: ``T``)
If true, Bro performs packet matching only within the initial
payload window of ``dpd_buffer_size``. If false, it keeps matching
on subsequent payload as well.
``dpd_buffer_size: count`` (default: ``1024``)
Defines the buffer size for the two preceding options. In
addition, this value determines the amount of bytes Bro buffers
for each connection in order to activate application analyzers
even after parts of the payload have already passed through. This
is needed by the dynamic protocol detection capability to defer
the decision of which analyzers to use.
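For example, to keep matching past the default 1K window (at some extra
memory and CPU cost), one might redefine these options like so:
.. code:: bro
    redef dpd_match_only_beginning = F;
    redef dpd_buffer_size = 4096;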
So, how about using Snort signatures with Bro?
==============================================
There was once a script, ``snort2bro``, that converted Snort
signatures automatically into Bro's signature syntax. However, in our
experience this didn't turn out to be a very useful thing to do
because by simply using Snort signatures, one can't benefit from the
additional capabilities that Bro provides; the approaches of the two
systems are just too different. We therefore stopped maintaining the
``snort2bro`` script, and there are now many newer Snort options which
it doesn't support. The script is now no longer part of the Bro
distribution.
@ -5,50 +5,15 @@
Bro Documentation
=================
Guides
------
.. toctree::
:maxdepth: 1
INSTALL
upgrade
quickstart
faq
reporting-problems
Frameworks
----------
.. toctree::
:maxdepth: 1
notice
logging
input
cluster
signatures
How-Tos
-------
.. toctree::
:maxdepth: 2
:numbered:
user-manual/index
reference/index
Just Testing
============
.. code:: bro
print "Hey Bro!"
.. btest:: test
@TEST-COPY-FILE: ${TRACES}/wikipedia.trace
@TEST-EXEC: btest-rst-cmd bro -r wikipedia.trace
@TEST-EXEC: btest-rst-cmd "cat http.log | bro-cut ts id.orig_h | head -5"
intro/index.rst
using/index.rst
scripting/index.rst
frameworks/index.rst
cluster/index.rst
scripts/index.rst
misc/index.rst
components/index.rst
indices/index.rst
doc/indices/index.rst Normal file
@ -0,0 +1,7 @@
=======
Indices
=======
* :ref:`General Index <genindex>`
* :ref:`search`
doc/intro/index.rst Normal file
@ -0,0 +1,13 @@
============
Introduction
============
.. toctree::
:maxdepth: 2
overview
quickstart
upgrade
reporting-problems
@ -1,7 +1,5 @@
==================
Overview (Missing)
==================
@ -0,0 +1,194 @@
Reporting Problems
==================
.. rst-class:: opening
Here we summarize some steps to follow when you see Bro doing
something it shouldn't. To provide help, it is often crucial for
us to have a way of reliably reproducing the effect you're seeing.
Unfortunately, reproducing problems can be rather tricky with Bro
because more often than not, they occur only in either very rare
situations or only after Bro has been running for some time. In
particular, getting a small trace showing a specific effect can be
a real problem. In the following, we'll summarize some strategies
to this end.
Reporting Problems
------------------
Generally, when you encounter a problem with Bro, the best thing to do
is to open a new ticket in `Bro's issue tracker
<http://tracker.bro.org/>`__ and include information on how to
reproduce the issue. Ideally, your ticket should come with the
following:
* The Bro version you're using (if working directly from the git
repository, the branch and revision number.)
* The output you're seeing along with a description of what you'd expect
Bro to do instead.
* A *small* trace in `libpcap format <http://www.tcpdump.org>`__
demonstrating the effect (assuming the problem doesn't happen right
at startup already).
* The exact command-line you're using to run Bro with that trace. If
you can, please try to run the Bro binary directly from the command
line rather than using BroControl.
* Any non-standard scripts you're using (but please only those really
necessary; just a small code snippet triggering the problem would
be perfect).
* If you encounter a crash, information from the core dump, such as
the stack backtrace, can be very helpful. See below for more on
this.
How Do I Get a Trace File?
--------------------------
As Bro is usually running live, coming up with a small trace file that
reproduces a problem can turn out to be quite a challenge. Often it
works best to start with a large trace that triggers the problem,
and then successively thin it out as much as possible.
To get to the initial large trace, here are a few things you can try:
* Capture a trace with `tcpdump <http://www.tcpdump.org/>`__, either
on the same interface Bro is running on, or on another host where
you can generate traffic of the kind likely triggering the problem
(e.g., if you're seeing problems with the HTTP analyzer, record some
of your Web browsing on your desktop.) When using tcpdump, don't
forget to record *complete* packets (``tcpdump -s 0 ...``). You can
reduce the amount of traffic captured by using a suitable BPF filter
(e.g., for HTTP only, try ``port 80``).
* Bro's command-line option ``-w <trace>`` records all packets it
processes into the given file. You can then later run Bro
offline on this trace and it will process the packets in the same
way as it did live. This is particularly helpful with problems that
only occur after Bro has already been running for some time. For
example, sometimes a crash may be triggered by a particular kind of
traffic only occurring rarely. Running Bro live with ``-w`` and
then, after the crash, offline on the recorded trace might, with a
little bit of luck, reproduce the problem reliably. However, be
careful with ``-w``: it can result in huge trace files, quickly
filling up your disk. (One way to mitigate the space issues is to
periodically delete the trace file by configuring
``rotate-logs.bro`` accordingly. BroControl does that for you if you
set its ``SaveTraces`` option.)
* Finally, you can try running Bro on a publicly available trace
file, such as `anonymized FTP traffic <http://www-nrg.ee.lbl.gov
/anonymized-traces.html>`__, `headers-only enterprise traffic
<http://www.icir.org/enterprise-tracing/Overview.html>`__, or
`Defcon traffic <http://cctf.shmoo.com/>`__. Some of these
particularly stress certain components of Bro (e.g., the Defcon
traces contain tons of scans).
Once you have a trace that demonstrates the effect, you will often
notice that it's pretty big, in particular if recorded from the link
you're monitoring. Therefore, the next step is to shrink its size as
much as possible. Here are a few things you can try to this end:
* Very often, a single connection is able to demonstrate the problem.
If you can identify which one it is (e.g., from one of Bro's
``*.log`` files) you can extract the connection's packets from the
trace using tcpdump by filtering for the corresponding 4-tuple of
addresses and ports:
.. console::
> tcpdump -r large.trace -w small.trace host <ip1> and port <port1> and host <ip2> and port <port2>
* If you can't reduce the problem to a connection, try to identify
either a host pair or a single host triggering it, and filter down
the trace accordingly.
* You can try to extract a smaller time slice from the trace using
`TCPslice <http://www.tcpdump.org/related.html>`__. For example, to
extract the first 100 seconds from the trace:
.. console::
# Test comment
> tcpslice +100 <in >out
Alternatively, tcpdump extracts the first ``n`` packets with its
option ``-c <n>``.
Getting More Information After a Crash
--------------------------------------
If Bro crashes, a *core dump* can be very helpful to nail down the
problem. Examining a core is not for the faint of heart but can reveal
extremely useful information.
First, you should configure Bro with the option ``--enable-debug`` and
recompile; this will disable all compiler optimizations and thus make
the core dump more useful (don't expect great performance with this
version though; compiling Bro without optimization has a noticeable
impact on its CPU usage). Then enable core dumps if you haven't
already (e.g., ``ulimit -c unlimited`` if you're using bash).
Once Bro has crashed, start gdb with the Bro binary and the file
containing the core dump. (Alternatively, you can also run Bro
directly inside gdb instead of working from a core file.) The first
helpful information to include with your tracker ticket is a stack
backtrace, which you get with gdb's ``bt`` command:
.. console::
> gdb bro core
[...]
> bt
If the crash occurs inside Bro's script interpreter, the next thing to
do is identifying the line of script code processed just before the
abnormal termination. Look for methods in the stack backtrace which
belong to any of the script interpreter's classes. Roughly speaking,
these are all classes with names ending in ``Expr``, ``Stmt``, or
``Val``. Then climb up the stack with ``up`` until you reach the first
of these methods. The object to which ``this`` is pointing will have a
``Location`` object, which in turn contains the file name and line
number of the corresponding piece of script code. Continuing the
example from above, here's how to get that information:
.. console::
[in gdb]
> up
> ...
> up
> print this->location->filename
> print this->location->first_line
If the crash occurs while processing input packets but you cannot
directly tell which connection is responsible (and thus not extract
its packets from the trace as suggested above), try getting the
4-tuple of the connection currently being processed from the core dump
by again examining the stack backtrace, this time looking for methods
belonging to the ``Connection`` class. That class has members
``orig_addr``/``resp_addr`` and ``orig_port``/``resp_port`` storing
(pointers to) the IP addresses and ports respectively:
.. console::
[in gdb]
> up
> ...
> up
> printf "%08x:%04x %08x:%04x\n", *this->orig_addr, this->orig_port, *this->resp_addr, this->resp_port
Note that these values are stored in `network byte order
<http://en.wikipedia.org/wiki/Endianness#Endianness_in_networking>`__
so you will need to flip the bytes around if you are on a little-endian
machine (which is why the above example prints them in hex). For
example, if an IP address prints as ``0100007f``, that's 127.0.0.1.
doc/intro/upgrade.rst Normal file
@ -0,0 +1,308 @@
==========================================
Upgrading From the Previous Version of Bro
==========================================
.. rst-class:: opening
This guide details specific differences between Bro versions
that may be important for users to know as they work on updating
their Bro deployment/configuration to the later version.
.. contents::
Upgrading From Bro 2.0 to 2.1
=============================
In Bro 2.1, IPv6 is enabled by default. Therefore, when building Bro from
source, the "--enable-brov6" configure option has been removed because it
is no longer relevant.
Other configure changes include renaming the "--enable-perftools" option
to "--enable-perftools-debug" to indicate that the option is only relevant
for debugging the heap. One other change involves what happens when
tcmalloc (part of Google perftools) is found at configure time. On Linux,
it will automatically be linked with Bro, but on other platforms you
need to use the "--enable-perftools" option to enable linking to tcmalloc.
There are a couple of changes to the Bro scripting language to better
support IPv6. First, IPv6 literals appearing in a Bro script must now be
enclosed in square brackets (for example, ``[fe80::db15]``). For subnet
literals, the slash "/" appears after the closing square bracket (for
example, ``[fe80:1234::]/32``). Second, when an IP address variable or IP
address literal is enclosed in pipes (for example, ``|[fe80::db15]|``) the
result is now the size of the address in bits (32 for IPv4 and 128 for IPv6).
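As a small sketch of the new syntax:
.. code:: bro
    event bro_init()
        {
        local h: addr = [fe80::db15];
        local s: subnet = [fe80:1234::]/32;
        print h, s;
        print |h|;    # prints 128, the address size in bits
        }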
In the Bro scripting language, "match" and "using" are no longer reserved
keywords.
Some built-in functions have been removed: "addr_to_count" (use
"addr_to_counts" instead), "bro_has_ipv6" (this is no longer relevant
because Bro now always supports IPv6), "active_connection" (use
"connection_exists" instead), and "connection_record" (use "lookup_connection"
instead).
The "NFS3::mode2string" built-in function has been renamed to "file_mode".
Some built-in functions have been changed: "exit" (now takes the exit code
as a parameter), "to_port" (now takes a string as parameter instead
of a count and transport protocol, but "count_to_port" is still available),
"connect" (now takes an additional string parameter specifying the zone of
a non-global IPv6 address), and "listen" (now takes three additional
parameters to enable listening on IPv6 addresses).
Some Bro script variables have been renamed: "LogAscii::header_prefix"
has been renamed to "LogAscii::meta_prefix", "LogAscii::include_header"
has been renamed to "LogAscii::include_meta".
Some Bro script variables have been removed: "tunnel_port",
"parse_udp_tunnels", "use_connection_compressor", "cc_handle_resets",
"cc_handle_only_syns", and "cc_instantiate_on_data".
A couple of events have changed: the "icmp_redirect" event now includes
the target and destination addresses and any Neighbor Discovery options
in the message, and the last parameter of the "dns_AAAA_reply" event has
been removed because it was unused.
The format of the ASCII log files has changed very slightly. Two new lines
are automatically added, one to record the time when the log was opened,
and the other to record the time when the log was closed.
In BroControl, the option (in broctl.cfg) "CFlowAddr" was renamed
to "CFlowAddress".
Upgrading From Bro 1.5 to 2.0
=============================
As the version number jump suggests, Bro 2.0 is a major upgrade and
lots of things have changed. Most importantly, we have rewritten
almost all of Bro's default scripts from scratch, using quite
different structure now and focusing more on operational deployment.
The result is a system that works much better "out of the box", even
without much initial site-specific configuration. The down-side is
that 1.x configurations will need to be adapted to work with the new
version. The two rules of thumb are:
(1) If you have written your own Bro scripts
that do not depend on any of the standard scripts formerly
found in ``policy/``, they will most likely just keep working
(although you might want to adapt them to use some of the new
features, like the new logging framework; see below).
(2) If you have custom code that depends on specifics of 1.x
default scripts (including most configuration tuning), that is
unlikely to work with 2.x. We recommend starting with just the new
scripts, and then porting over any customizations
incrementally as necessary (they may be much easier to do now,
or even unnecessary). Send mail to the Bro user mailing list
if you need help.
Below we summarize changes from 1.x to 2.x in more detail. This list
isn't complete, see the :download:`CHANGES <CHANGES>` file in the
distribution for the full story.
Default Scripts
===============
Organization
------------
In versions before 2.0, Bro scripts were all maintained in a flat
directory called ``policy/`` in the source tree. This directory is now
renamed to ``scripts/`` and contains major subdirectories ``base/``,
``policy/``, and ``site/``, each of which may also be subdivided
further.
The contents of the new ``scripts/`` directory, like the old/flat
``policy/``, still get installed under the ``share/bro``
subdirectory of the installation prefix path just like previous
versions. For example, if Bro was compiled like ``./configure
--prefix=/usr/local/bro && make && make install``, then the script
hierarchy can be found in ``/usr/local/bro/share/bro``.
The main
subdirectories of that hierarchy are as follows:
- ``base/`` contains all scripts that are loaded by Bro by default
(unless the ``-b`` command line option is used to run Bro in a
minimal configuration). Note that this is a major conceptual change:
rather than not loading anything by default, Bro now uses an
extensive set of default scripts out of the box.
The scripts under this directory generally either accumulate/log
useful state/protocol information for monitored traffic, configure a
default/recommended mode of operation, or provide extra Bro
scripting-layer functionality that has no significant performance cost.
- ``policy/`` contains all scripts that a user will need to explicitly
tell Bro to load. These are scripts that implement
functionality/analysis that not all users may want to use and may have
more significant performance costs. For a new installation, you
should go through these and see what appears useful to load.
- ``site/`` remains a directory that can be used to store locally
developed scripts. It now comes with some preinstalled example
scripts that contain recommended default configurations going beyond
the ``base/`` setup. E.g. ``local.bro`` loads extra scripts from
``policy/`` and does extra tuning. These files can be customized in
place without being overwritten by upgrades/reinstalls, unlike
scripts in other directories.
With version 2.0, the default ``BROPATH`` is set to automatically
search for scripts in ``policy/``, ``site/`` and their parent
directory, but **not** ``base/``. Generally, everything under
``base/`` is loaded automatically, but for users of the ``-b`` option,
it's important to know that loading a script in that directory
requires the extra ``base/`` path qualification. For example, the
following two scripts:
* ``$PREFIX/share/bro/base/protocols/ssl/main.bro``
* ``$PREFIX/share/bro/policy/protocols/ssl/validate-certs.bro``
are referenced from another Bro script like:
.. code:: bro
@load base/protocols/ssl/main
@load protocols/ssl/validate-certs
Notice how ``policy/`` can be omitted as a convenience in the second
case. ``@load`` can now also use relative paths, e.g., ``@load
../main``.
Logging Framework
-----------------
- The logs generated by scripts that ship with Bro are entirely redone
to use a standardized, machine parsable format via the new logging
framework. Generally, the log content has been restructured towards
making it more directly useful to operations. Also, several
analyzers have been significantly extended and thus now log more
information. Take a look at ``ssl.log``.
* A particular format change that may be useful to note is that the
``conn.log`` ``service`` field is derived from DPD instead of
well-known ports (while that was already possible in 1.5, it was
not the default).
* Also, ``conn.log`` now reports raw number of packets/bytes per
endpoint.
- The new logging framework makes it possible to extend, customize,
and filter logs very easily. See the :doc:`logging framework <logging>`
for more information on usage.
- A common pattern found in the new scripts is to store logging stream
records for protocols inside the ``connection`` records so that
state can be collected until enough is seen to log a coherent unit
of information regarding the activity of that connection. This
state is now frequently seen/accessible in event handlers, for
example, like ``c$<protocol>`` where ``<protocol>`` is replaced by
the name of the protocol. This field is added to the ``connection``
record by ``redef``'ing it in a
``base/protocols/<protocol>/main.bro`` script.
- The logging code has been rewritten internally, with script-level
interface and output backend now clearly separated. While ASCII
logging is still the default, we will add further output types in
the future (binary format, direct database logging).
Notice Framework
----------------
The way users interact with "notices" has changed significantly in
order to make it easier to define a site policy and more extensible
for adding customized actions. See the :doc:`notice framework <notice>`.
New Default Settings
--------------------
- Dynamic Protocol Detection (DPD) is now enabled/loaded by default.
- The default packet filter now examines all packets instead of
dynamically building a filter based on which protocol analysis scripts
are loaded. See ``PacketFilter::all_packets`` for how to revert to old
behavior.
API Changes
-----------
- The ``@prefixes`` directive works differently now.
Any added prefixes are now searched for and loaded *after* all input
files have been parsed. After all input files are parsed, Bro
searches ``BROPATH`` for prefixed, flattened versions of all of the
parsed input files. For example, if ``lcl`` is in ``@prefixes``, and
``site.bro`` is loaded, then a file named ``lcl.site.bro`` that's in
``BROPATH`` would end up being automatically loaded as well. Packages
work similarly, e.g. loading ``protocols/http`` means a file named
``lcl.protocols.http.bro`` in ``BROPATH`` gets loaded automatically.
- The ``make_addr`` BIF now returns a ``subnet`` versus an ``addr``.
Variable Naming
---------------
- ``Module`` is more widely used for namespacing. E.g. the new
``site.bro`` exports the ``local_nets`` identifier (among other
things) into the ``Site`` module.
- Identifiers may have been renamed to conform to new `scripting
conventions
<http://www.bro.org/development/script-conventions.html>`_
BroControl
==========
BroControl looks pretty much similar to the version coming with Bro 1.x,
but has been cleaned up and streamlined significantly internally.
BroControl has a new ``process`` command to process a trace on disk
offline using a similar configuration to what BroControl installs for
live analysis.
BroControl now has an extensive plugin interface for adding new
commands and options. Note that this is still considered experimental.
We have removed the ``analysis`` command, and BroControl currently
does not send daily alarm summaries anymore (this may be restored
later).
Removed Functionality
=====================
We have removed a bunch of functionality that was rarely used and/or
had not been maintained for a while:
- The ``net`` script data type.
- The ``alarm`` statement; use the notice framework instead.
- Trace rewriting.
- DFA state expiration in regexp engine.
- Active mapping.
- Native DAG support (may come back eventually).
- ClamAV support.
- The connection compressor is now disabled by default, and will
be removed in the future.
Development Infrastructure
==========================
Bro development has moved from using SVN to Git for revision control.
Users that want to use the latest Bro development snapshot by checking it out
from the source repositories should see the `development process
<http://www.bro.org/development/process.html>`_. Note that all the various
sub-components now reside in their own repositories. However, the
top-level Bro repository includes them as git submodules so it's easy
to check them all out simultaneously.
Bro now uses `CMake <http://www.cmake.org>`_ for its build system so
that is a new required dependency when building from source.
Bro now comes with a growing suite of regression tests in
``testing/``.
doc/misc/geoip.rst Normal file
@ -0,0 +1,102 @@
===========
GeoLocation
===========
.. rst-class:: opening
During the process of creating policy scripts the need may arise
to find the geographic location for an IP address. Bro has support
for the `GeoIP library <http://www.maxmind.com/app/c>`__ at the
policy script level beginning with release 1.3 to account for this
need.
.. contents::
GeoIPLite Database Installation
------------------------------------
A country database for GeoIPLite is included when you do the C API
install, but for Bro we are using the city database, which includes
cities and regions in addition to countries.
`Download <http://www.maxmind.com/app/geolitecity>`__ the geolitecity
binary database and follow the directions to install it.
FreeBSD Quick Install
---------------------
.. console::
pkg_add -r GeoIP
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
gunzip GeoLiteCity.dat.gz
mv GeoLiteCity.dat /usr/local/share/GeoIP/GeoIPCity.dat
# Set your environment correctly before running Bro's configure script
export CFLAGS=-I/usr/local/include
export LDFLAGS=-L/usr/local/lib
CentOS Quick Install
--------------------
.. console::
yum install GeoIP-devel
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
gunzip GeoLiteCity.dat.gz
mkdir -p /var/lib/GeoIP/
mv GeoLiteCity.dat /var/lib/GeoIP/GeoIPCity.dat
# Set your environment correctly before running Bro's configure script
export CFLAGS=-I/usr/local/include
export LDFLAGS=-L/usr/local/lib
Usage
-----
There is a single built in function that provides the GeoIP
functionality:
.. code:: bro
function lookup_location(a:addr): geo_location
There is also the ``geo_location`` data structure that is returned
from the ``lookup_location`` function:
.. code:: bro
type geo_location: record {
country_code: string;
region: string;
city: string;
latitude: double;
longitude: double;
};
Example
-------
To write a line in a log file for every ftp connection from hosts in
Ohio, this is now very easy:
.. code:: bro
global ftp_location_log: file = open_log_file("ftp-location");
event ftp_reply(c: connection, code: count, msg: string, cont_resp: bool)
{
local client = c$id$orig_h;
local loc = lookup_location(client);
if (loc$region == "OH" && loc$country_code == "US")
{
print ftp_location_log, fmt("FTP Connection from:%s (%s,%s,%s)", client, loc$city, loc$region, loc$country_code);
}
}
doc/misc/index.rst Normal file
@ -0,0 +1,9 @@
====================
Miscellaneous Topics
====================
.. toctree::
:maxdepth: 2
geoip
@ -1,5 +0,0 @@
================
Events (Missing)
================
@ -1,5 +0,0 @@
====================
Frameworks (Missing)
====================
@ -1,13 +0,0 @@
=========
Reference
=========
.. toctree::
:maxdepth: 2
:numbered:
frameworks.rst
events.rst
language.rst
subsystems.rst
@ -1,4 +0,0 @@
====================
Subsystems (Missing)
====================
@ -1,7 +1,7 @@
===================
Writing Bro Scripts
===================
.. toctree::
:maxdepth: 2
@ -1,8 +1,21 @@
.. This is a stub doc to which broxygen appends during the build process
================
Script Reference
================
.. toctree::
:maxdepth: 1
builtins
bifs
scripts
packages
internal
Indices
=======
* `Notice Index <bro-noticeindex.html>`_
doc/scripts/scripts.rst Normal file
@ -0,0 +1,8 @@
.. This is a stub doc to which broxygen appends during the build process
========================
Index of All Bro Scripts
========================
.. toctree::
:maxdepth: 1
@ -1,13 +0,0 @@
===========
User Manual
===========
.. toctree::
:maxdepth: 2
:numbered:
intro.rst
quickstart.rst
scripting.rst
@ -1,4 +0,0 @@
======================
Introduction (Missing)
======================
doc/using/index.rst Normal file
@ -0,0 +1,6 @@
===================
Using Bro (Missing)
===================
TODO.