Fix minor typos in input framework doc

Also simplified the opening paragraph, and reformatted input text to fit
on 80-column display for better readability.
Daniel Thayer 2012-07-05 12:59:19 -05:00
parent 8dc1e41876
commit cee78f8f5d


@@ -4,19 +4,13 @@ Loading Data into Bro with the Input Framework

.. rst-class:: opening

Bro now features a flexible input framework that allows users
to import data into Bro. Data is either read into Bro tables or
converted to events which can then be handled by scripts.
This document gives an overview of how to use the input framework
with some examples. For more complex scenarios it is
worthwhile to take a look at the unit tests in
``testing/btest/scripts/base/frameworks/input/``.

.. contents::
@@ -66,11 +60,12 @@ The two records are defined as:

        reason: string;
    };

Note that the names of the fields in the record definitions have to correspond
to the column names listed in the '#fields' line of the log file, in this
case 'ip', 'timestamp', and 'reason'.

The log file is read into the table with a simple call of the ``add_table``
function:

.. code:: bro
@@ -80,7 +75,7 @@ The log file is read into the table with a simple call of the add_table function

    Input::remove("blacklist");

With these three lines we first create an empty table that should contain the
blacklist data and then instruct the input framework to open an input stream
named ``blacklist`` to read the data into the table. The third line removes the
input stream again, because we do not need it any more after the data has been
read.
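Put together, such a sequence might look like the following sketch (the ``Idx`` and ``Val`` record names and the table type are assumed from the earlier example):

.. code:: bro

    global blacklist: table[addr] of Val = table();

    event bro_init()
        {
        # Open the stream; the file is read asynchronously.
        Input::add_table([$source="blacklist.file", $name="blacklist",
                          $idx=Idx, $val=Val, $destination=blacklist]);
        # Queued until the read above has completed.
        Input::remove("blacklist");
        }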
@@ -91,20 +86,20 @@ This thread opens the input data file, converts the data into a Bro format and

sends it back to the main Bro thread.

Because of this, the data is not immediately accessible. Depending on the
size of the data source it might take from a few milliseconds up to a few
seconds until all data is present in the table. Please note that this means
that when Bro is running without an input source or on very short captured
files, it might terminate before the data is present in the system (because
Bro already handled all packets before the import thread finished).

Subsequent calls to an input source are queued until the previous action has
been completed. Because of this, it is, for example, possible to call
``add_table`` and ``remove`` in two subsequent lines: the ``remove`` action
will remain queued until the first read has been completed.

Once the input framework finishes reading from a data source, it fires
the ``update_finished`` event. Once this event has been received all data
from the input file is available in the table.

.. code:: bro
@@ -113,10 +108,10 @@ in the table.

        print blacklist;
        }

The table can also already be used while the data is still being read - it
just might not contain all lines in the input file when the event has not
yet fired. After it has been populated it can be used like any other Bro
table and blacklist entries can easily be tested:

.. code:: bro
@@ -128,13 +123,14 @@ Re-reading and streaming data

-----------------------------

For many data sources, like for many blacklists, the source data is continually
changing. For these cases, the Bro input framework supports several ways to
deal with changing data files.

The first, very basic method is an explicit refresh of an input stream. When
an input stream is open, the function ``force_update`` can be called. This
will trigger a complete refresh of the table; any changed elements from the
file will be updated. After the update is finished the ``update_finished``
event will be raised.

In our example the call would look like:
@@ -142,25 +138,26 @@ In our example the call would look like:

    Input::force_update("blacklist");

The input framework also supports two automatic refresh modes. The first mode
continually checks if a file has been changed. If the file has been changed, it
is re-read and the data in the Bro table is updated to reflect the current
state. Each time a change has been detected and all the new data has been
read into the table, the ``update_finished`` event is raised.

The second mode is a streaming mode. This mode assumes that the source data
file is an append-only file to which new data is continually appended. Bro
continually checks for new data at the end of the file and will add the new
data to the table. If newer lines in the file have the same index as previous
lines, they will overwrite the values in the output table. Because of the
nature of streaming reads (data is continually added to the table),
the ``update_finished`` event is never raised when using streaming reads.
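A streaming read of an append-only file could be set up like this (a sketch; ``Idx`` and ``Val`` are the records from the earlier example, and the ``mode`` option is explained below):

.. code:: bro

    Input::add_table([$source="blacklist.file", $name="blacklist",
                      $idx=Idx, $val=Val, $destination=blacklist,
                      $mode=Input::STREAM]);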
The reading mode can be selected by setting the ``mode`` option of the
``add_table`` call. Valid values are ``MANUAL`` (the default), ``REREAD``
and ``STREAM``.

Hence, when adding ``$mode=Input::REREAD`` to the previous example, the
blacklist table will always reflect the state of the blacklist input file.

.. code:: bro
@@ -169,11 +166,11 @@ table will always reflect the state of the blacklist input file.

Receiving change events
-----------------------

When re-reading files, it might be interesting to know exactly which lines in
the source files have changed.

For this reason, the input framework can raise an event each time a data
item is added to, removed from, or changed in a table.

The event definition looks like this:
@@ -189,34 +186,42 @@ The event has to be specified in ``$ev`` in the ``add_table`` call:

    Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD, $ev=entry]);

The ``description`` field of the event contains the arguments that were
originally supplied to the ``add_table`` call. Hence, the name of the stream
can, for example, be accessed with ``description$name``. ``tpe`` is an enum
containing the type of the change that occurred.

If a line that was not previously present in the table has been added,
then ``tpe`` will contain ``Input::EVENT_NEW``. In this case ``left`` contains
the index of the added table entry and ``right`` contains the values of the
added entry.

If a table entry that already was present is altered during the re-reading or
streaming read of a file, ``tpe`` will contain ``Input::EVENT_CHANGED``. In
this case ``left`` contains the index of the changed table entry and ``right``
contains the values of the entry before the change. The reason for this is
that the table already has been updated when the event is raised. The current
value in the table can be ascertained by looking up the current table value.
Hence it is possible to compare the new and the old values of the table.

If a table element is removed because it was no longer present during a
re-read, then ``tpe`` will contain ``Input::REMOVED``. In this case ``left``
contains the index and ``right`` the values of the removed element.
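As an illustration, a handler for such a change event might distinguish the three cases like this (a sketch; the enum values and record fields follow the text above):

.. code:: bro

    event entry(description: Input::TableDescription, tpe: Input::Event,
                left: Idx, right: Val)
        {
        if ( tpe == Input::EVENT_NEW )
            print fmt("new entry for %s", left$ip);
        else if ( tpe == Input::EVENT_CHANGED )
            {
            # right holds the values *before* the change; the table
            # itself already contains the updated values.
            print fmt("changed entry for %s", left$ip);
            }
        else
            print fmt("removed entry for %s", left$ip);
        }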
Filtering data during import
----------------------------

The input framework also allows a user to filter the data during the import.
To this end, predicate functions are used. A predicate function is called
before a new element is added to, changed in, or removed from a table. The
predicate can either accept or veto the change by returning true for an
accepted change and false for a rejected change. Furthermore, it can alter
the data before it is written to the table.

The following example filter will reject adding entries to the table when
they were generated more than a month ago. It will accept all changes and
all removals of values that are already present in the table.

.. code:: bro
@@ -228,34 +233,43 @@ will accept all changes and all removals of values that are already present in t

            return ( ( current_time() - right$timestamp ) < (30 day) );
        }]);

To change elements while they are being imported, the predicate function can
manipulate ``left`` and ``right``. Note that predicate functions are called
before the change is committed to the table. Hence, when a table element is
changed (``tpe`` is ``Input::EVENT_CHANGED``), ``left`` and ``right``
contain the new values, but the destination (``blacklist`` in our example)
still contains the old values. This allows predicate functions to examine
the changes between the old and the new version before deciding if they
should be allowed.
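A predicate can, for instance, normalize values on their way into the table. The following sketch lowercases the ``reason`` field while accepting every change (``to_lower`` is a built-in string function; the records are those from the earlier example):

.. code:: bro

    Input::add_table([$source="blacklist.file", $name="blacklist",
                      $idx=Idx, $val=Val, $destination=blacklist,
                      $mode=Input::REREAD,
                      $pred(typ: Input::Event, left: Idx, right: Val) = {
                          # Modify the value before it is committed.
                          right$reason = to_lower(right$reason);
                          return T;
                      }]);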
Different readers
-----------------

The input framework supports different kinds of readers for different kinds
of source data files. At the moment, the default reader reads ASCII files
formatted in the Bro log file format (tab-separated values). Bro currently
comes with two other readers. The ``RAW`` reader reads a file that is
split by a specified record separator (usually newline). The contents are
returned line-by-line as strings; it can, for example, be used to read
configuration files and the like, and is probably only useful in event mode
and not for reading data to tables.
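A ``RAW`` read feeding lines into an event might be sketched as follows (the record name, event name, and file path are illustrative; ``add_event`` is covered below):

.. code:: bro

    type OneLine: record {
        s: string;
    };

    event raw_line(description: Input::EventDescription,
                   tpe: Input::Event, s: string)
        {
        print fmt("read line: %s", s);
        }

    event bro_init()
        {
        Input::add_event([$source="/etc/some.conf", $name="conf",
                          $reader=Input::READER_RAW,
                          $fields=OneLine, $ev=raw_line]);
        }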
Another included reader is the ``BENCHMARK`` reader, which is used to
optimize the speed of the input framework. It can generate arbitrary
amounts of semi-random data in all Bro data types supported by the input
framework.

In the future, the input framework will get support for new data sources
like, for example, different databases.

Add_table options
-----------------

This section lists all possible options that can be used for the ``add_table``
function and gives a short explanation of their use. Most of the options
already have been discussed in the previous sections.

The possible fields that can be set for a table stream are:
``source``
    A mandatory string identifying the source of the data.
@@ -266,51 +280,57 @@ The possible fields that can be set for a table stream are:

    to manipulate it further.

``idx``
    Record type that defines the index of the table.

``val``
    Record type that defines the values of the table.

``reader``
    The reader used for this stream. Default is ``READER_ASCII``.

``mode``
    The mode in which the stream is opened. Possible values are
    ``MANUAL``, ``REREAD`` and ``STREAM``. Default is ``MANUAL``.
    ``MANUAL`` means that the file is not updated after it has
    been read. Changes to the file will not be reflected in the
    data Bro knows. ``REREAD`` means that the whole file is read
    again each time a change is found. This should be used for
    files that are mapped to a table where individual lines can
    change. ``STREAM`` means that the data from the file is
    streamed. Events / table entries will be generated as new
    data is appended to the file.

``destination``
    The destination table.

``ev``
    Optional event that is raised when values are added to,
    changed in, or deleted from the table. Events are passed an
    Input::Event description as the first argument, the index
    record as the second argument and the values as the third
    argument.

``pred``
    Optional predicate that can prevent entries from being added
    to the table and events from being sent.

``want_record``
    Boolean value that defines if the event wants to receive the
    fields inside of a single record value, or individually
    (default). This can be used if ``val`` is a record
    containing only one type. In this case, if ``want_record`` is
    set to false, the table will contain elements of the type
    contained in ``val``.
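Combining several of these options, a fuller ``add_table`` call might look like this sketch (reusing the records and the ``entry`` event from the earlier examples):

.. code:: bro

    Input::add_table([$source="blacklist.file", $name="blacklist",
                      $idx=Idx, $val=Val, $destination=blacklist,
                      $mode=Input::REREAD, $ev=entry,
                      $pred(typ: Input::Event, left: Idx, right: Val) = {
                          # Skip entries that carry no reason.
                          return right$reason != "";
                      }]);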
Reading Data to Events
======================

The second supported mode of the input framework is reading data into Bro
events instead of into a table, using event streams.

Event streams work very similarly to the table streams that were already
discussed in much detail. To read the blacklist of the previous example
into an event stream, the following Bro code could be used:

.. code:: bro

@@ -329,14 +349,15 @@ Bro code could be used:

        }

The main difference in the declaration of the event stream is that an event
stream needs no separate index and value declarations -- instead, all source
data types are provided in a single record definition.
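A minimal sketch of such an event stream (the record and event names are illustrative):

.. code:: bro

    type Entry: record {
        ip: addr;
        timestamp: time;
        reason: string;
    };

    event blacklist_entry(description: Input::EventDescription,
                          tpe: Input::Event,
                          ip: addr, timestamp: time, reason: string)
        {
        print fmt("%s was blacklisted at %s: %s", ip, timestamp, reason);
        }

    event bro_init()
        {
        Input::add_event([$source="blacklist.file", $name="blacklist_events",
                          $fields=Entry, $ev=blacklist_entry]);
        }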
Apart from this, event streams work exactly the same as table streams and
support most of the options that are also supported for table streams.

The options that can be set when creating an event stream with
``add_event`` are:

``source``
    A mandatory string identifying the source of the data.
@@ -347,35 +368,40 @@ The options that can be set for when creating an event stream with ``add_event``

    to remove it.

``fields``
    Name of a record type containing the fields which should be
    retrieved from the input stream.

``ev``
    The event which is fired after a line has been read from the
    input source. The first argument that is passed to the event
    is an Input::Event structure, followed by the data, either
    inside of a record (if ``want_record`` is set) or as
    individual fields. The Input::Event structure can contain
    information on whether the received line is ``NEW``, has been
    ``CHANGED`` or ``DELETED``. Since the ASCII reader cannot
    track this information for event filters, the value is
    always ``NEW`` at the moment.

``mode``
    The mode in which the stream is opened. Possible values are
    ``MANUAL``, ``REREAD`` and ``STREAM``. Default is ``MANUAL``.
    ``MANUAL`` means that the file is not updated after it has
    been read. Changes to the file will not be reflected in the
    data Bro knows. ``REREAD`` means that the whole file is read
    again each time a change is found. This should be used for
    files that are mapped to a table where individual lines can
    change. ``STREAM`` means that the data from the file is
    streamed. Events / table entries will be generated as new
    data is appended to the file.

``reader``
    The reader used for this stream. Default is ``READER_ASCII``.

``want_record``
    Boolean value that defines if the event wants to receive the
    fields inside of a single record value, or individually
    (default). If this is set to true, the event will receive a
    single record of the type provided in ``fields``.