mirror of
https://github.com/zeek/zeek.git
synced 2025-10-02 06:38:20 +00:00
Fix minor typos in input framework doc
Also simplified the opening paragraph, and reformatted input text to fit on 80-column display for better readability.
This commit is contained in:
parent
8dc1e41876
commit
cee78f8f5d
1 changed files with 167 additions and 141 deletions
308
doc/input.rst
308
doc/input.rst
|
@ -4,19 +4,13 @@ Loading Data into Bro with the Input Framework
|
|||
|
||||
.. rst-class:: opening
|
||||
|
||||
Bro now features a flexible input frameworks that allows users
|
||||
Bro now features a flexible input framework that allows users
|
||||
to import data into Bro. Data is either read into Bro tables or
|
||||
converted to events which can then be handled by scripts.
|
||||
|
||||
The input framework is merged into the git master and we
|
||||
will give a short summary on how to use it.
|
||||
The input framework is automatically compiled and installed
|
||||
together with Bro. The interface to it is exposed via the
|
||||
scripting layer.
|
||||
|
||||
This document gives the most common examples. For more complex
|
||||
scenarios it is worthwhile to take a look at the unit tests in
|
||||
``testing/btest/scripts/base/frameworks/input/``.
|
||||
This document gives an overview of how to use the input framework
|
||||
with some examples. For more complex scenarios it is
|
||||
worthwhile to take a look at the unit tests in
|
||||
``testing/btest/scripts/base/frameworks/input/``.
|
||||
|
||||
.. contents::
|
||||
|
||||
|
@ -66,11 +60,12 @@ The two records are defined as:
|
|||
reason: string;
|
||||
};
|
||||
|
||||
ote that the names of the fields in the record definitions have to correspond to
|
||||
the column names listed in the '#fields' line of the log file, in this case 'ip',
|
||||
'timestamp', and 'reason'.
|
||||
Note that the names of the fields in the record definitions have to correspond
|
||||
to the column names listed in the '#fields' line of the log file, in this
|
||||
case 'ip', 'timestamp', and 'reason'.
|
||||
|
||||
The log file is read into the table with a simple call of the add_table function:
|
||||
The log file is read into the table with a simple call of the ``add_table``
|
||||
function:
|
||||
|
||||
.. code:: bro
|
||||
|
||||
|
@ -80,7 +75,7 @@ The log file is read into the table with a simple call of the add_table function
|
|||
Input::remove("blacklist");
|
||||
|
||||
With these three lines we first create an empty table that should contain the
|
||||
blacklist data and then instruct the Input framework to open an input stream
|
||||
blacklist data and then instruct the input framework to open an input stream
|
||||
named ``blacklist`` to read the data into the table. The third line removes the
|
||||
input stream again, because we do not need it any more after the data has been
|
||||
read.
|
||||
|
@ -91,20 +86,20 @@ This thread opens the input data file, converts the data into a Bro format and
|
|||
sends it back to the main Bro thread.
|
||||
|
||||
Because of this, the data is not immediately accessible. Depending on the
|
||||
size of the data source it might take from a few milliseconds up to a few seconds
|
||||
until all data is present in the table. Please note that this means that when Bro
|
||||
is running without an input source or on very short captured files, it might terminate
|
||||
before the data is present in the system (because Bro already handled all packets
|
||||
before the import thread finished).
|
||||
size of the data source it might take from a few milliseconds up to a few
|
||||
seconds until all data is present in the table. Please note that this means
|
||||
that when Bro is running without an input source or on very short captured
|
||||
files, it might terminate before the data is present in the system (because
|
||||
Bro already handled all packets before the import thread finished).
|
||||
|
||||
Subsequent calls to an input source are queued until the previous action has been
|
||||
completed. Because of this, it is, for example, possible to call ``add_table`` and
|
||||
``remove`` in two subsequent lines: the ``remove`` action will remain queued until
|
||||
the first read has been completed.
|
||||
Subsequent calls to an input source are queued until the previous action has
|
||||
been completed. Because of this, it is, for example, possible to call
|
||||
``add_table`` and ``remove`` in two subsequent lines: the ``remove`` action
|
||||
will remain queued until the first read has been completed.
|
||||
|
||||
Once the input framework finishes reading from a data source, it fires the ``update_finished``
|
||||
event. Once this event has been received all data from the input file is available
|
||||
in the table.
|
||||
Once the input framework finishes reading from a data source, it fires
|
||||
the ``update_finished`` event. Once this event has been received all data
|
||||
from the input file is available in the table.
|
||||
|
||||
.. code:: bro
|
||||
|
||||
|
@ -113,10 +108,10 @@ in the table.
|
|||
print blacklist;
|
||||
}
|
||||
|
||||
The table can also already be used while the data is still being read - it just might
|
||||
not contain all lines in the input file when the event has not yet fired. After it has
|
||||
been populated it can be used like any other Bro table and blacklist entries easily be
|
||||
tested:
|
||||
The table can also already be used while the data is still being read - it
|
||||
just might not contain all lines in the input file when the event has not
|
||||
yet fired. After it has been populated it can be used like any other Bro
|
||||
table and blacklist entries can easily be tested:
|
||||
|
||||
.. code:: bro
|
||||
|
||||
|
@ -128,13 +123,14 @@ Re-reading and streaming data
|
|||
-----------------------------
|
||||
|
||||
For many data sources, like for many blacklists, the source data is continually
|
||||
changing. For this cases, the Bro input framework supports several ways to
|
||||
changing. For these cases, the Bro input framework supports several ways to
|
||||
deal with changing data files.
|
||||
|
||||
The first, very basic method is an explicit refresh of an input stream. When an input
|
||||
stream is open, the function ``force_update`` can be called. This will trigger
|
||||
a complete refresh of the table; any changed elements from the file will be updated.
|
||||
After the update is finished the ``update_finished`` event will be raised.
|
||||
The first, very basic method is an explicit refresh of an input stream. When
|
||||
an input stream is open, the function ``force_update`` can be called. This
|
||||
will trigger a complete refresh of the table; any changed elements from the
|
||||
file will be updated. After the update is finished the ``update_finished``
|
||||
event will be raised.
|
||||
|
||||
In our example the call would look like:
|
||||
|
||||
|
@ -142,25 +138,26 @@ In our example the call would look like:
|
|||
|
||||
Input::force_update("blacklist");
|
||||
|
||||
The input framework also supports two automatic refresh mode. The first mode
|
||||
The input framework also supports two automatic refresh modes. The first mode
|
||||
continually checks if a file has been changed. If the file has been changed, it
|
||||
is re-read and the data in the Bro table is updated to reflect the current state.
|
||||
Each time a change has been detected and all the new data has been read into the
|
||||
table, the ``update_finished`` event is raised.
|
||||
is re-read and the data in the Bro table is updated to reflect the current
|
||||
state. Each time a change has been detected and all the new data has been
|
||||
read into the table, the ``update_finished`` event is raised.
|
||||
|
||||
The second mode is a streaming mode. This mode assumes that the source data file
|
||||
is an append-only file to which new data is continually appended. Bro continually
|
||||
checks for new data at the end of the file and will add the new data to the table.
|
||||
If newer lines in the file have the same index as previous lines, they will overwrite
|
||||
the values in the output table.
|
||||
Because of the nature of streaming reads (data is continually added to the table),
|
||||
The second mode is a streaming mode. This mode assumes that the source data
|
||||
file is an append-only file to which new data is continually appended. Bro
|
||||
continually checks for new data at the end of the file and will add the new
|
||||
data to the table. If newer lines in the file have the same index as previous
|
||||
lines, they will overwrite the values in the output table. Because of the
|
||||
nature of streaming reads (data is continually added to the table),
|
||||
the ``update_finished`` event is never raised when using streaming reads.
|
||||
|
||||
The reading mode can be selected by setting the ``mode`` option of the add_table call.
|
||||
Valid values are ``MANUAL`` (the default), ``REREAD`` and ``STREAM``.
|
||||
The reading mode can be selected by setting the ``mode`` option of the
|
||||
add_table call. Valid values are ``MANUAL`` (the default), ``REREAD``
|
||||
and ``STREAM``.
|
||||
|
||||
Hence, when using adding ``$mode=Input::REREAD`` to the previous example, the blacklists
|
||||
table will always reflect the state of the blacklist input file.
|
||||
Hence, when adding ``$mode=Input::REREAD`` to the previous example, the
|
||||
blacklist table will always reflect the state of the blacklist input file.
|
||||
|
||||
.. code:: bro
|
||||
|
||||
|
@ -169,11 +166,11 @@ table will always reflect the state of the blacklist input file.
|
|||
Receiving change events
|
||||
-----------------------
|
||||
|
||||
When re-reading files, it might be interesting to know exactly which lines in the source
|
||||
files have changed.
|
||||
When re-reading files, it might be interesting to know exactly which lines in
|
||||
the source files have changed.
|
||||
|
||||
For this reason, the input framework can raise an event each time when a data item is added to,
|
||||
removed from or changed in a table.
|
||||
For this reason, the input framework can raise an event each time when a data
|
||||
item is added to, removed from or changed in a table.
|
||||
|
||||
The event definition looks like this:
|
||||
|
||||
|
@ -189,34 +186,42 @@ The event has to be specified in ``$ev`` in the ``add_table`` call:
|
|||
|
||||
Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD, $ev=entry]);
|
||||
|
||||
The ``description`` field of the event contains the arguments that were originally supplied to the add_table call.
|
||||
Hence, the name of the stream can, for example, be accessed with ``description$name``. ``tpe`` is an enum containing
|
||||
the type of the change that occurred.
|
||||
The ``description`` field of the event contains the arguments that were
|
||||
originally supplied to the add_table call. Hence, the name of the stream can,
|
||||
for example, be accessed with ``description$name``. ``tpe`` is an enum
|
||||
containing the type of the change that occurred.
|
||||
|
||||
It will contain ``Input::EVENT_NEW``, when a line that was not previously been
|
||||
present in the table has been added. In this case ``left`` contains the Index of the added table entry and ``right`` contains
|
||||
the values of the added entry.
|
||||
If a line that was not previously present in the table has been added,
|
||||
then ``tpe`` will contain ``Input::EVENT_NEW``. In this case ``left`` contains
|
||||
the index of the added table entry and ``right`` contains the values of the
|
||||
added entry.
|
||||
|
||||
If a table entry that already was present is altered during the re-reading or streaming read of a file, ``tpe`` will contain
|
||||
``Input::EVENT_CHANGED``. In this case ``left`` contains the Index of the changed table entry and ``right`` contains the
|
||||
values of the entry before the change. The reason for this is, that the table already has been updated when the event is
|
||||
raised. The current value in the table can be ascertained by looking up the current table value. Hence it is possible to compare
|
||||
the new and the old value of the table.
|
||||
If a table entry that already was present is altered during the re-reading or
|
||||
streaming read of a file, ``tpe`` will contain ``Input::EVENT_CHANGED``. In
|
||||
this case ``left`` contains the index of the changed table entry and ``right``
|
||||
contains the values of the entry before the change. The reason for this is
|
||||
that the table already has been updated when the event is raised. The current
|
||||
value in the table can be ascertained by looking up the current table value.
|
||||
Hence it is possible to compare the new and the old values of the table.
|
||||
|
||||
``tpe`` contains ``Input::REMOVED``, when a table element is removed because it was no longer present during a re-read.
|
||||
In this case ``left`` contains the index and ``right`` the values of the removed element.
|
||||
If a table element is removed because it was no longer present during a
|
||||
re-read, then ``tpe`` will contain ``Input::REMOVED``. In this case ``left``
|
||||
contains the index and ``right`` the values of the removed element.
|
||||
|
||||
|
||||
Filtering data during import
|
||||
----------------------------
|
||||
|
||||
The input framework also allows a user to filter the data during the import. To this end, predicate functions are used. A predicate
|
||||
function is called before a new element is added/changed/removed from a table. The predicate can either accept or veto
|
||||
the change by returning true for an accepted change and false for an rejected change. Furthermore, it can alter the data
|
||||
The input framework also allows a user to filter the data during the import.
|
||||
To this end, predicate functions are used. A predicate function is called
|
||||
before a new element is added/changed/removed from a table. The predicate
|
||||
can either accept or veto the change by returning true for an accepted
|
||||
change and false for a rejected change. Furthermore, it can alter the data
|
||||
before it is written to the table.
|
||||
|
||||
The following example filter will reject to add entries to the table when they were generated over a month ago. It
|
||||
will accept all changes and all removals of values that are already present in the table.
|
||||
The following example filter will reject to add entries to the table when
|
||||
they were generated over a month ago. It will accept all changes and all
|
||||
removals of values that are already present in the table.
|
||||
|
||||
.. code:: bro
|
||||
|
||||
|
@ -228,34 +233,43 @@ will accept all changes and all removals of values that are already present in t
|
|||
return ( ( current_time() - right$timestamp ) < (30 day) );
|
||||
}]);
|
||||
|
||||
To change elements while they are being imported, the predicate function can manipulate ``left`` and ``right``. Note
|
||||
that predicate functions are called before the change is committed to the table. Hence, when a table element is changed ( ``tpe``
|
||||
is ``INPUT::EVENT_CHANGED`` ), ``left`` and ``right`` contain the new values, but the destination (``blacklist`` in our example)
|
||||
still contains the old values. This allows predicate functions to examine the changes between the old and the new version before
|
||||
deciding if they should be allowed.
|
||||
To change elements while they are being imported, the predicate function can
|
||||
manipulate ``left`` and ``right``. Note that predicate functions are called
|
||||
before the change is committed to the table. Hence, when a table element is
|
||||
changed (``tpe`` is ``INPUT::EVENT_CHANGED``), ``left`` and ``right``
|
||||
contain the new values, but the destination (``blacklist`` in our example)
|
||||
still contains the old values. This allows predicate functions to examine
|
||||
the changes between the old and the new version before deciding if they
|
||||
should be allowed.
|
||||
|
||||
Different readers
|
||||
-----------------
|
||||
|
||||
The input framework supports different kinds of readers for different kinds of source data files. At the moment, the default
|
||||
reader reads ASCII files formatted in the Bro log-file-format (tab-separated values). At the moment, Bro comes with two
|
||||
other readers. The ``RAW`` reader reads a file that is split by a specified record separator (usually newline). The contents
|
||||
are returned line-by-line as strings; it can, for example, be used to read configuration files and the like and is probably
|
||||
The input framework supports different kinds of readers for different kinds
|
||||
of source data files. At the moment, the default reader reads ASCII files
|
||||
formatted in the Bro log file format (tab-separated values). At the moment,
|
||||
Bro comes with two other readers. The ``RAW`` reader reads a file that is
|
||||
split by a specified record separator (usually newline). The contents are
|
||||
returned line-by-line as strings; it can, for example, be used to read
|
||||
configuration files and the like and is probably
|
||||
only useful in the event mode and not for reading data to tables.
|
||||
|
||||
Another included reader is the ``BENCHMARK`` reader, which is being used to optimize the speed of the input framework. It
|
||||
can generate arbitrary amounts of semi-random data in all Bro data types supported by the input framework.
|
||||
Another included reader is the ``BENCHMARK`` reader, which is being used
|
||||
to optimize the speed of the input framework. It can generate arbitrary
|
||||
amounts of semi-random data in all Bro data types supported by the input
|
||||
framework.
|
||||
|
||||
In the future, the input framework will get support for new data sources like, for example, different databases.
|
||||
In the future, the input framework will get support for new data sources
|
||||
like, for example, different databases.
|
||||
|
||||
Add_table options
|
||||
-----------------
|
||||
|
||||
This section lists all possible options that can be used for the add_table function and gives
|
||||
a short explanation of their use. Most of the options already have been discussed in the
|
||||
previous sections.
|
||||
This section lists all possible options that can be used for the add_table
|
||||
function and gives a short explanation of their use. Most of the options
|
||||
already have been discussed in the previous sections.
|
||||
|
||||
The possible fields that can be set for an table stream are:
|
||||
The possible fields that can be set for a table stream are:
|
||||
|
||||
``source``
|
||||
A mandatory string identifying the source of the data.
|
||||
|
@ -266,51 +280,57 @@ The possible fields that can be set for an table stream are:
|
|||
to manipulate it further.
|
||||
|
||||
``idx``
|
||||
Record type that defines the index of the table
|
||||
Record type that defines the index of the table.
|
||||
|
||||
``val``
|
||||
Record type that defines the values of the table
|
||||
Record type that defines the values of the table.
|
||||
|
||||
``reader``
|
||||
``reader``
|
||||
The reader used for this stream. Default is ``READER_ASCII``.
|
||||
|
||||
``mode``
|
||||
The mode in which the stream is opened. Possible values are ``MANUAL``, ``REREAD`` and ``STREAM``.
|
||||
Default is ``MANUAL``.
|
||||
``MANUAL`` means, that the files is not updated after it has been read. Changes to the file will not
|
||||
be reflected in the data Bro knows.
|
||||
``REREAD`` means that the whole file is read again each time a change is found. This should be used for
|
||||
files that are mapped to a table where individual lines can change.
|
||||
``STREAM`` means that the data from the file is streamed. Events / table entries will be generated as new
|
||||
data is added to the file.
|
||||
The mode in which the stream is opened. Possible values are
|
||||
``MANUAL``, ``REREAD`` and ``STREAM``. Default is ``MANUAL``.
|
||||
``MANUAL`` means that the file is not updated after it has
|
||||
been read. Changes to the file will not be reflected in the
|
||||
data Bro knows. ``REREAD`` means that the whole file is read
|
||||
again each time a change is found. This should be used for
|
||||
files that are mapped to a table where individual lines can
|
||||
change. ``STREAM`` means that the data from the file is
|
||||
streamed. Events / table entries will be generated as new
|
||||
data is appended to the file.
|
||||
|
||||
``destination``
|
||||
The destination table
|
||||
The destination table.
|
||||
|
||||
``ev``
|
||||
Optional event that is raised, when values are added to, changed in or deleted from the table.
|
||||
Events are passed an Input::Event description as the first argument, the index record as the second argument
|
||||
and the values as the third argument.
|
||||
Optional event that is raised, when values are added to,
|
||||
changed in, or deleted from the table. Events are passed an
|
||||
Input::Event description as the first argument, the index
|
||||
record as the second argument and the values as the third
|
||||
argument.
|
||||
|
||||
``pred``
|
||||
Optional predicate, that can prevent entries from being added to the table and events from being sent.
|
||||
Optional predicate, that can prevent entries from being added
|
||||
to the table and events from being sent.
|
||||
|
||||
``want_record``
|
||||
Boolean value, that defines if the event wants to receive the fields inside of
|
||||
a single record value, or individually (default).
|
||||
This can be used, if ``val`` is a record containing only one type. In this case,
|
||||
if ``want_record`` is set to false, the table will contain elements of the type
|
||||
Boolean value, that defines if the event wants to receive the
|
||||
fields inside of a single record value, or individually
|
||||
(default). This can be used if ``val`` is a record
|
||||
containing only one type. In this case, if ``want_record`` is
|
||||
set to false, the table will contain elements of the type
|
||||
contained in ``val``.
|
||||
|
||||
Reading data to events
|
||||
Reading Data to Events
|
||||
======================
|
||||
|
||||
The second supported mode of the input framework is reading data to Bro events instead
|
||||
of reading them to a table using event streams.
|
||||
The second supported mode of the input framework is reading data to Bro
|
||||
events instead of reading them to a table using event streams.
|
||||
|
||||
Event streams work very similarly to table streams that were already discussed in much
|
||||
detail. To read the blacklist of the previous example into an event stream, the following
|
||||
Bro code could be used:
|
||||
Event streams work very similarly to table streams that were already
|
||||
discussed in much detail. To read the blacklist of the previous example
|
||||
into an event stream, the following Bro code could be used:
|
||||
|
||||
.. code:: bro
|
||||
|
||||
|
@ -329,14 +349,15 @@ Bro code could be used:
|
|||
}
|
||||
|
||||
|
||||
The main difference in the declaration of the event stream is, that an event stream needs no
|
||||
separate index and value declarations -- instead, all source data types are provided in a single
|
||||
record definition.
|
||||
The main difference in the declaration of the event stream is, that an event
|
||||
stream needs no separate index and value declarations -- instead, all source
|
||||
data types are provided in a single record definition.
|
||||
|
||||
Apart from this, event streams work exactly the same as table streams and support most of the options
|
||||
that are also supported for table streams.
|
||||
Apart from this, event streams work exactly the same as table streams and
|
||||
support most of the options that are also supported for table streams.
|
||||
|
||||
The options that can be set for when creating an event stream with ``add_event`` are:
|
||||
The options that can be set when creating an event stream with
|
||||
``add_event`` are:
|
||||
|
||||
``source``
|
||||
A mandatory string identifying the source of the data.
|
||||
|
@ -347,35 +368,40 @@ The options that can be set for when creating an event stream with ``add_event``
|
|||
to remove it.
|
||||
|
||||
``fields``
|
||||
Name of a record type containing the fields, which should be retrieved from
|
||||
the input stream.
|
||||
Name of a record type containing the fields, which should be
|
||||
retrieved from the input stream.
|
||||
|
||||
``ev``
|
||||
The event which is fired, after a line has been read from the input source.
|
||||
The first argument that is passed to the event is an Input::Event structure,
|
||||
followed by the data, either inside of a record (if ``want_record is set``) or as
|
||||
individual fields.
|
||||
The Input::Event structure can contain information, if the received line is ``NEW``, has
|
||||
been ``CHANGED`` or ``DELETED``. Singe the ASCII reader cannot track this information
|
||||
for event filters, the value is always ``NEW`` at the moment.
|
||||
The event which is fired, after a line has been read from the
|
||||
input source. The first argument that is passed to the event
|
||||
is an Input::Event structure, followed by the data, either
|
||||
inside of a record (if ``want_record is set``) or as
|
||||
individual fields. The Input::Event structure can contain
|
||||
information, if the received line is ``NEW``, has been
|
||||
``CHANGED`` or ``DELETED``. Since the ASCII reader cannot
|
||||
track this information for event filters, the value is
|
||||
always ``NEW`` at the moment.
|
||||
|
||||
``mode``
|
||||
The mode in which the stream is opened. Possible values are ``MANUAL``, ``REREAD`` and ``STREAM``.
|
||||
Default is ``MANUAL``.
|
||||
``MANUAL`` means, that the files is not updated after it has been read. Changes to the file will not
|
||||
be reflected in the data Bro knows.
|
||||
``REREAD`` means that the whole file is read again each time a change is found. This should be used for
|
||||
files that are mapped to a table where individual lines can change.
|
||||
``STREAM`` means that the data from the file is streamed. Events / table entries will be generated as new
|
||||
data is added to the file.
|
||||
The mode in which the stream is opened. Possible values are
|
||||
``MANUAL``, ``REREAD`` and ``STREAM``. Default is ``MANUAL``.
|
||||
``MANUAL`` means that the file is not updated after it has
|
||||
been read. Changes to the file will not be reflected in the
|
||||
data Bro knows. ``REREAD`` means that the whole file is read
|
||||
again each time a change is found. This should be used for
|
||||
files that are mapped to a table where individual lines can
|
||||
change. ``STREAM`` means that the data from the file is
|
||||
streamed. Events / table entries will be generated as new
|
||||
data is appended to the file.
|
||||
|
||||
``reader``
|
||||
The reader used for this stream. Default is ``READER_ASCII``.
|
||||
|
||||
``want_record``
|
||||
Boolean value, that defines if the event wants to receive the fields inside of
|
||||
a single record value, or individually (default). If this is set to true, the
|
||||
event will receive a single record of the type provided in ``fields``.
|
||||
Boolean value, that defines if the event wants to receive the
|
||||
fields inside of a single record value, or individually
|
||||
(default). If this is set to true, the event will receive a
|
||||
single record of the type provided in ``fields``.
|
||||
|
||||
|
||||
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue