zeek/doc/intel.rst
Robin Sommer fb7ba82bab Merge remote-tracking branch 'origin/topic/seth/intel-framework'
* origin/topic/seth/intel-framework: (21 commits)
  Extracting URLs from message bodies over SMTP and sending them to Intel framework.
  Small comment updates in the Intel framework CIF support.
  Intelligence framework documentation first draft.
  Only the manager tries to read files with the input framework now.
  Initial support for Bro's Intel framework with the Collective Intelligence Framework.
  Initial API for Intel framework is complete.
  Fixed an issue with cluster data distribution.
  Updating some intel framework test baselines.
  Reworked cluster intelligence data distribution mechanism and fixed tests.
  Lots more intelligence checking in SMTP traffic.
  Added intelligence check for "Received" path checking and a bit of reshuffling.
  Added sources to the intel log.
  Fixing a problem with intel distribution on clusters.
  Updated intel framework test to include matching.
  Restructuring the scripts that feed data into the intel framework slightly.
  One test for cluster transparency of the intel framework.
  Fixed a cluster support bug.
  Intelligence framework checkpoint
  Major updates to fix the Intel framework API.
  Checkpoint commit.  This is all a huge mess right now. :)
  ...

Closes #914.
2012-11-01 08:21:52 -07:00

125 lines
4.9 KiB
ReStructuredText

Intel Framework
===============
Intro
-----
Intelligence data is critical to the process of monitoring for
security purposes. There is always data which will be discovered
through the incident response process and data which is shared through
private communities. The goals of Bro's Intelligence Framework are to
consume that data, make it available for matching, and provide
infrastructure around improving performance, memory utilization, and
generally making all of this easier.
Data in the Intelligence Framework is the atomic piece of intelligence
such as an IP address or an e-mail address along with a suite of
metadata about it such as a freeform source field, a freeform
descriptive field and a URL which might lead to more information about
the specific item. The metadata in the default scripts has been
deliberately kept minimal so that the community can find the
appropriate fields that need added by writing scripts which extend the
base record using the normal record extension mechanism.
Quick Start
-----------
Load the package of scripts that sends data into the Intelligence
Framework to be checked by loading this script in local.bro::
@load policy/frameworks/intel
(TODO: find some good mechanism for getting setup with good data
quickly)
Refer to the "Loading Intelligence" section below to see the format
for Intelligence Framework text files, then load those text files with
this line in local.bro::
redef Intel::read_files += { "/somewhere/yourdata.txt" };
The data itself only needs to reside on the manager if running in a
cluster.
Architecture
------------
The Intelligence Framework can be thought of as containing three
separate portions. The first part is how intelligence is loaded,
followed by the mechanism for indicating to the intelligence framework
that a piece of data which needs to be checked has been seen, and
thirdly the part where a positive match has been discovered.
Loading Intelligence
********************
Intelligence data can only be loaded through plain text files using
the Input Framework conventions. Additionally, on clusters the
manager is the only node that needs the intelligence data. The
intelligence framework has distribution mechanisms which will push
data out to all of the nodes that need it.
Here is an example of the intelligence data format. Note that all
whitespace separators are literal tabs and fields containing only a
hyphen a considered to be null values.::
#fields host net str str_type meta.source meta.desc meta.url
1.2.3.4 - - - source1 Sending phishing email http://source1.com/badhosts/1.2.3.4
- 31.131.248.0/21 - - spamhaus-drop SBL154982 - -
- - a.b.com Intel::DOMAIN source2 Name used for data exfiltration -
For more examples of built in `str_type` values, please refer to the
autogenerated documentation for the intelligence framework (TODO:
figure out how to do this link).
To load the data once files are created, use the following example
code to define files to load with your own file names of course::
redef Intel::read_files += {
"/somewhere/feed1.txt",
"/somewhere/feed2.txt",
};
Remember, the files only need to be present on the file system of the
manager node on cluster deployments.
Seen Data
*********
When some bit of data is extracted (such as an email address in the
"From" header in a message over SMTP), the Intelligence Framework
needs to be informed that this data was discovered and it's presence
should be checked within the intelligence data set. This is
accomplished through the Intel::seen (TODO: do a reference link)
function.
Typically users won't need to work with this function due to built in
hook scripts that Bro ships with that will "see" data and send it into
the intelligence framework. A user may only need to load the entire
package of hook scripts as a module or pick and choose specific
scripts to load. Keep in mind that as more data is sent into the
intelligence framework, the CPU load consumed by Bro will increase
depending on how many times the Intel::seen function is being called
which is heavily traffic dependent.
The full package of hook scripts that Bro ships with for sending this
"seen" data into the intelligence framework can be loading by adding
this line to local.bro::
@load policy/frameworks/intel
Intelligence Matches
********************
Against all hopes, most networks will eventually have a hit on
intelligence data which could indicate a possible compromise or other
unwanted activity. The Intelligence Framework provides an event that
is generated whenever a match is discovered named Intel::match (TODO:
make a link to inline docs). Due to design restrictions placed upon
the intelligence framework, there is no assurance as to where this
event will be generated. It could be generated on the worker where
the data was seen or on the manager. When the Intel::match event is
handled, only the data given as event arguments to the event can be
assured since the host where the data was seen may not be where
Intel::match is handled.