diff --git a/doc/frameworks/index.rst b/doc/frameworks/index.rst
index d5b771b15e..f8c681d795 100644
--- a/doc/frameworks/index.rst
+++ b/doc/frameworks/index.rst
@@ -13,4 +13,5 @@ Frameworks
    logging
    notice
    signatures
+   sumstats
diff --git a/doc/frameworks/sumstats-countconns.bro b/doc/frameworks/sumstats-countconns.bro
new file mode 100644
index 0000000000..a10be54376
--- /dev/null
+++ b/doc/frameworks/sumstats-countconns.bro
@@ -0,0 +1,36 @@
+@load base/frameworks/sumstats
+
+event connection_established(c: connection)
+    {
+    # Make an observation!
+    # This observation is global so the key is empty.
+    # Each established connection counts as one so the observation is always 1.
+    SumStats::observe("conn established",
+                      SumStats::Key(),
+                      SumStats::Observation($num=1));
+    }
+
+event bro_init()
+    {
+    # Create the reducer.
+    # The reducer attaches to the "conn established" observation stream
+    # and uses the summing calculation on the observations.
+    local r1 = SumStats::Reducer($stream="conn established",
+                                 $apply=set(SumStats::SUM));
+
+    # Create the final sumstat.
+    # We give it an arbitrary name and make it collect data every minute.
+    # The reducer is then attached and a $epoch_result callback is given
+    # to finally do something with the data collected.
+    SumStats::create([$name = "counting connections",
+                      $epoch = 1min,
+                      $reducers = set(r1),
+                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
+                        {
+                        # This is the body of the callback that is called when a single
+                        # result has been collected. We are just printing the total number
+                        # of connections that were seen. The $sum field is provided as a
+                        # double type value so we need to use %f as the format specifier.
+                        print fmt("Number of connections established: %.0f", result["conn established"]$sum);
+                        }]);
+    }
\ No newline at end of file
diff --git a/doc/frameworks/sumstats-toy-scan.bro b/doc/frameworks/sumstats-toy-scan.bro
new file mode 100644
index 0000000000..c435fb8997
--- /dev/null
+++ b/doc/frameworks/sumstats-toy-scan.bro
@@ -0,0 +1,45 @@
+@load base/frameworks/sumstats
+
+# We use the connection_attempt event to limit our observations to those
+# connections which were attempted and not successful.
+event connection_attempt(c: connection)
+    {
+    # Make an observation!
+    # This observation is about the host attempting the connection.
+    # Each attempted connection counts as one so the observation is always 1.
+    SumStats::observe("conn attempted",
+                      SumStats::Key($host=c$id$orig_h),
+                      SumStats::Observation($num=1));
+    }
+
+event bro_init()
+    {
+    # Create the reducer.
+    # The reducer attaches to the "conn attempted" observation stream
+    # and uses the summing calculation on the observations. Keep
+    # in mind that there will be one result per key (connection originator).
+    local r1 = SumStats::Reducer($stream="conn attempted",
+                                 $apply=set(SumStats::SUM));
+
+    # Create the final sumstat.
+    # This is slightly different from the last example since we're providing
+    # a callback to calculate a value to check against the threshold with
+    # $threshold_val. The actual threshold itself is provided with $threshold.
+    # Another callback, $threshold_crossed, runs when a key crosses the threshold.
+    SumStats::create([$name = "finding scanners",
+                      $epoch = 5min,
+                      $reducers = set(r1),
+                      # Provide a threshold.
+                      $threshold = 5.0,
+                      # Provide a callback to calculate a value from the result
+                      # to check against the threshold field.
+                      $threshold_val(key: SumStats::Key, result: SumStats::Result) =
+                        {
+                        return result["conn attempted"]$sum;
+                        },
+                      # Provide a callback for when a key crosses the threshold.
+                      $threshold_crossed(key: SumStats::Key, result: SumStats::Result) =
+                        {
+                        print fmt("%s attempted %.0f or more connections", key$host, result["conn attempted"]$sum);
+                        }]);
+    }
\ No newline at end of file
diff --git a/doc/frameworks/sumstats.rst b/doc/frameworks/sumstats.rst
new file mode 100644
index 0000000000..e06ceaf2c8
--- /dev/null
+++ b/doc/frameworks/sumstats.rst
@@ -0,0 +1,102 @@
+==================
+Summary Statistics
+==================
+
+.. rst-class:: opening
+
+    Measuring aspects of network traffic is an extremely common task in Bro.
+    Bro provides data structures which make this very easy as well in
+    simplistic cases such as size-limited trace file processing. In
+    real-world deployments though, there are difficulties that arise from
+    clusterization (many processes sniffing traffic) and unbounded data sets
+    (traffic never stops). The Summary Statistics (otherwise referred to as
+    SumStats) framework aims to define a mechanism for consuming unbounded
+    data sets and making them measurable in practice on large clustered and
+    non-clustered Bro deployments.
+
+.. contents::
+
+Overview
+========
+
+The Sumstat processing flow is broken into three pieces: observations, where
+some aspect of an event is observed and fed into the Sumstats framework;
+reducers, where observations are collected and measured, typically by taking
+some sort of summary statistic measurement like average or variance (among
+others); and Sumstats, where reducers have an epoch (time interval) that their
+measurements are performed over, along with callbacks for monitoring thresholds
+or viewing the collected and measured data.
+
+Terminology
+===========
+
+    Observation
+
+        A single point of data. Observations have a few components of their
+        own: the arbitrarily named observation stream they belong to, a key
+        identifying what the observation is about, and the actual
+        observation itself.
+
+    Reducer
+
+        Calculations are applied to an observation stream here to reduce the
+        full unbounded set of observations down to a smaller representation.
+        Results are collected within each reducer per key, so care must be
+        taken to keep the total number of keys tracked down to a reasonable
+        level.
+
+    Sumstat
+
+        The final definition of a Sumstat, where one or more reducers are
+        collected over an interval, also known as an epoch. Thresholding can
+        be applied here along with a callback in the event that a threshold is
+        crossed. Additionally, a callback can be provided to access each
+        result (per-key) at the end of each epoch.
+
+Examples
+========
+
+These examples may seem very simple to an experienced Bro script developer and
+they're intended to look that way. Keep in mind that these scripts will work
+on small single-process Bro instances as well as large many-worker clusters.
+The complications from dealing with flow-based load balancing can be ignored
+by developers writing scripts that use Sumstats due to its built-in cluster
+transparency.
+
+Printing the number of connections
+----------------------------------
+
+Sumstats provides a simple way of approaching the problem of trying to count
+the number of connections over a given time interval. Here is a script with
+inline documentation that does this with the Sumstats framework:
+
+.. btest-include:: ${DOC_ROOT}/frameworks/sumstats-countconns.bro
+
+When run on a sample PCAP file from the Bro test suite, the following output
+is created:
+
+.. btest:: sumstats-countconns
+
+    @TEST-EXEC: btest-rst-cmd bro -r ${TRACES}/workshop_2011_browse.trace ${DOC_ROOT}/frameworks/sumstats-countconns.bro
+
+
+Toy Scan detection
+------------------
+
+Taking the previous example even further, we can implement a simple detection
+to demonstrate the thresholding functionality.
+This example is a toy to demonstrate how thresholding works in Sumstats and is
+not meant to be a real-world functional example; that is left to the scan.bro
+script that is included with Bro.
+
+.. btest-include:: ${DOC_ROOT}/frameworks/sumstats-toy-scan.bro
+
+Let's see if there are any hosts that crossed the threshold in a PCAP file
+containing a host running nmap:
+
+.. btest:: sumstats-toy-scan
+
+    @TEST-EXEC: btest-rst-cmd bro -r ${TRACES}/nmap-vsn.trace ${DOC_ROOT}/frameworks/sumstats-toy-scan.bro
+
+It seems the host running nmap was detected!
+
diff --git a/testing/btest/Traces/nmap-vsn.trace b/testing/btest/Traces/nmap-vsn.trace
new file mode 100644
index 0000000000..b276ed3d2f
Binary files /dev/null and b/testing/btest/Traces/nmap-vsn.trace differ