Merge branch 'master' into topic/jsiwek/faf-updates

Conflicts: testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
2025-10-02 06:38:20 +00:00 · 2013-07-31 10:05:36 -05:00
commit 9bd7a65071
parent 5fa9c5865b af9e181731
91 changed files with 14058 additions and 402 deletions
--- a/10118
+++ b/10118
--- a/21
+++ b/21
@ -80,7 +80,7 @@ New Functionality
  with the following user-visible functionality (some of that was
  already available before, but done differently):

-  [TODO: This will probably change with further script updates.]
+  [TODO: Update with changes from 984e9793db56.]

      - A binary input reader interfaces the input framework with file
        analysis, allowing to inject files on disk into Bro's
@ -108,6 +108,25 @@ New Functionality
  shunting, and sampling; plus plugin support to customize filters
  dynamically.

+- Bro now provides Bloom filters of two kinds: basic Bloom filters
+  supporting membership tests, and counting Bloom filters that track
+  the frequency of elements. The corresponding functions are:
+
+    bloomfilter_basic_init(fp: double, capacity: count, name: string &default=""): opaque of bloomfilter
+    bloomfilter_counting_init(k: count, cells: count, max: count, name: string &default=""): opaque of bloomfilter
+    bloomfilter_add(bf: opaque of bloomfilter, x: any)
+    bloomfilter_lookup(bf: opaque of bloomfilter, x: any): count
+    bloomfilter_merge(bf1: opaque of bloomfilter, bf2: opaque of bloomfilter): opaque of bloomfilter
+    bloomfilter_clear(bf: opaque of bloomfilter)
+
+  See <INSERT LINK> for full documentation.
+
+- base/utils/exec.bro provides a module to start external processes
+  asynchronously and retrieve their output on termination.
+  base/utils/dir.bro uses it to monitor a directory for changes, and
+  base/utils/active-http.bro uses it to provide an interface for
+  querying remote web servers.
+
 Changed Functionality
 ~~~~~~~~~~~~~~~~~~~~~

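As a rough illustration of what the `fp` and `capacity` arguments of `bloomfilter_basic_init` control, here is a minimal Python sketch of a basic Bloom filter. It is not Bro's implementation: the class name `BasicBloomFilter`, the SHA-256 double hashing, and the textbook sizing formulas are all assumptions made for this example.

```python
import hashlib
import math


class BasicBloomFilter:
    """Illustrative Bloom filter (hypothetical; not Bro's implementation)."""

    def __init__(self, fp, capacity):
        # Textbook sizing: m bits and k hash functions for a target
        # false-positive rate fp at the given number of elements.
        self.m = math.ceil(-capacity * math.log(fp) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = [False] * self.m

    def _positions(self, x):
        # Double hashing: derive k bit positions from two slices of one digest.
        digest = hashlib.sha256(repr(x).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, x):
        for pos in self._positions(x):
            self.bits[pos] = True

    def lookup(self, x):
        # 1 means "possibly present", 0 means "definitely absent".
        return 1 if all(self.bits[pos] for pos in self._positions(x)) else 0


bf = BasicBloomFilter(fp=0.01, capacity=100)
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
```

A lookup of an element that was added always yields 1. A lookup of an element that was never added usually yields 0, but may yield 1 at roughly the configured false-positive rate.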
--- a/2
+++ b/2
@ -1 +1 @@
-2.1-824
+2.1-945
--- a/aux/binpac
+++ b/aux/binpac
@ -1 +1 @@
-Subproject commit c39bd478b9d0ecd05b1b83aa9d09a7887893977c
+Subproject commit 314fa8f65fc240e960c23c3bba98623436a72b98
--- a/aux/bro-aux
+++ b/aux/bro-aux
@ -1 +1 @@
-Subproject commit a9942558c7d3dfd80148b8aaded64c82ade3d117
+Subproject commit 91d258cc8b2f74cd02fc93dfe61f73ec9f0dd489
--- a/aux/broccoli
+++ b/aux/broccoli
@ -1 +1 @@
-Subproject commit 889f9c65944ceac20ad9230efc39d33e6e1221c3
+Subproject commit d59c73b6e0966ad63bbc63a35741b5f68263e7b1
--- a/aux/broctl
+++ b/aux/broctl
@ -1 +1 @@
-Subproject commit 0cd102805e73343cab3f9fd4a76552e13940dad9
+Subproject commit 52fd91261f41fa1528f7b964837a364d7991889e
--- a/2
+++ b/2
@ -1 +1 @@
-Subproject commit 0187b33a29d5ec824f940feff60dc5d8c2fe314f
+Subproject commit 026639f8368e56742c0cb5d9fb390ea64e60ec50
--- a/doc/intel.rst
+++ b/doc/intel.rst
@ -27,10 +27,7 @@ Quick Start
 Load the package of scripts that sends data into the Intelligence
 Framework to be checked by loading this script in local.bro::

-	@load policy/frameworks/intel
-
-(TODO: find some good mechanism for getting setup with good data
-quickly)
+	@load policy/frameworks/intel/seen

 Refer to the "Loading Intelligence" section below to see the format
 for Intelligence Framework text files, then load those text files with
@ -61,16 +58,14 @@ data out to all of the nodes that need it.

 Here is an example of the intelligence data format.  Note that all
 whitespace separators are literal tabs and fields containing only a
-hyphen a considered to be null values.::
+hyphen are considered to be null values.::

-	#fields	host	net	str	str_type	meta.source	meta.desc	meta.url
-	1.2.3.4	-	-	-	source1	Sending phishing email	http://source1.com/badhosts/1.2.3.4
-	-	31.131.248.0/21	-	-	spamhaus-drop	SBL154982	-	-
-	-	-	a.b.com	Intel::DOMAIN	source2	Name used for data exfiltration	-
+	#fields	indicator	indicator_type	meta.source	meta.desc	meta.url
+	1.2.3.4	Intel::ADDR	source1	Sending phishing email	http://source1.com/badhosts/1.2.3.4
+	a.b.com	Intel::DOMAIN	source2	Name used for data exfiltration	-

-For more examples of built in `str_type` values, please refer to the
-autogenerated documentation for the intelligence framework (TODO:
-figure out how to do this link).
+For more examples of built in `indicator_type` values, please refer to the
+autogenerated documentation for the intelligence framework.

 To load the data once files are created, use the following example
 code to define files to load with your own file names of course::
@ -90,8 +85,7 @@ When some bit of data is extracted (such as an email address in the
 "From" header in a message over SMTP), the Intelligence Framework
 needs to be informed that this data was discovered and its presence
 should be checked within the intelligence data set.  This is
-accomplished through the Intel::seen (TODO: do a reference link)
-function.
+accomplished through the Intel::seen function.

 Typically users won't need to work with this function due to built in
 hook scripts that Bro ships with that will "see" data and send it into
@ -106,7 +100,7 @@ The full package of hook scripts that Bro ships with for sending this
 "seen" data into the intelligence framework can be loaded by adding
 this line to local.bro::

-	@load policy/frameworks/intel
+	@load policy/frameworks/intel/seen

 Intelligence Matches
 ********************
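To make the revised file format concrete, the following Python sketch parses the example data shown in the hunk above: a `#fields` header, literal tab separators, and a lone hyphen for null values. The function name `parse_intel` is hypothetical, and the code only models the format. It is not how Bro's input framework reads these files.

```python
def parse_intel(text):
    """Parse tab-separated intelligence data with a '#fields' header."""
    fields = None
    items = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        if cols[0] == "#fields":
            fields = cols[1:]
            continue
        if line.startswith("#") or fields is None:
            continue  # skip other header lines and data before the header
        # A field containing only a hyphen is a null value.
        values = [None if c == "-" else c for c in cols]
        items.append(dict(zip(fields, values)))
    return items


sample = (
    "#fields\tindicator\tindicator_type\tmeta.source\tmeta.desc\tmeta.url\n"
    "1.2.3.4\tIntel::ADDR\tsource1\tSending phishing email\t"
    "http://source1.com/badhosts/1.2.3.4\n"
    "a.b.com\tIntel::DOMAIN\tsource2\tName used for data exfiltration\t-\n"
)
items = parse_intel(sample)
```

Each data row now carries exactly one indicator and its type, which replaces the old layout of separate `host`, `net`, `str`, and `str_type` columns.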
--- a/doc/scripts/DocSourcesList.cmake
+++ b/doc/scripts/DocSourcesList.cmake
@ -17,6 +17,7 @@ rest_target(${psd} base/init-default.bro internal)
 rest_target(${psd} base/init-bare.bro internal)

 rest_target(${CMAKE_BINARY_DIR}/scripts base/bif/analyzer.bif.bro)
+rest_target(${CMAKE_BINARY_DIR}/scripts base/bif/bloom-filter.bif.bro)
 rest_target(${CMAKE_BINARY_DIR}/scripts base/bif/bro.bif.bro)
 rest_target(${CMAKE_BINARY_DIR}/scripts base/bif/const.bif.bro)
 rest_target(${CMAKE_BINARY_DIR}/scripts base/bif/event.bif.bro)
@ -164,9 +165,12 @@ rest_target(${psd} base/protocols/ssl/main.bro)
 rest_target(${psd} base/protocols/ssl/mozilla-ca-list.bro)
 rest_target(${psd} base/protocols/syslog/consts.bro)
 rest_target(${psd} base/protocols/syslog/main.bro)
+rest_target(${psd} base/utils/active-http.bro)
 rest_target(${psd} base/utils/addrs.bro)
 rest_target(${psd} base/utils/conn-ids.bro)
+rest_target(${psd} base/utils/dir.bro)
 rest_target(${psd} base/utils/directions-and-hosts.bro)
+rest_target(${psd} base/utils/exec.bro)
 rest_target(${psd} base/utils/files.bro)
 rest_target(${psd} base/utils/numbers.bro)
 rest_target(${psd} base/utils/paths.bro)
@ -184,15 +188,16 @@ rest_target(${psd} policy/frameworks/dpd/detect-protocols.bro)
 rest_target(${psd} policy/frameworks/dpd/packet-segment-logging.bro)
 rest_target(${psd} policy/frameworks/files/detect-MHR.bro)
 rest_target(${psd} policy/frameworks/files/hash-all-files.bro)
-rest_target(${psd} policy/frameworks/intel/conn-established.bro)
-rest_target(${psd} policy/frameworks/intel/dns.bro)
-rest_target(${psd} policy/frameworks/intel/http-host-header.bro)
-rest_target(${psd} policy/frameworks/intel/http-url.bro)
-rest_target(${psd} policy/frameworks/intel/http-user-agents.bro)
-rest_target(${psd} policy/frameworks/intel/smtp-url-extraction.bro)
-rest_target(${psd} policy/frameworks/intel/smtp.bro)
-rest_target(${psd} policy/frameworks/intel/ssl.bro)
-rest_target(${psd} policy/frameworks/intel/where-locations.bro)
+rest_target(${psd} policy/frameworks/intel/do_notice.bro)
+rest_target(${psd} policy/frameworks/intel/seen/conn-established.bro)
+rest_target(${psd} policy/frameworks/intel/seen/dns.bro)
+rest_target(${psd} policy/frameworks/intel/seen/http-host-header.bro)
+rest_target(${psd} policy/frameworks/intel/seen/http-url.bro)
+rest_target(${psd} policy/frameworks/intel/seen/http-user-agents.bro)
+rest_target(${psd} policy/frameworks/intel/seen/smtp-url-extraction.bro)
+rest_target(${psd} policy/frameworks/intel/seen/smtp.bro)
+rest_target(${psd} policy/frameworks/intel/seen/ssl.bro)
+rest_target(${psd} policy/frameworks/intel/seen/where-locations.bro)
 rest_target(${psd} policy/frameworks/packet-filter/shunt.bro)
 rest_target(${psd} policy/frameworks/software/version-changes.bro)
 rest_target(${psd} policy/frameworks/software/vulnerable.bro)
--- a/scripts/base/frameworks/intel/main.bro
+++ b/scripts/base/frameworks/intel/main.bro
@ -10,13 +10,14 @@ module Intel;
 export {
 	redef enum Log::ID += { LOG };
 	
-	## String data needs to be further categoried since it could represent
-	## and number of types of data.
-	type StrType: enum {
+	## Enum type to represent various types of intelligence data.
+	type Type: enum {
+		## An IP address.
+		ADDR,
 		## A complete URL without the prefix "http://".
 		URL,
-		## User-Agent string, typically HTTP or mail message body.
-		USER_AGENT,
+		## Software name.
+		SOFTWARE,
 		## Email address.
 		EMAIL,
 		## DNS domain name.
@ -44,16 +45,13 @@ export {
 	
 	## Represents a piece of intelligence.
 	type Item: record {
-		## The IP address if the intelligence is about an IP address.
-		host:        addr           &optional;
-		## The network if the intelligence is about a CIDR block.
-		net:         subnet         &optional;
-		## The string if the intelligence is about a string.
-		str:         string         &optional;
-		## The type of data that is in the string if the $str field is set.
-		str_type:    StrType        &optional;
+		## The intelligence indicator.
+		indicator:      string;

-		## Metadata for the item.  Typically represents more deeply \
+		## The type of data that the indicator field represents.
+		indicator_type: Type;
+		
+		## Metadata for the item.  Typically represents more deeply
 		## descriptive data for a piece of intelligence.
 		meta:           MetaData;
 	};
@ -65,16 +63,16 @@ export {
 		IN_ANYWHERE,
 	};

-	## The $host field and combination of $str and $str_type fields are mutually 
-	## exclusive.  These records *must* represent either an IP address being
-	## seen or a string being seen.
 	type Seen: record {
-		## The IP address if the data seen is an IP address.
-		host:      addr          &log &optional;
 		## The string if the data is about a string.
-		str:       string        &log &optional;
-		## The type of data that is in the string if the $str field is set.
-		str_type:  StrType       &log &optional;
+		indicator:       string        &log &optional;
+
+		## The type of data that the indicator represents.
+		indicator_type:  Type          &log &optional;
+
+		## If the indicator type was :bro:enum:`Intel::ADDR`, then this 
+		## field will be present.
+		host:            addr          &optional;

 		## Where the data was discovered.
 		where:           Where         &log;
@ -100,7 +98,7 @@ export {
 		## Where the data was seen.
 		seen:     Seen           &log;
 		## Sources which supplied data that resulted in this match.
-		sources:  set[string]    &log;
+		sources:  set[string]    &log &default=string_set();
 	};

 	## Intelligence data manipulation functions.
@ -135,8 +133,8 @@ const have_full_data = T &redef;

 # The in memory data structure for holding intelligence.
 type DataStore: record {
-	net_data:    table[subnet] of set[MetaData];
-	string_data: table[string, StrType] of set[MetaData];
+	host_data:    table[addr] of set[MetaData];
+	string_data:  table[string, Type] of set[MetaData];
 };
 global data_store: DataStore &redef;

@ -144,8 +142,8 @@ global data_store: DataStore &redef;
 # This is primarily for workers to do the initial quick matches and store
 # a minimal amount of data for the full match to happen on the manager.
 type MinDataStore: record {
-	net_data:    set[subnet];
-	string_data: set[string, StrType];
+	host_data:    set[addr];
+	string_data:  set[string, Type];
 };
 global min_data_store: MinDataStore &redef;

@ -157,15 +155,13 @@ event bro_init() &priority=5

 function find(s: Seen): bool
 	{
-	if ( s?$host && 
-	     ((have_full_data && s$host in data_store$net_data) || 
-	      (s$host in min_data_store$net_data)))
+	if ( s?$host )
 		{
-		return T;
+		return ((s$host in min_data_store$host_data) || 
+		        (have_full_data && s$host in data_store$host_data));
 		}
-	else if ( s?$str && s?$str_type &&
-	          ((have_full_data && [s$str, s$str_type] in data_store$string_data) ||
-	           ([s$str, s$str_type] in min_data_store$string_data)))
+	else if ( ([to_lower(s$indicator), s$indicator_type] in min_data_store$string_data) ||
+	           (have_full_data && [to_lower(s$indicator), s$indicator_type] in data_store$string_data) )
 		{
 		return T;
 		}
@ -177,8 +173,7 @@ function find(s: Seen): bool

 function get_items(s: Seen): set[Item]
 	{
-	local item: Item;
-	local return_data: set[Item] = set();
+	local return_data: set[Item];

 	if ( ! have_full_data )
 		{
@ -191,26 +186,23 @@ function get_items(s: Seen): set[Item]
 	if ( s?$host )
 		{
 		# See if the host is known about and it has meta values
-		if ( s$host in data_store$net_data )
+		if ( s$host in data_store$host_data )
 			{
-			for ( m in data_store$net_data[s$host] )
+			for ( m in data_store$host_data[s$host] )
 				{
-				# TODO: the lookup should be finding all and not just most specific
-				#       and $host/$net should have the correct value.
-				item = [$host=s$host, $meta=m];
-				add return_data[item];
+				add return_data[Item($indicator=cat(s$host), $indicator_type=ADDR, $meta=m)];
 				}
 			}
 		}
-	else if ( s?$str && s?$str_type )
+	else
 		{
+		local lower_indicator = to_lower(s$indicator);
 		# See if the string is known about and it has meta values
-		if ( [s$str, s$str_type] in data_store$string_data )
+		if ( [lower_indicator, s$indicator_type] in data_store$string_data )
 			{
-			for ( m in data_store$string_data[s$str, s$str_type] )
+			for ( m in data_store$string_data[lower_indicator, s$indicator_type] )
 				{
-				item = [$str=s$str, $str_type=s$str_type, $meta=m];
-				add return_data[item];
+				add return_data[Item($indicator=s$indicator, $indicator_type=s$indicator_type, $meta=m)];
 				}
 			}
 		}
@ -222,6 +214,12 @@ function Intel::seen(s: Seen)
 	{
 	if ( find(s) )
 		{
+		if ( s?$host )
+			{
+			s$indicator = cat(s$host);
+			s$indicator_type = Intel::ADDR;
+			}
+
 		if ( have_full_data )
 			{
 			local items = get_items(s);
@ -250,8 +248,7 @@ function has_meta(check: MetaData, metas: set[MetaData]): bool

 event Intel::match(s: Seen, items: set[Item]) &priority=5
 	{
-	local empty_set: set[string] = set();
-	local info: Info = [$ts=network_time(), $seen=s, $sources=empty_set];
+	local info: Info = [$ts=network_time(), $seen=s];

 	if ( s?$conn )
 		{
@ -267,52 +264,37 @@ event Intel::match(s: Seen, items: set[Item]) &priority=5

 function insert(item: Item)
 	{
-	if ( item?$str && !item?$str_type )
-		{
-		event reporter_warning(network_time(), fmt("You must provide a str_type for strings or this item doesn't make sense.  Item: %s", item), "");
-		return;
-		}
-
 	# Create and fill out the meta data item.
 	local meta = item$meta;
 	local metas: set[MetaData];

-	if ( item?$host )
+	# All intelligence is case insensitive at the moment.
+	local lower_indicator = to_lower(item$indicator);
+
+	if ( item$indicator_type == ADDR )
 		{
-		local host = mask_addr(item$host, is_v4_addr(item$host) ? 32 : 128);
+		local host = to_addr(item$indicator);
 		if ( have_full_data )
 			{
-			if ( host !in data_store$net_data )
-				data_store$net_data[host] = set();
+			if ( host !in data_store$host_data )
+				data_store$host_data[host] = set();

-			metas = data_store$net_data[host];
+			metas = data_store$host_data[host];
 			}

-		add min_data_store$net_data[host];
+		add min_data_store$host_data[host];
 		}
-	else if ( item?$net )
+	else
 		{
 		if ( have_full_data )
 			{
-			if ( item$net !in data_store$net_data )
-				data_store$net_data[item$net] = set();
+			if ( [lower_indicator, item$indicator_type] !in data_store$string_data )
+				data_store$string_data[lower_indicator, item$indicator_type] = set();

-			metas = data_store$net_data[item$net];
+			metas = data_store$string_data[lower_indicator, item$indicator_type];
 			}

-		add min_data_store$net_data[item$net];
-		}
-	else if ( item?$str )
-		{
-		if ( have_full_data )
-			{
-			if ( [item$str, item$str_type] !in data_store$string_data )
-				data_store$string_data[item$str, item$str_type] = set();
-
-			metas = data_store$string_data[item$str, item$str_type];
-			}
-
-		add min_data_store$string_data[item$str, item$str_type];
+		add min_data_store$string_data[lower_indicator, item$indicator_type];
 		}

 	local updated = F;
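The reworked lookup logic in this file can be modeled in a few lines of Python: address indicators go into a host set, and every other indicator is keyed by its lowercased value together with its type. The class `IntelStore` and its method names are hypothetical, metadata handling is omitted, and this is only a sketch of the minimal store, not the Bro code itself.

```python
import ipaddress


class IntelStore:
    """Toy model of the minimal data store (indicators only, no metadata)."""

    def __init__(self):
        self.host_data = set()    # parsed addresses for ADDR indicators
        self.string_data = set()  # (lowercased indicator, type) pairs

    def insert(self, indicator, indicator_type):
        if indicator_type == "ADDR":
            self.host_data.add(ipaddress.ip_address(indicator))
        else:
            # All string intelligence is stored case-insensitively.
            self.string_data.add((indicator.lower(), indicator_type))

    def find(self, indicator=None, indicator_type=None, host=None):
        # A seen host is checked against the address set; anything else
        # is checked as a (lowercased string, type) pair.
        if host is not None:
            return ipaddress.ip_address(host) in self.host_data
        return (indicator.lower(), indicator_type) in self.string_data


store = IntelStore()
store.insert("1.2.3.4", "ADDR")
store.insert("A.B.com", "DOMAIN")
```

Because the type is part of the key, the same string seen under a different indicator type does not match.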
--- a/scripts/base/init-bare.bro
+++ b/scripts/base/init-bare.bro
@ -702,6 +702,7 @@ type entropy_test_result: record {
@load base/bif/strings.bif
@load base/bif/bro.bif
@load base/bif/reporter.bif
+@load base/bif/bloom-filter.bif

 ## Deprecated. This is superseded by the new logging framework.
 global log_file_name: function(tag: string): string &redef;
@ -3047,3 +3048,5 @@ const snaplen = 8192 &redef;
@load base/frameworks/input
@load base/frameworks/analyzer
@load base/frameworks/files
+
+@load base/bif
--- a/scripts/base/init-default.bro
+++ b/scripts/base/init-default.bro
@ -5,9 +5,12 @@
 ##! you actually want.

@load base/utils/site
+@load base/utils/active-http
@load base/utils/addrs
@load base/utils/conn-ids
+@load base/utils/dir
@load base/utils/directions-and-hosts
+@load base/utils/exec
@load base/utils/files
@load base/utils/numbers
@load base/utils/paths
--- a/scripts/base/protocols/smtp/main.bro
+++ b/scripts/base/protocols/smtp/main.bro
@ -226,7 +226,10 @@ event mime_one_header(c: connection, h: mime_header_rec) &priority=5
 		{
 		if ( ! c$smtp?$to )
 			c$smtp$to = set();
-		add c$smtp$to[h$value];
+
+		local to_parts = split(h$value, /[[:blank:]]*,[[:blank:]]*/);
+		for ( i in to_parts )
+			add c$smtp$to[to_parts[i]];
 		}

 	else if ( h$name == "X-ORIGINATING-IP" )
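The effect of the new `split` call on a "To" header can be tried out with an equivalent regular expression in Python. The helper name `split_recipients` and the sample addresses are made up for illustration. `[ \t]` stands in for the POSIX `[[:blank:]]` class, which matches space and tab.

```python
import re

# Commas with optional surrounding spaces or tabs separate recipients.
_TO_SEPARATOR = re.compile(r"[ \t]*,[ \t]*")


def split_recipients(value):
    """Split a header value into individual recipient strings."""
    return _TO_SEPARATOR.split(value)


recipients = split_recipients(
    "alice@example.com, bob@example.com\t,carol@example.com"
)
```

Note that a plain comma split of this kind also breaks up a quoted display name that contains a comma, so the resulting pieces are not guaranteed to be well-formed addresses.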
--- a/scripts/base/utils/active-http.bro
+++ b/scripts/base/utils/active-http.bro
@ -0,0 +1,123 @@
+##! A module for performing active HTTP requests and
+##! getting the reply at runtime.
+
+@load ./exec
+
+module ActiveHTTP;
+
+export {
+	## The default timeout for HTTP requests.
+	const default_max_time = 1min &redef;
+
+	## The default HTTP method/verb to use for requests.
+	const default_method = "GET" &redef;
+
+	type Response: record {
+		## Numeric response code from the server.
+		code:      count;
+		## String response message from the server.
+		msg:       string;
+		## Full body of the response.
+		body:      string                  &optional;
+		## All headers returned by the server.
+		headers:   table[string] of string &optional;
+	};
+
+	type Request: record {
+		## The URL being requested.
+		url:             string;
+		## The HTTP method/verb to use for the request.
+		method:          string                  &default=default_method;
+		## Data to send to the server in the client body.  Keep in
+		## mind that you will probably need to set the *method* field
+		## to "POST" or "PUT".
+		client_data:     string                  &optional;
+		## Arbitrary headers to pass to the server.  Some headers
+		## will be included by libCurl.
+		#custom_headers: table[string] of string &optional;
+		## Timeout for the request.
+		max_time:        interval                &default=default_max_time;
+		## Additional curl command line arguments.  Be very careful
+		## with this option since shell injection could take place
+		## if careful handling of untrusted data is not applied.
+		addl_curl_args:  string                  &optional;
+	};
+
+	## Perform an HTTP request according to the :bro:type:`Request` record.
+	## This is an asynchronous function and must be called within a "when"
+	## statement.
+	##
+	## req: A record instance representing all options for an HTTP request.
+	##
+	## Returns: A record with the full response message.
+	global request: function(req: ActiveHTTP::Request): ActiveHTTP::Response;
+}
+
+function request2curl(r: Request, bodyfile: string, headersfile: string): string
+	{
+	local cmd = fmt("curl -s -g -o \"%s\" -D \"%s\" -X \"%s\"",
+	                str_shell_escape(bodyfile),
+	                str_shell_escape(headersfile),
+	                str_shell_escape(r$method));
+
+	cmd = fmt("%s -m %.0f", cmd, r$max_time);
+
+	if ( r?$client_data )
+		cmd = fmt("%s -d -", cmd);
+
+	if ( r?$addl_curl_args )
+		cmd = fmt("%s %s", cmd, r$addl_curl_args);
+
+	cmd = fmt("%s \"%s\"", cmd, str_shell_escape(r$url));
+	return cmd;
+	}
+
+function request(req: Request): ActiveHTTP::Response
+	{
+	local tmpfile     = "/tmp/bro-activehttp-" + unique_id("");
+	local bodyfile    = fmt("%s_body", tmpfile);
+	local headersfile = fmt("%s_headers", tmpfile);
+
+	local cmd = request2curl(req, bodyfile, headersfile);
+	local stdin_data = req?$client_data ? req$client_data : "";
+
+	local resp: Response;
+	resp$code = 0;
+	resp$msg = "";
+	resp$body = "";
+	resp$headers = table();
+	return when ( local result = Exec::run([$cmd=cmd, $stdin=stdin_data, $read_files=set(bodyfile, headersfile)]) )
+		{
+		# If there is no response line then nothing else will work either.
+		if ( ! (result?$files && headersfile in result$files) )
+			{
+			Reporter::error(fmt("There was a failure when requesting \"%s\" with ActiveHTTP.", req$url));
+			return resp;
+			}
+
+		local headers = result$files[headersfile];
+		for ( i in headers )
+			{
+			# The reply is the first line.
+			if ( i == 0 )
+				{
+				local response_line = split_n(headers[0], /[[:blank:]]+/, F, 2);
+				if ( |response_line| != 3 )
+					return resp;
+
+				resp$code = to_count(response_line[2]);
+				resp$msg = response_line[3];
+				resp$body = join_string_vec(result$files[bodyfile], "");
+				}
+			else
+				{
+				local line = headers[i];
+				local h = split1(line, /:/);
+				if ( |h| != 2 )
+					next;
+				resp$headers[h[1]] = sub_bytes(h[2], 0, |h[2]|-1);
+				}
+			}
+		return resp;
+		}
+	}
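The header handling in `request` follows a simple pattern: the first line is split into three parts to get the status code and message, and each later line is split at its first colon. The Python sketch below models that pattern. The function name `parse_response` is hypothetical, and unlike the Bro code it strips surrounding whitespace from header values rather than trimming a fixed number of bytes.

```python
def parse_response(header_lines):
    """Model of turning saved header lines into code, message and headers."""
    resp = {"code": 0, "msg": "", "headers": {}}
    if not header_lines:
        return resp

    # The reply line looks like: <version> <code> <message>
    parts = header_lines[0].strip().split(None, 2)
    if len(parts) != 3:
        return resp

    resp["code"] = int(parts[1]) if parts[1].isdigit() else 0
    resp["msg"] = parts[2]

    for line in header_lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line, e.g. the blank terminator
        resp["headers"][name] = value.strip()
    return resp


resp = parse_response([
    "HTTP/1.1 404 Not Found\r",
    "Content-Type: text/html\r",
    "\r",
])
```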
--- a/scripts/base/utils/dir.bro
+++ b/scripts/base/utils/dir.bro
@ -0,0 +1,66 @@
+@load base/utils/exec
+@load base/frameworks/reporter
+@load base/utils/paths
+
+module Dir;
+
+export {
+	## The default interval at which this module checks for files in
+	## directories when using the :bro:see:`Dir::monitor` function.
+	const polling_interval = 30sec &redef;
+
+	## Register a directory to monitor with a callback that is called
+	## every time a previously unseen file is seen.  If a file is deleted
+	## and seen to be gone, the file is available for being seen again in
+	## the future.
+	##
+	## dir: The directory to monitor for files.
+	##
+	## callback: Callback that gets executed with each file name
+	##           that is found.  Filenames are provided with the full path.
+	##
+	## poll_interval: An interval at which to check for new files.
+	global monitor: function(dir: string, callback: function(fname: string),
+	                         poll_interval: interval &default=polling_interval);
+}
+
+event Dir::monitor_ev(dir: string, last_files: set[string],
+                      callback: function(fname: string),
+                      poll_interval: interval)
+	{
+	when ( local result = Exec::run([$cmd=fmt("ls -i \"%s/\"", str_shell_escape(dir))]) )
+		{
+		if ( result$exit_code != 0 )
+			{
+			Reporter::warning(fmt("Requested monitoring of non-existent directory (%s).", dir));
+			return;
+			}
+
+		local current_files: set[string] = set();
+		local files: vector of string = vector();
+
+		if ( result?$stdout )
+			files = result$stdout;
+
+		for ( i in files )
+			{
+			local parts = split1(files[i], / /);
+			if ( parts[1] !in last_files )
+				callback(build_path_compressed(dir, parts[2]));
+			add current_files[parts[1]];
+			}
+
+		schedule poll_interval
+			{
+			Dir::monitor_ev(dir, current_files, callback, poll_interval)
+			};
+		}
+	}
+
+function monitor(dir: string, callback: function(fname: string),
+                 poll_interval: interval &default=polling_interval)
+	{
+	event Dir::monitor_ev(dir, set(), callback, poll_interval);
+	}
+
+
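The polling step in `Dir::monitor_ev` amounts to comparing the inode numbers reported by `ls -i` against those from the previous poll. A Python model of one polling round is sketched below. The function name `poll_once` and the sample listing are hypothetical, and the sketch splits on any leading whitespace, which is slightly more lenient than the single-space split in the Bro code.

```python
import posixpath


def poll_once(ls_lines, last_inodes, directory):
    """Return (paths of newly seen files, inode set for the next poll)."""
    current = set()
    new_paths = []
    for line in ls_lines:
        parts = line.split(None, 1)  # "<inode> <file name>"
        if len(parts) != 2:
            continue
        inode, name = parts
        if inode not in last_inodes:
            new_paths.append(posixpath.join(directory, name))
        current.add(inode)
    return new_paths, current


first, seen = poll_once(["101 a.log", "102 b.log"], set(), "/data")
second, seen = poll_once(["102 b.log", "103 c.log"], seen, "/data")
third, seen = poll_once(["101 a.log", "102 b.log", "103 c.log"], seen, "/data")
```

Since only the most recent listing is remembered, a file that disappears and later shows up again is reported a second time, which matches the behavior described in the doc comment for `monitor`.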
--- a/scripts/base/utils/exec.bro
+++ b/scripts/base/utils/exec.bro
@ -0,0 +1,185 @@
+##! A module for executing external command line programs.
+
+@load base/frameworks/input
+
+module Exec;
+
+export {
+	type Command: record {
+		## The command line to execute.  Use care to avoid injection attacks.
+		## I.e. if the command uses untrusted/variable data, sanitize
+		## it with str_shell_escape().
+		cmd:         string;
+		## Provide standard in to the program as a string.
+		stdin:       string      &default="";
+		## If additional files are required to be read in as part of the output
+		## of the command they can be defined here.
+		read_files:  set[string] &optional;
+		# The unique id for tracking executors.
+		uid: string &default=unique_id("");
+	};
+
+	type Result: record {
+		## Exit code from the program.
+		exit_code:    count            &default=0;
+		## True if the command was terminated with a signal.
+		signal_exit:  bool             &default=F;
+		## Each line of standard out.
+		stdout:       vector of string &optional;
+		## Each line of standard error.
+		stderr:       vector of string &optional;
+		## If additional files were requested to be read in
+		## the content of the files will be available here.
+		files:        table[string] of string_vec &optional;
+	};
+
+	## Function for running command line programs and getting
+	## output.  This is an asynchronous function which is meant
+	## to be run with the `when` statement.
+	##
+	## cmd: The command to run.  Use care to avoid injection attacks!
+	##
+	## Returns: A record representing the full results from the
+	##          external program execution.
+	global run: function(cmd: Command): Result;
+
+	## The system directory for temp files.
+	const tmp_dir = "/tmp" &redef;
+}
+
+# Indexed by command uid.
+global results: table[string] of Result;
+global pending_commands: set[string];
+global pending_files: table[string] of set[string];
+
+type OneLine: record {
+	s: string;
+	is_stderr: bool;
+};
+
+type FileLine: record {
+	s: string;
+};
+
+event Exec::line(description: Input::EventDescription, tpe: Input::Event, s: string, is_stderr: bool)
+	{
+	local result = results[description$name];
+	if ( is_stderr )
+		{
+		if ( ! result?$stderr )
+			result$stderr = vector(s);
+		else
+			result$stderr[|result$stderr|] = s;
+		}
+	else
+		{
+		if ( ! result?$stdout )
+			result$stdout = vector(s);
+		else
+			result$stdout[|result$stdout|] = s;
+		}
+	}
+
+event Exec::file_line(description: Input::EventDescription, tpe: Input::Event, s: string)
+	{
+	local parts = split1(description$name, /_/);
+	local name = parts[1];
+	local track_file = parts[2];
+
+	local result = results[name];
+	if ( ! result?$files )
+		result$files = table();
+
+	if ( track_file !in result$files )
+		result$files[track_file] = vector(s);
+	else
+		result$files[track_file][|result$files[track_file]|] = s;
+	}
+
+event Input::end_of_data(name: string, source:string)
+	{
+	local parts = split1(name, /_/);
+	name = parts[1];
+
+	if ( name !in pending_commands || |parts| < 2 )
+		return;
+
+	local track_file = parts[2];
+
+	Input::remove(name);
+
+	if ( name !in pending_files )
+		delete pending_commands[name];
+	else
+		{
+		delete pending_files[name][track_file];
+		if ( |pending_files[name]| == 0 )
+			delete pending_commands[name];
+		system(fmt("rm \"%s\"", str_shell_escape(track_file)));
+		}
+	}
+
+event InputRaw::process_finished(name: string, source:string, exit_code:count, signal_exit:bool)
+	{
+	if ( name !in pending_commands )
+		return;
+
+	Input::remove(name);
+	results[name]$exit_code = exit_code;
+	results[name]$signal_exit = signal_exit;
+
+	if ( name !in pending_files || |pending_files[name]| == 0 )
+		# No extra files to read, command is done.
+		delete pending_commands[name];
+	else
+		for ( read_file in pending_files[name] )
+			Input::add_event([$source=fmt("%s", read_file),
+			                  $name=fmt("%s_%s", name, read_file),
+			                  $reader=Input::READER_RAW,
+			                  $want_record=F,
+			                  $fields=FileLine,
+			                  $ev=Exec::file_line]);
+	}
+
+function run(cmd: Command): Result
+	{
+	add pending_commands[cmd$uid];
+	results[cmd$uid] = [];
+
+	if ( cmd?$read_files )
+		{
+		for ( read_file in cmd$read_files )
+			{
+			if ( cmd$uid !in pending_files )
+				pending_files[cmd$uid] = set();
+			add pending_files[cmd$uid][read_file];
+			}
+		}
+
+	local config_strings: table[string] of string = {
+		["stdin"]       = cmd$stdin,
+		["read_stderr"] = "1",
+	};
+	Input::add_event([$name=cmd$uid,
+	                  $source=fmt("%s |", cmd$cmd),
+	                  $reader=Input::READER_RAW,
+	                  $fields=Exec::OneLine,
+	                  $ev=Exec::line,
+	                  $want_record=F,
+	                  $config=config_strings]);
+
+	return when ( cmd$uid !in pending_commands )
+		{
+		local result = results[cmd$uid];
+		delete results[cmd$uid];
+		return result;
+		}
+	}
+
+event bro_done()
+	{
+	# We are punting here and just deleting any unprocessed files.
+	for ( uid in pending_files )
+		for ( fname in pending_files[uid] )
+			system(fmt("rm \"%s\"", str_shell_escape(fname)));
+	}
--- a/scripts/policy/frameworks/intel/conn-established.bro
+++ b/scripts/policy/frameworks/intel/conn-established.bro
@ -1,8 +0,0 @@
-@load base/frameworks/intel
-@load ./where-locations
-
-event connection_established(c: connection)
-	{
-	Intel::seen([$host=c$id$orig_h, $conn=c, $where=Conn::IN_ORIG]);
-	Intel::seen([$host=c$id$resp_h, $conn=c, $where=Conn::IN_RESP]);
-	}
--- a/scripts/policy/frameworks/intel/do_notice.bro
+++ b/scripts/policy/frameworks/intel/do_notice.bro
@ -0,0 +1,44 @@
+
+@load base/frameworks/intel
+@load base/frameworks/notice
+
+module Intel;
+
+export {
+	redef enum Notice::Type += {
+		## Intel::Notice is a notice that happens when an intelligence 
+		## indicator is denoted to be notice-worthy.
+		Intel::Notice
+	};
+
+	redef record Intel::MetaData += {
+		## A boolean value to allow the data itself to represent
+		## if the indicator that this metadata is attached to 
+		## is notice worthy.
+		do_notice: bool &default=F;
+
+		## If set, restricts notice creation: a notice is only
+		## created if the do_notice field is T and the indicator was
+		## seen in the indicated location.
+		if_in: Intel::Where &optional;
+	};
+}
+
+event Intel::match(s: Seen, items: set[Item])
+	{
+	for ( item in items )
+		{
+		if ( item$meta$do_notice &&
+		     (! item$meta?$if_in || s$where == item$meta$if_in) )
+			{
+			local n = Notice::Info($note=Intel::Notice,
+			                       $msg=fmt("Intel hit on %s at %s", s$indicator, s$where),
+			                       $sub=s$indicator);
+
+			if ( s?$conn )
+				n$conn = s$conn;
+
+			NOTICE(n);
+			}
+		}
+	}
--- a/scripts/policy/frameworks/intel/seen/load.bro
+++ b/scripts/policy/frameworks/intel/seen/load.bro
--- a/scripts/policy/frameworks/intel/seen/conn-established.bro
+++ b/scripts/policy/frameworks/intel/seen/conn-established.bro
@ -0,0 +1,12 @@
+@load base/frameworks/intel
+@load ./where-locations
+
+event connection_established(c: connection)
+	{
+	if ( c$orig$state == TCP_ESTABLISHED &&
+	     c$resp$state == TCP_ESTABLISHED )
+		{
+		Intel::seen([$host=c$id$orig_h, $conn=c, $where=Conn::IN_ORIG]);
+		Intel::seen([$host=c$id$resp_h, $conn=c, $where=Conn::IN_RESP]);
+		}
+	}
--- a/scripts/policy/frameworks/intel/seen/dns.bro
+++ b/scripts/policy/frameworks/intel/seen/dns.bro
@ -3,8 +3,8 @@

 event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
 	{
-	Intel::seen([$str=query,
-	             $str_type=Intel::DOMAIN,
+	Intel::seen([$indicator=query,
+	             $indicator_type=Intel::DOMAIN,
 	             $conn=c,
 	             $where=DNS::IN_REQUEST]);
 	}
--- a/scripts/policy/frameworks/intel/seen/http-host-header.bro
+++ b/scripts/policy/frameworks/intel/seen/http-host-header.bro
@ -4,8 +4,8 @@
 event http_header(c: connection, is_orig: bool, name: string, value: string)
 	{
 	if ( is_orig && name == "HOST" )
-		Intel::seen([$str=value,
-		             $str_type=Intel::DOMAIN,
+		Intel::seen([$indicator=value,
+		             $indicator_type=Intel::DOMAIN,
 		             $conn=c,
 		             $where=HTTP::IN_HOST_HEADER]);
 	}
--- a/scripts/policy/frameworks/intel/seen/http-url.bro
+++ b/scripts/policy/frameworks/intel/seen/http-url.bro
@ -5,8 +5,8 @@
 event http_message_done(c: connection, is_orig: bool, stat: http_message_stat)
 	{
 	if ( is_orig && c?$http )
-		Intel::seen([$str=HTTP::build_url(c$http),
-		             $str_type=Intel::URL,
+		Intel::seen([$indicator=HTTP::build_url(c$http),
+		             $indicator_type=Intel::URL,
 		             $conn=c,
 		             $where=HTTP::IN_URL]);
 	}
--- a/scripts/policy/frameworks/intel/seen/http-user-agents.bro
+++ b/scripts/policy/frameworks/intel/seen/http-user-agents.bro
@ -4,8 +4,8 @@
 event http_header(c: connection, is_orig: bool, name: string, value: string)
 	{
 	if ( is_orig && name == "USER-AGENT" )
-		Intel::seen([$str=value,
-		             $str_type=Intel::USER_AGENT,
+		Intel::seen([$indicator=value,
+		             $indicator_type=Intel::SOFTWARE,
 		             $conn=c,
 		             $where=HTTP::IN_USER_AGENT_HEADER]);
 	}
--- a/scripts/policy/frameworks/intel/seen/smtp-url-extraction.bro
+++ b/scripts/policy/frameworks/intel/seen/smtp-url-extraction.bro
@ -14,8 +14,8 @@ event intel_mime_data(f: fa_file, data: string)
 		local urls = find_all_urls_without_scheme(data);
 		for ( url in urls )
 			{
-			Intel::seen([$str=url,
-			             $str_type=Intel::URL,
+			Intel::seen([$indicator=url,
+			             $indicator_type=Intel::URL,
 			             $conn=c,
 			             $where=SMTP::IN_MESSAGE]);
 			}
--- a/scripts/policy/frameworks/intel/seen/smtp.bro
+++ b/scripts/policy/frameworks/intel/seen/smtp.bro
@ -0,0 +1,97 @@
+@load base/frameworks/intel
+@load base/protocols/smtp
+@load ./where-locations
+
+event mime_end_entity(c: connection)
+	{
+	if ( c?$smtp )
+		{
+		if ( c$smtp?$path )
+			{
+			local path = c$smtp$path;
+			for ( i in path )
+				{
+				Intel::seen([$host=path[i],
+				             $conn=c,
+				             $where=SMTP::IN_RECEIVED_HEADER]);
+				}
+			}
+
+		if ( c$smtp?$user_agent )
+			Intel::seen([$indicator=c$smtp$user_agent,
+			             $indicator_type=Intel::SOFTWARE,
+			             $conn=c,
+			             $where=SMTP::IN_HEADER]);
+
+		if ( c$smtp?$x_originating_ip )
+			Intel::seen([$host=c$smtp$x_originating_ip,
+			             $conn=c,
+			             $where=SMTP::IN_X_ORIGINATING_IP_HEADER]);
+
+		if ( c$smtp?$mailfrom )
+			{
+			local mailfromparts = split_n(c$smtp$mailfrom, /<.+>/, T, 1);
+			if ( |mailfromparts| > 2 )
+				{
+				Intel::seen([$indicator=mailfromparts[2][1:-2],
+				             $indicator_type=Intel::EMAIL,
+				             $conn=c,
+				             $where=SMTP::IN_MAIL_FROM]);
+				}
+			}
+
+		if ( c$smtp?$rcptto )
+			{
+			for ( rcptto in c$smtp$rcptto )
+				{
+				local rcpttoparts = split_n(rcptto, /<.+>/, T, 1);
+				if ( |rcpttoparts| > 2 )
+					{
+					Intel::seen([$indicator=rcpttoparts[2][1:-2],
+					             $indicator_type=Intel::EMAIL,
+					             $conn=c,
+					             $where=SMTP::IN_RCPT_TO]);
+					}
+				}
+			}
+
+		if ( c$smtp?$from )
+			{
+			local fromparts = split_n(c$smtp$from, /<.+>/, T, 1);
+			if ( |fromparts| > 2 )
+				{
+				Intel::seen([$indicator=fromparts[2][1:-2],
+				             $indicator_type=Intel::EMAIL,
+				             $conn=c,
+				             $where=SMTP::IN_FROM]);
+				}
+			}
+
+		if ( c$smtp?$to )
+			{
+			for ( email_to in c$smtp$to )
+				{
+				local toparts = split_n(email_to, /<.+>/, T, 1);
+				if ( |toparts| > 2 )
+					{
+					Intel::seen([$indicator=toparts[2][1:-2],
+					             $indicator_type=Intel::EMAIL,
+					             $conn=c,
+					             $where=SMTP::IN_TO]);
+					}
+				}
+			}
+
+		if ( c$smtp?$reply_to )
+			{
+			local replytoparts = split_n(c$smtp$reply_to, /<.+>/, T, 1);
+			if ( |replytoparts| > 2 )
+				{
+				Intel::seen([$indicator=replytoparts[2][1:-2],
+				             $indicator_type=Intel::EMAIL,
+				             $conn=c,
+				             $where=SMTP::IN_REPLY_TO]);
+				}
+			}
+		}
+	}
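Review note on the new address handling in seen/smtp.bro: the greedy pattern /<.+>/ matches from the first '<' to the last '>' in a header value, and the script then trims the enclosing brackets before passing the address to Intel::seen. The same extraction is sketched below in C++ for illustration only. The function name extract_bracketed is hypothetical and not part of Bro.

```cpp
#include <string>

// Returns the text between the first '<' and the last '>' of a header
// value, mirroring a greedy /<.+>/ match with the brackets removed.
// Returns an empty string when no non-empty bracketed span exists.
// (Hypothetical helper for illustration; not part of Bro.)
std::string extract_bracketed(const std::string& header)
	{
	std::string::size_type open = header.find('<');

	if ( open == std::string::npos )
		return "";

	std::string::size_type close = header.rfind('>');

	// ".+" requires at least one character between the brackets.
	if ( close == std::string::npos || close <= open + 1 )
		return "";

	return header.substr(open + 1, close - open - 1);
	}
```

Header values without a bracketed address yield an empty result here. The script handles that case with its |parts| > 2 check and reports nothing.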
--- a/scripts/policy/frameworks/intel/seen/ssl.bro
+++ b/scripts/policy/frameworks/intel/seen/ssl.bro
@ -10,14 +10,14 @@ event x509_certificate(c: connection, is_orig: bool, cert: X509, chain_idx: coun
 			{
 			local email = sub(cert$subject, /^.*emailAddress=/, "");
 			email = sub(email, /,.*$/, "");
-			Intel::seen([$str=email,
-			             $str_type=Intel::EMAIL,
+			Intel::seen([$indicator=email,
+			             $indicator_type=Intel::EMAIL,
 			             $conn=c,
 			             $where=(is_orig ? SSL::IN_CLIENT_CERT : SSL::IN_SERVER_CERT)]);
 			}

-		Intel::seen([$str=sha1_hash(der_cert),
-		             $str_type=Intel::CERT_HASH,
+		Intel::seen([$indicator=sha1_hash(der_cert),
+		             $indicator_type=Intel::CERT_HASH,
 		             $conn=c,
 		             $where=(is_orig ? SSL::IN_CLIENT_CERT : SSL::IN_SERVER_CERT)]);
 		}
@ -27,8 +27,8 @@ event ssl_extension(c: connection, is_orig: bool, code: count, val: string)
 	{
 	if ( is_orig && SSL::extensions[code] == "server_name" && 
 	     c?$ssl && c$ssl?$server_name )
-		Intel::seen([$str=c$ssl$server_name,
-		             $str_type=Intel::DOMAIN,
+		Intel::seen([$indicator=c$ssl$server_name,
+		             $indicator_type=Intel::DOMAIN,
 		             $conn=c,
 		             $where=SSL::IN_SERVER_NAME]);
 	}
--- a/scripts/policy/frameworks/intel/seen/where-locations.bro
+++ b/scripts/policy/frameworks/intel/seen/where-locations.bro
--- a/scripts/policy/frameworks/intel/smtp.bro
+++ b/scripts/policy/frameworks/intel/smtp.bro
@ -1,71 +0,0 @@
-@load base/frameworks/intel
-@load base/protocols/smtp
-@load ./where-locations
-
-event mime_end_entity(c: connection)
-	{
-	if ( c?$smtp )
-		{
-		if ( c$smtp?$path )
-			{
-			local path = c$smtp$path;
-			for ( i in path )
-				{
-				Intel::seen([$host=path[i],
-				             $conn=c,
-				             $where=SMTP::IN_RECEIVED_HEADER]);
-				}
-			}
-
-		if ( c$smtp?$user_agent )
-			Intel::seen([$str=c$smtp$user_agent,
-			             $str_type=Intel::USER_AGENT,
-			             $conn=c,
-			             $where=SMTP::IN_HEADER]);
-
-		if ( c$smtp?$x_originating_ip )
-			Intel::seen([$host=c$smtp$x_originating_ip,
-			             $conn=c,
-			             $where=SMTP::IN_X_ORIGINATING_IP_HEADER]);
-
-		if ( c$smtp?$mailfrom )
-			Intel::seen([$str=c$smtp$mailfrom,
-			             $str_type=Intel::EMAIL,
-			             $conn=c,
-			             $where=SMTP::IN_MAIL_FROM]);
-
-		if ( c$smtp?$rcptto )
-			{
-			for ( rcptto in c$smtp$rcptto )
-				{
-				Intel::seen([$str=rcptto,
-				             $str_type=Intel::EMAIL,
-				             $conn=c,
-				             $where=SMTP::IN_RCPT_TO]);
-				}
-			}
-
-		if ( c$smtp?$from )
-			Intel::seen([$str=c$smtp$from,
-			             $str_type=Intel::EMAIL,
-			             $conn=c,
-			             $where=SMTP::IN_FROM]);
-
-		if ( c$smtp?$to )
-			{
-			for ( email_to in c$smtp$to )
-				{
-				Intel::seen([$str=email_to,
-				             $str_type=Intel::EMAIL,
-				             $conn=c,
-				             $where=SMTP::IN_TO]);
-				}
-			}
-
-		if ( c$smtp?$reply_to )
-			Intel::seen([$str=c$smtp$reply_to,
-			             $str_type=Intel::EMAIL,
-			             $conn=c,
-			             $where=SMTP::IN_REPLY_TO]);
-		}
-	}
--- a/scripts/policy/protocols/ssh/detect-bruteforcing.bro
+++ b/scripts/policy/protocols/ssh/detect-bruteforcing.bro
@ -58,10 +58,6 @@ event bro_init()
 	                  	        $msg=fmt("%s appears to be guessing SSH passwords (seen in %d connections).", key$host, r$num),
 	                  	        $src=key$host,
 	                  	        $identifier=cat(key$host)]);
-	                  	# Insert the guesser into the intel framework.
-	                  	Intel::insert([$host=key$host,
-	                  	               $meta=[$source="local",
-	                  	                      $desc=fmt("Bro observed %d apparently failed SSH connections.", r$num)]]);
 	                  	}]);
 	}

--- a/scripts/test-all-policy.bro
+++ b/scripts/test-all-policy.bro
@ -14,18 +14,19 @@
 # @load frameworks/control/controller.bro
 @load frameworks/dpd/detect-protocols.bro
 @load frameworks/dpd/packet-segment-logging.bro
+@load frameworks/intel/do_notice.bro
+@load frameworks/intel/seen/__load__.bro
+@load frameworks/intel/seen/conn-established.bro
+@load frameworks/intel/seen/dns.bro
+@load frameworks/intel/seen/http-host-header.bro
+@load frameworks/intel/seen/http-url.bro
+@load frameworks/intel/seen/http-user-agents.bro
+@load frameworks/intel/seen/smtp-url-extraction.bro
+@load frameworks/intel/seen/smtp.bro
+@load frameworks/intel/seen/ssl.bro
+@load frameworks/intel/seen/where-locations.bro
 @load frameworks/files/detect-MHR.bro
 @load frameworks/files/hash-all-files.bro
-@load frameworks/intel/__load__.bro
-@load frameworks/intel/conn-established.bro
-@load frameworks/intel/dns.bro
-@load frameworks/intel/http-host-header.bro
-@load frameworks/intel/http-url.bro
-@load frameworks/intel/http-user-agents.bro
-@load frameworks/intel/smtp-url-extraction.bro
-@load frameworks/intel/smtp.bro
-@load frameworks/intel/ssl.bro
-@load frameworks/intel/where-locations.bro
 @load frameworks/packet-filter/shunt.bro
 @load frameworks/software/version-changes.bro
 @load frameworks/software/vulnerable.bro
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@ -6,6 +6,9 @@ include_directories(BEFORE
 # This collects generated bif and pac files from subdirectories.
 set(bro_ALL_GENERATED_OUTPUTS  CACHE INTERNAL "automatically generated files" FORCE)

+# This collects bif inputs that we'll load automatically.
+set(bro_AUTO_BIFS CACHE INTERNAL "BIFs for automatic inclusion" FORCE)
+
 # If TRUE, use CMake's object libraries for sub-directories instead of
 # static libraries. This requires CMake >= 2.8.8.
 set(bro_HAVE_OBJECT_LIBRARIES FALSE)
@ -150,6 +153,7 @@ set(bro_PLUGIN_LIBS CACHE INTERNAL "plugin libraries" FORCE)

 add_subdirectory(analyzer)
 add_subdirectory(file_analysis)
+add_subdirectory(probabilistic)

 set(bro_SUBDIRS
    ${bro_SUBDIR_LIBS}
@ -383,8 +387,21 @@ set(BRO_EXE bro
    CACHE STRING "Bro executable binary" FORCE)

 # Target to create all the autogenerated files.
+add_custom_target(generate_outputs_stage1)
+add_dependencies(generate_outputs_stage1 ${bro_ALL_GENERATED_OUTPUTS})
+
+# Target to create the joint includes files that pull in the bif code.
+bro_bif_create_includes(generate_outputs_stage2 ${CMAKE_CURRENT_BINARY_DIR} "${bro_AUTO_BIFS}")
+add_dependencies(generate_outputs_stage2 generate_outputs_stage1)
+
+# Global target to trigger creation of autogenerated code.
 add_custom_target(generate_outputs)
-add_dependencies(generate_outputs ${bro_ALL_GENERATED_OUTPUTS})
+add_dependencies(generate_outputs generate_outputs_stage2)
+
+# Build __load__.bro files for standard *.bif.bro.
+bro_bif_create_loader(bif_loader ${CMAKE_BINARY_DIR}/scripts/base/bif)
+add_dependencies(bif_loader ${bro_SUBDIRS})
+add_dependencies(bro bif_loader)

 # Build __load__.bro files for plugins/*.bif.bro.
 bro_bif_create_loader(bif_loader_plugins ${CMAKE_BINARY_DIR}/scripts/base/bif/plugins)
--- a/src/Func.cc
+++ b/src/Func.cc
@ -560,6 +560,8 @@ void builtin_error(const char* msg, BroObj* arg)
 #include "reporter.bif.func_def"
 #include "strings.bif.func_def"

+#include "__all__.bif.cc" // Autogenerated for compiling in the bif_target() code.
+
 void init_builtin_funcs()
 	{
 	bro_resources = internal_type("bro_resources")->AsRecordType();
@ -574,6 +576,8 @@ void init_builtin_funcs()
 #include "reporter.bif.func_init"
 #include "strings.bif.func_init"

+#include "__all__.bif.init.cc" // Autogenerated for compiling in the bif_target() code.
+
 	did_builtin_init = true;
 	}

--- a/src/H3.h
+++ b/src/H3.h
@ -49,23 +49,61 @@
 //     hash a substring of the data.  Hashes of substrings can be bitwise-XOR'ed
 //     together to get the same result as hashing the full string.
 // Any number of hash functions can be created by creating new instances of H3,
-//     with the same or different template parameters.  The hash function is
-//     randomly generated using bro_random(); you must call init_random_seed()
-//     before the H3 constructor if you wish to seed it.
+//     with the same or different template parameters.  The hash function
+//     constructor takes a seed as argument which defaults to a call to
+//     bro_random().


 #ifndef H3_H
 #define H3_H

 #include <climits>
+#include <cstring>

 // The number of values representable by a byte.
 #define H3_BYTE_RANGE (UCHAR_MAX+1)

-template<class T, int N> class H3 {
-    T byte_lookup[N][H3_BYTE_RANGE];
+template <typename T, int N>
+class H3 {
 public:
-    H3();
+	H3()
+		{
+		Init(false, 0);
+		}
+
+	H3(T seed)
+		{
+		Init(true, seed);
+		}
+
+	void Init(bool have_seed, T seed)
+		{
+		T bit_lookup[N * CHAR_BIT];
+
+		for ( size_t bit = 0; bit < N * CHAR_BIT; bit++ )
+			{
+			bit_lookup[bit] = 0;
+			for ( size_t i = 0; i < sizeof(T)/2; i++ )
+				{
+				seed = have_seed ? bro_prng(seed) : bro_random();
+				// assume random() returns at least 16 random bits
+				bit_lookup[bit] = (bit_lookup[bit] << 16) | (seed & 0xFFFF);
+				}
+			}
+
+		for ( size_t byte = 0; byte < N; byte++ )
+			{
+			for ( unsigned val = 0; val < H3_BYTE_RANGE; val++ )
+				{
+				byte_lookup[byte][val] = 0;
+				for ( size_t bit = 0; bit < CHAR_BIT; bit++ )
+					// Does this mean byte_lookup[*][0] == 0? -RP
+					if (val & (1 << bit))
+						byte_lookup[byte][val] ^= bit_lookup[byte*CHAR_BIT+bit];
+				}
+			}
+		}
+
 	T operator()(const void* data, size_t size, size_t offset = 0) const
 		{
 		const unsigned char *p = static_cast<const unsigned char*>(data);
@ -73,7 +111,7 @@ public:

 		// loop optmized with Duff's Device
 		register unsigned n = (size + 7) / 8;
-	switch (size % 8) {
+		switch ( size % 8 ) {
 		case 0: do { result ^= byte_lookup[offset++][*p++];
 		case 7:      result ^= byte_lookup[offset++][*p++];
 		case 6:      result ^= byte_lookup[offset++][*p++];
@ -82,36 +120,24 @@ public:
 		case 3:      result ^= byte_lookup[offset++][*p++];
 		case 2:      result ^= byte_lookup[offset++][*p++];
 		case 1:      result ^= byte_lookup[offset++][*p++];
-		} while (--n > 0);
+				} while ( --n > 0 );
 			}

 		return result;
 		}
+
+	friend bool operator==(const H3& x, const H3& y)
+		{
+		return ! std::memcmp(x.byte_lookup, y.byte_lookup, sizeof(x.byte_lookup));
+		}
+
+	friend bool operator!=(const H3& x, const H3& y)
+		{
+		return ! (x == y);
+		}
+
+private:
+	T byte_lookup[N][H3_BYTE_RANGE];
 };

-template<class T, int N>
-H3<T,N>::H3()
-{
-    T bit_lookup[N * CHAR_BIT];
-
-    for (size_t bit = 0; bit < N * CHAR_BIT; bit++) {
-	bit_lookup[bit] = 0;
-	for (size_t i = 0; i < sizeof(T)/2; i++) {
-	    // assume random() returns at least 16 random bits
-	    bit_lookup[bit] = (bit_lookup[bit] << 16) | (bro_random() & 0xFFFF);
-	}
-    }
-
-    for (size_t byte = 0; byte < N; byte++) {
-        for (unsigned val = 0; val < H3_BYTE_RANGE; val++) {
-            byte_lookup[byte][val] = 0;
-            for (size_t bit = 0; bit < CHAR_BIT; bit++) {
-		// Does this mean byte_lookup[*][0] == 0? -RP
-	        if (val & (1 << bit))
-		    byte_lookup[byte][val] ^= bit_lookup[byte*CHAR_BIT+bit];
-            }
-        }
-    }
-}
-
 #endif //H3_H
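Review note on the seeded H3 constructor: with an explicit seed, two instances built from the same seed produce identical lookup tables, so they compare equal and hash identically. The header comment's property still holds: hashes of substrings, taken at their offsets, XOR to the hash of the full string. The sketch below is a simplified stand-in, not the H3 class from this patch. ToyH3 and toy_prng are hypothetical names, and toy_prng is an ordinary linear congruential step that replaces bro_prng() so the example is self-contained.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for bro_prng(): one linear congruential step.
inline std::uint32_t toy_prng(std::uint32_t state)
	{
	return state * 1664525u + 1013904223u;
	}

// Simplified, always-seeded H3 for inputs of at most N bytes.
template <typename T, int N>
class ToyH3 {
public:
	explicit ToyH3(std::uint32_t seed)
		{
		const std::size_t total_bits = static_cast<std::size_t>(N) * CHAR_BIT;
		T bit_lookup[N * CHAR_BIT];

		// One pseudo-random T per input bit, assembled from 16-bit chunks.
		for ( std::size_t bit = 0; bit < total_bits; ++bit )
			{
			bit_lookup[bit] = 0;
			for ( std::size_t i = 0; i < sizeof(T) / 2; ++i )
				{
				seed = toy_prng(seed);
				bit_lookup[bit] = (bit_lookup[bit] << 16) | ((seed >> 16) & 0xFFFF);
				}
			}

		// Per byte position and byte value, XOR the entries of all set bits.
		for ( std::size_t byte = 0; byte < static_cast<std::size_t>(N); ++byte )
			for ( unsigned val = 0; val <= UCHAR_MAX; ++val )
				{
				byte_lookup[byte][val] = 0;
				for ( std::size_t bit = 0; bit < CHAR_BIT; ++bit )
					if ( val & (1u << bit) )
						byte_lookup[byte][val] ^= bit_lookup[byte * CHAR_BIT + bit];
				}
		}

	T operator()(const void* data, std::size_t size, std::size_t offset = 0) const
		{
		// Bounds check added for the sketch; the original does not check.
		assert(offset + size <= static_cast<std::size_t>(N));
		const unsigned char* p = static_cast<const unsigned char*>(data);
		T result = 0;

		for ( std::size_t i = 0; i < size; ++i )
			result ^= byte_lookup[offset + i][p[i]];

		return result;
		}

	friend bool operator==(const ToyH3& x, const ToyH3& y)
		{
		// Compare the whole table, in bytes.
		return std::memcmp(x.byte_lookup, y.byte_lookup, sizeof(x.byte_lookup)) == 0;
		}

private:
	T byte_lookup[N][UCHAR_MAX + 1];
};
```

For example, with ToyH3<uint32_t, 16> h(42), the value h("intel", 5) equals h("in", 2) XORed with h("tel", 3, 2), because each byte position has its own table row.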
--- a/src/NetVar.cc
+++ b/src/NetVar.cc
@ -242,6 +242,7 @@ OpaqueType* md5_type;
 OpaqueType* sha1_type;
 OpaqueType* sha256_type;
 OpaqueType* entropy_type;
+OpaqueType* bloomfilter_type;

 #include "const.bif.netvar_def"
 #include "types.bif.netvar_def"
@ -307,6 +308,7 @@ void init_general_global_var()
 	sha1_type = new OpaqueType("sha1");
 	sha256_type = new OpaqueType("sha256");
 	entropy_type = new OpaqueType("entropy");
+	bloomfilter_type = new OpaqueType("bloomfilter");
 	}

 void init_net_var()
--- a/src/NetVar.h
+++ b/src/NetVar.h
@ -247,6 +247,7 @@ extern OpaqueType* md5_type;
 extern OpaqueType* sha1_type;
 extern OpaqueType* sha256_type;
 extern OpaqueType* entropy_type;
+extern OpaqueType* bloomfilter_type;

 // Initializes globals that don't pertain to network/event analysis.
 extern void init_general_global_var();
--- a/src/OpaqueVal.cc
+++ b/src/OpaqueVal.cc
@ -1,3 +1,5 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
 #include "OpaqueVal.h"
 #include "NetVar.h"
 #include "Reporter.h"
@ -515,3 +517,152 @@ bool EntropyVal::DoUnserialize(UnserialInfo* info)

 	return true;
 	}
+
+BloomFilterVal::BloomFilterVal()
+	: OpaqueVal(bloomfilter_type)
+	{
+	type = 0;
+	hash = 0;
+	bloom_filter = 0;
+	}
+
+BloomFilterVal::BloomFilterVal(OpaqueType* t)
+	: OpaqueVal(t)
+	{
+	type = 0;
+	hash = 0;
+	bloom_filter = 0;
+	}
+
+BloomFilterVal::BloomFilterVal(probabilistic::BloomFilter* bf)
+	: OpaqueVal(bloomfilter_type)
+	{
+	type = 0;
+	hash = 0;
+	bloom_filter = bf;
+	}
+
+bool BloomFilterVal::Typify(BroType* arg_type)
+	{
+	if ( type )
+		return false;
+
+	type = arg_type;
+	type->Ref();
+
+	TypeList* tl = new TypeList(type);
+	tl->Append(type);
+	hash = new CompositeHash(tl);
+	Unref(tl);
+
+	return true;
+	}
+
+BroType* BloomFilterVal::Type() const
+	{
+	return type;
+	}
+
+void BloomFilterVal::Add(const Val* val)
+	{
+	HashKey* key = hash->ComputeHash(val, 1);
+	bloom_filter->Add(key->Hash());
+	delete key;
+	}
+
+size_t BloomFilterVal::Count(const Val* val) const
+	{
+	HashKey* key = hash->ComputeHash(val, 1);
+	size_t cnt = bloom_filter->Count(key->Hash());
+	delete key;
+	return cnt;
+	}
+
+void BloomFilterVal::Clear()
+	{
+	bloom_filter->Clear();
+	}
+
+bool BloomFilterVal::Empty() const
+	{
+	return bloom_filter->Empty();
+	}
+
+BloomFilterVal* BloomFilterVal::Merge(const BloomFilterVal* x,
+				      const BloomFilterVal* y)
+	{
+	if ( ! same_type(x->Type(), y->Type()) )
+		{
+		reporter->Error("cannot merge Bloom filters with different types");
+		return 0;
+		}
+
+	if ( typeid(*x->bloom_filter) != typeid(*y->bloom_filter) )
+		{
+		reporter->Error("cannot merge different Bloom filter types");
+		return 0;
+		}
+
+	probabilistic::BloomFilter* copy = x->bloom_filter->Clone();
+
+	if ( ! copy->Merge(y->bloom_filter) )
+		{
+		reporter->Error("failed to merge Bloom filter");
+		delete copy;
+		return 0;
+		}
+
+	BloomFilterVal* merged = new BloomFilterVal(copy);
+
+	if ( ! merged->Typify(x->Type()) )
+		{
+		reporter->Error("failed to set type on merged Bloom filter");
+		return 0;
+		}
+
+	return merged;
+	}
+
+BloomFilterVal::~BloomFilterVal()
+	{
+	Unref(type);
+	delete hash;
+	delete bloom_filter;
+	}
+
+IMPLEMENT_SERIAL(BloomFilterVal, SER_BLOOMFILTER_VAL);
+
+bool BloomFilterVal::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_BLOOMFILTER_VAL, OpaqueVal);
+
+	bool is_typed = (type != 0);
+
+	if ( ! SERIALIZE(is_typed) )
+		return false;
+
+	if ( is_typed && ! type->Serialize(info) )
+		return false;
+
+	return bloom_filter->Serialize(info);
+	}
+
+bool BloomFilterVal::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(OpaqueVal);
+
+	bool is_typed;
+	if ( ! UNSERIALIZE(&is_typed) )
+		return false;
+
+	if ( is_typed )
+		{
+		BroType* t = BroType::Unserialize(info);
+		if ( ! t || ! Typify(t) )
+			return false;
+
+		Unref(t);
+		}
+
+	bloom_filter = probabilistic::BloomFilter::Unserialize(info);
+	return bloom_filter != 0;
+	}
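Review note on BloomFilterVal: the value hashes a Val to a digest and forwards Add, Count and Merge to the wrapped probabilistic::BloomFilter, whose implementation is not part of this excerpt. The sketch below is a minimal basic Bloom filter that shows the semantics those calls rely on. ToyBloomFilter is a hypothetical class, and deriving the k cell indices by double hashing the two digest halves is an assumption made for illustration, not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal basic Bloom filter over precomputed 64-bit digests.
// (Hypothetical class for illustration; not the patch's implementation.)
class ToyBloomFilter {
public:
	ToyBloomFilter(std::size_t cells, std::size_t k)
		: bits(cells, false), k(k)
		{
		assert(cells > 0 && k > 0);
		}

	// Sets the k cells selected by the digest.
	void Add(std::uint64_t digest)
		{
		for ( std::size_t i = 0; i < k; ++i )
			bits[Index(digest, i)] = true;
		}

	// Returns 0 if the element is definitely absent, 1 if it may be present.
	std::size_t Count(std::uint64_t digest) const
		{
		for ( std::size_t i = 0; i < k; ++i )
			if ( ! bits[Index(digest, i)] )
				return 0;

		return 1;
		}

	// ORs another filter into this one; fails on mismatched parameters.
	bool Merge(const ToyBloomFilter& other)
		{
		if ( bits.size() != other.bits.size() || k != other.k )
			return false;

		for ( std::size_t i = 0; i < bits.size(); ++i )
			bits[i] = bits[i] || other.bits[i];

		return true;
		}

private:
	// Assumed scheme: double hashing with the two 32-bit digest halves.
	std::size_t Index(std::uint64_t digest, std::size_t i) const
		{
		std::uint64_t h1 = digest & 0xFFFFFFFFu;
		std::uint64_t h2 = (digest >> 32) | 1;
		return static_cast<std::size_t>((h1 + i * h2) % bits.size());
		}

	std::vector<bool> bits;
	std::size_t k;
};
```

A lookup can report a false positive but never a false negative. Merging is only meaningful between filters with the same number of cells and hash functions, which is why BloomFilterVal::Merge rejects mismatched filters before combining them.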
--- a/src/OpaqueVal.h
+++ b/src/OpaqueVal.h
@ -3,10 +3,18 @@
 #ifndef OPAQUEVAL_H
 #define OPAQUEVAL_H

+#include <typeinfo>
+
 #include "RandTest.h"
 #include "Val.h"
 #include "digest.h"

+#include "probabilistic/BloomFilter.h"
+
+namespace probabilistic {
+	class BloomFilter;
+}
+
 class HashVal : public OpaqueVal {
 public:
 	virtual bool IsValid() const;
@ -107,4 +115,37 @@ private:
 	RandTest state;
 };

+class BloomFilterVal : public OpaqueVal {
+public:
+	explicit BloomFilterVal(probabilistic::BloomFilter* bf);
+	virtual ~BloomFilterVal();
+
+	BroType* Type() const;
+	bool Typify(BroType* type);
+
+	void Add(const Val* val);
+	size_t Count(const Val* val) const;
+	void Clear();
+	bool Empty() const;
+
+	static BloomFilterVal* Merge(const BloomFilterVal* x,
+				     const BloomFilterVal* y);
+
+protected:
+	friend class Val;
+	BloomFilterVal();
+	BloomFilterVal(OpaqueType* t);
+
+	DECLARE_SERIAL(BloomFilterVal);
+
+private:
+	// Disable.
+	BloomFilterVal(const BloomFilterVal&);
+	BloomFilterVal& operator=(const BloomFilterVal&);
+
+	BroType* type;
+	CompositeHash* hash;
+	probabilistic::BloomFilter* bloom_filter;
+	};
+
 #endif
--- a/src/PktSrc.cc
+++ b/src/PktSrc.cc
@ -77,6 +77,12 @@ int PktSrc::ExtractNextPacket()

 	data = last_data = pcap_next(pd, &hdr);

+	if ( data && (hdr.len == 0 || hdr.caplen == 0) )
+		{
+		sessions->Weird("empty_pcap_header", &hdr, data);
+		return 0;
+		}
+
 	if ( data )
 		next_timestamp = hdr.ts.tv_sec + double(hdr.ts.tv_usec) / 1e6;

--- a/src/SerialTypes.h
+++ b/src/SerialTypes.h
@ -49,6 +49,10 @@ SERIAL_IS(STATE_ACCESS, 0x1100)
 SERIAL_IS_BO(CASE, 0x1200)
 SERIAL_IS(LOCATION, 0x1300)
 SERIAL_IS(RE_MATCHER, 0x1400)
+SERIAL_IS(BITVECTOR, 0x1500)
+SERIAL_IS(COUNTERVECTOR, 0x1600)
+SERIAL_IS(BLOOMFILTER, 0x1700)
+SERIAL_IS(HASHER, 0x1800)

 // These are the externally visible types.
 const SerialType SER_NONE = 0;
@ -104,6 +108,7 @@ SERIAL_VAL(MD5_VAL, 16)
 SERIAL_VAL(SHA1_VAL, 17)
 SERIAL_VAL(SHA256_VAL, 18)
 SERIAL_VAL(ENTROPY_VAL, 19)
+SERIAL_VAL(BLOOMFILTER_VAL, 20)

 #define SERIAL_EXPR(name, val) SERIAL_CONST(name, val, EXPR)
 SERIAL_EXPR(EXPR, 1)
@ -197,10 +202,22 @@ SERIAL_FUNC(BRO_FUNC, 2)
 SERIAL_FUNC(DEBUG_FUNC, 3)
 SERIAL_FUNC(BUILTIN_FUNC, 4)

+#define SERIAL_BLOOMFILTER(name, val) SERIAL_CONST(name, val, BLOOMFILTER)
+SERIAL_BLOOMFILTER(BLOOMFILTER, 1)
+SERIAL_BLOOMFILTER(BASICBLOOMFILTER, 2)
+SERIAL_BLOOMFILTER(COUNTINGBLOOMFILTER, 3)
+
+#define SERIAL_HASHER(name, val) SERIAL_CONST(name, val, HASHER)
+SERIAL_HASHER(HASHER, 1)
+SERIAL_HASHER(DEFAULTHASHER, 2)
+SERIAL_HASHER(DOUBLEHASHER, 3)
+
 SERIAL_CONST2(ID)
 SERIAL_CONST2(STATE_ACCESS)
 SERIAL_CONST2(CASE)
 SERIAL_CONST2(LOCATION)
 SERIAL_CONST2(RE_MATCHER)
+SERIAL_CONST2(BITVECTOR)
+SERIAL_CONST2(COUNTERVECTOR)

 #endif
--- a/src/Type.cc
+++ b/src/Type.cc
@ -1311,19 +1311,20 @@ IMPLEMENT_SERIAL(OpaqueType, SER_OPAQUE_TYPE);
 bool OpaqueType::DoSerialize(SerialInfo* info) const
 	{
 	DO_SERIALIZE(SER_OPAQUE_TYPE, BroType);
-	return SERIALIZE(name);
+	return SERIALIZE_STR(name.c_str(), name.size());
 	}

 bool OpaqueType::DoUnserialize(UnserialInfo* info)
 	{
 	DO_UNSERIALIZE(BroType);

-	char const* n;
+	const char* n;
 	if ( ! UNSERIALIZE_STR(&n, 0) )
 		return false;

 	name = n;
 	delete [] n;
+
 	return true;
 	}

--- a/src/analyzer/Manager.cc
+++ b/src/analyzer/Manager.cc
@ -103,7 +103,6 @@ void Manager::InitPreScript()

 void Manager::InitPostScript()
 	{
-	#include "analyzer.bif.init.cc"
 	}

 void Manager::DumpDebug()
--- a/src/bro.bif
+++ b/src/bro.bif
@ -4975,4 +4975,3 @@ function anonymize_addr%(a: addr, cl: IPAddrAnonymizationClass%): addr
 			(enum ip_addr_anonymization_class_t) anon_class));
 		}
 	%}
-
--- a/src/file_analysis/File.cc
+++ b/src/file_analysis/File.cc
@ -100,6 +100,7 @@ File::~File()
 	{
 	DBG_LOG(DBG_FILE_ANALYSIS, "Destroying File object %s", id.c_str());
 	Unref(val);
+
 	// Queue may not be empty in the case where only content gaps were seen.
 	while ( ! fonc_queue.empty() )
 		{
--- a/src/file_analysis/Manager.cc
+++ b/src/file_analysis/Manager.cc
@ -60,7 +60,6 @@ void Manager::RegisterAnalyzerComponent(Component* component)

 void Manager::InitPostScript()
 	{
-	#include "file_analysis.bif.init.cc"
 	}

 void Manager::Terminate()
--- a/src/probabilistic/BitVector.cc
+++ b/src/probabilistic/BitVector.cc
@ -0,0 +1,578 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#include "BitVector.h"
+
+#include <cassert>
+#include <limits>
+#include "Serializer.h"
+
+using namespace probabilistic;
+
+BitVector::size_type BitVector::npos = static_cast<BitVector::size_type>(-1);
+BitVector::block_type BitVector::bits_per_block =
+	std::numeric_limits<BitVector::block_type>::digits;
+
+namespace {
+
+uint8_t count_table[] = {
+	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2,
+	3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3,
+	3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3,
+	4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4,
+	3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5,
+	6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
+	4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5,
+	6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5,
+	3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3,
+	4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6,
+	6, 7, 6, 7, 7, 8
+};
+
+} // namespace <anonymous>
+
+BitVector::Reference::Reference(block_type& block, block_type i)
+	: block(block), mask((block_type(1) << i))
+	{
+	assert(i < bits_per_block);
+	}
+
+BitVector::Reference& BitVector::Reference::Flip()
+	{
+	block ^= mask;
+	return *this;
+	}
+
+BitVector::Reference::operator bool() const
+	{
+	return (block & mask) != 0;
+	}
+
+bool BitVector::Reference::operator~() const
+	{
+	return (block & mask) == 0;
+	}
+
+BitVector::Reference& BitVector::Reference::operator=(bool x)
+	{
+	if ( x )
+		block |= mask;
+	else
+		block &= ~mask;
+
+	return *this;
+	}
+
+BitVector::Reference& BitVector::Reference::operator=(const Reference& other)
+	{
+	if ( other )
+		block |= mask;
+	else
+		block &= ~mask;
+
+	return *this;
+	}
+
+BitVector::Reference& BitVector::Reference::operator|=(bool x)
+	{
+	if ( x )
+		block |= mask;
+
+	return *this;
+	}
+
+BitVector::Reference& BitVector::Reference::operator&=(bool x)
+	{
+	if ( ! x )
+		block &= ~mask;
+
+	return *this;
+	}
+
+BitVector::Reference& BitVector::Reference::operator^=(bool x)
+	{
+	if ( x )
+		block ^= mask;
+
+	return *this;
+	}
+
+BitVector::Reference& BitVector::Reference::operator-=(bool x)
+	{
+	if ( x )
+		block &= ~mask;
+
+	return *this;
+	}
+
+BitVector::BitVector()
+	{
+	num_bits = 0;
+	}
+
+BitVector::BitVector(size_type size, bool value)
+	: bits(bits_to_blocks(size), value ? ~block_type(0) : 0)
+	{
+	num_bits = size;
+	}
+
+BitVector::BitVector(BitVector const& other)
+	: bits(other.bits)
+	{
+	num_bits = other.num_bits;
+	}
+
+BitVector BitVector::operator~() const
+	{
+	BitVector b(*this);
+	b.Flip();
+	return b;
+	}
+
+BitVector& BitVector::operator=(BitVector const& other)
+	{
+	bits = other.bits;
+	num_bits = other.num_bits;
+	return *this;
+	}
+
+BitVector BitVector::operator<<(size_type n) const
+	{
+	BitVector b(*this);
+	return b <<= n;
+	}
+
+BitVector BitVector::operator>>(size_type n) const
+	{
+	BitVector b(*this);
+	return b >>= n;
+	}
+
+BitVector& BitVector::operator<<=(size_type n)
+	{
+	if ( n >= num_bits )
+		return Reset();
+
+	if ( n > 0 )
+		{
+		size_type last = Blocks() - 1;
+		size_type div = n / bits_per_block;
+		block_type r = bit_index(n);
+		block_type* b = &bits[0];
+
+		assert(Blocks() >= 1);
+		assert(div <= last);
+
+		if ( r != 0 )
+			{
+			for ( size_type i = last - div; i > 0; --i )
+				b[i + div] = (b[i] << r) | (b[i - 1] >> (bits_per_block - r));
+
+			b[div] = b[0] << r;
+			}
+
+		else
+			{
+			for (size_type i = last-div; i > 0; --i)
+				b[i + div] = b[i];
+
+			b[div] = b[0];
+			}
+
+		std::fill_n(b, div, block_type(0));
+		zero_unused_bits();
+		}
+
+	return *this;
+	}
+
+BitVector& BitVector::operator>>=(size_type n)
+	{
+	if ( n >= num_bits )
+		return Reset();
+
+	if ( n > 0 )
+		{
+		size_type last = Blocks() - 1;
+		size_type div = n / bits_per_block;
+		block_type r = bit_index(n);
+		block_type* b = &bits[0];
+
+		assert(Blocks() >= 1);
+		assert(div <= last);
+
+		if ( r != 0 )
+			{
+			for ( size_type i = div; i < last; ++i )
+				b[i - div] = (b[i] >> r) | (b[i + 1] << (bits_per_block - r));
+
+			b[last - div] = b[last] >> r;
+			}
+
+		else
+			{
+			for (size_type i = div; i <= last; ++i)
+				b[i-div] = b[i];
+			}
+
+		std::fill_n(b + (Blocks() - div), div, block_type(0));
+		}
+
+	return *this;
+	}
+
+BitVector& BitVector::operator&=(BitVector const& other)
+	{
+	assert(Size() >= other.Size());
+
+	for ( size_type i = 0; i < Blocks(); ++i )
+		bits[i] &= other.bits[i];
+
+	return *this;
+	}
+
+BitVector& BitVector::operator|=(BitVector const& other)
+	{
+	assert(Size() >= other.Size());
+
+	for ( size_type i = 0; i < Blocks(); ++i )
+		bits[i] |= other.bits[i];
+
+	return *this;
+	}
+
+BitVector& BitVector::operator^=(BitVector const& other)
+	{
+	assert(Size() >= other.Size());
+
+	for ( size_type i = 0; i < Blocks(); ++i )
+		bits[i] ^= other.bits[i];
+
+	return *this;
+	}
+
+BitVector& BitVector::operator-=(BitVector const& other)
+	{
+	assert(Size() >= other.Size());
+
+	for ( size_type i = 0; i < Blocks(); ++i )
+		bits[i] &= ~other.bits[i];
+
+	return *this;
+	}
+
+namespace probabilistic {
+
+BitVector operator&(BitVector const& x, BitVector const& y)
+	{
+	BitVector b(x);
+	return b &= y;
+	}
+
+BitVector operator|(BitVector const& x, BitVector const& y)
+	{
+	BitVector b(x);
+	return b |= y;
+	}
+
+BitVector operator^(BitVector const& x, BitVector const& y)
+	{
+	BitVector b(x);
+	return b ^= y;
+	}
+
+BitVector operator-(BitVector const& x, BitVector const& y)
+	{
+	BitVector b(x);
+	return b -= y;
+	}
+
+bool operator==(BitVector const& x, BitVector const& y)
+	{
+	return x.num_bits == y.num_bits && x.bits == y.bits;
+	}
+
+bool operator!=(BitVector const& x, BitVector const& y)
+	{
+	return ! (x == y);
+	}
+
+bool operator<(BitVector const& x, BitVector const& y)
+	{
+	assert(x.Size() == y.Size());
+
+	for ( BitVector::size_type r = x.Blocks(); r > 0; --r )
+		{
+		BitVector::size_type i = r - 1;
+
+		if ( x.bits[i] < y.bits[i] )
+			return true;
+
+		else if ( x.bits[i] > y.bits[i] )
+			return false;
+
+		}
+
+	return false;
+	}
+
+}
+
+void BitVector::Resize(size_type n, bool value)
+	{
+	size_type old = Blocks();
+	size_type required = bits_to_blocks(n);
+	block_type block_value = value ? ~block_type(0) : block_type(0);
+
+	if ( required != old )
+		bits.resize(required, block_value);
+
+	if ( value && (n > num_bits) && extra_bits() )
+		bits[old - 1] |= (block_value << extra_bits());
+
+	num_bits = n;
+	zero_unused_bits();
+	}
+
+void BitVector::Clear()
+	{
+	bits.clear();
+	num_bits = 0;
+	}
+
+void BitVector::PushBack(bool bit)
+	{
+	size_type s = Size();
+	Resize(s + 1);
+	Set(s, bit);
+	}
+
+void BitVector::Append(block_type block)
+	{
+	size_type excess = extra_bits();
+
+	if ( excess )
+		{
+		assert(! Empty());
+		bits.push_back(block >> (bits_per_block - excess));
+		bits[Blocks() - 2] |= (block << excess);
+		}
+
+	else
+		{
+		bits.push_back(block);
+		}
+
+	num_bits += bits_per_block;
+	}
+
+BitVector& BitVector::Set(size_type i, bool bit)
+	{
+	assert(i < num_bits);
+
+	if ( bit )
+		bits[block_index(i)] |= bit_mask(i);
+	else
+		Reset(i);
+
+	return *this;
+	}
+
+BitVector& BitVector::Set()
+	{
+	std::fill(bits.begin(), bits.end(), ~block_type(0));
+	zero_unused_bits();
+	return *this;
+	}
+
+BitVector& BitVector::Reset(size_type i)
+	{
+	assert(i < num_bits);
+	bits[block_index(i)] &= ~bit_mask(i);
+	return *this;
+	}
+
+BitVector& BitVector::Reset()
+	{
+	std::fill(bits.begin(), bits.end(), block_type(0));
+	return *this;
+	}
+
+BitVector& BitVector::Flip(size_type i)
+	{
+	assert(i < num_bits);
+	bits[block_index(i)] ^= bit_mask(i);
+	return *this;
+	}
+
+BitVector& BitVector::Flip()
+	{
+	for (size_type i = 0; i < Blocks(); ++i)
+		bits[i] = ~bits[i];
+
+	zero_unused_bits();
+	return *this;
+	}
+
+bool BitVector::operator[](size_type i) const
+	{
+	assert(i < num_bits);
+	return (bits[block_index(i)] & bit_mask(i)) != 0;
+	}
+
+BitVector::Reference BitVector::operator[](size_type i)
+	{
+	assert(i < num_bits);
+	return Reference(bits[block_index(i)], bit_index(i));
+	}
+
+BitVector::size_type BitVector::Count() const
+	{
+	std::vector<block_type>::const_iterator first = bits.begin();
+	size_t n = 0;
+	size_type length = Blocks();
+
+	while ( length )
+		{
+		block_type block = *first;
+
+		while ( block )
+			{
+			// TODO: use _popcnt if available.
+			n += count_table[block & ((1u << 8) - 1)];
+			block >>= 8;
+			}
+
+		++first;
+		--length;
+		}
+
+	return n;
+	}
+
+BitVector::size_type BitVector::Blocks() const
+	{
+	return bits.size();
+	}
+
+BitVector::size_type BitVector::Size() const
+	{
+	return num_bits;
+	}
+
+bool BitVector::Empty() const
+	{
+	return bits.empty();
+	}
+
+bool BitVector::AllZero() const
+	{
+	for ( size_t i = 0; i < bits.size(); ++i )
+		{
+		if ( bits[i] )
+			return false;
+		}
+
+	return true;
+	}
+
+BitVector::size_type BitVector::FindFirst() const
+	{
+	return find_from(0);
+	}
+
+BitVector::size_type BitVector::FindNext(size_type i) const
+	{
+	if ( i >= (Size() - 1) || Size() == 0 )
+		return npos;
+
+	++i;
+	size_type bi = block_index(i);
+	block_type block = bits[bi] & (~block_type(0) << bit_index(i));
+	return block ? bi * bits_per_block + lowest_bit(block) : find_from(bi + 1);
+	}
+
+BitVector::size_type BitVector::lowest_bit(block_type block)
+	{
+	block_type x = block - (block & (block - 1));
+	size_type log = 0;
+
+	while (x >>= 1)
+		++log;
+
+	return log;
+	}
+
+BitVector::block_type BitVector::extra_bits() const
+	{
+	return bit_index(Size());
+	}
+
+void BitVector::zero_unused_bits()
+	{
+	if ( extra_bits() )
+		bits.back() &= ~(~block_type(0) << extra_bits());
+	}
+
+BitVector::size_type BitVector::find_from(size_type i) const
+	{
+	while (i < Blocks() && bits[i] == 0)
+		++i;
+
+	if ( i >= Blocks() )
+		return npos;
+
+	return i * bits_per_block + lowest_bit(bits[i]);
+	}
+
+bool BitVector::Serialize(SerialInfo* info) const
+	{
+	return SerialObj::Serialize(info);
+	}
+
+BitVector* BitVector::Unserialize(UnserialInfo* info)
+	{
+	return reinterpret_cast<BitVector*>(SerialObj::Unserialize(info, SER_BITVECTOR));
+	}
+
+IMPLEMENT_SERIAL(BitVector, SER_BITVECTOR);
+
+bool BitVector::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_BITVECTOR, SerialObj);
+
+	if ( ! SERIALIZE(static_cast<uint64>(bits.size())) )
+		return false;
+
+	for ( size_t i = 0; i < bits.size(); ++i )
+		if ( ! SERIALIZE(static_cast<uint64>(bits[i])) )
+			return false;
+
+	return SERIALIZE(static_cast<uint64>(num_bits));
+	}
+
+bool BitVector::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(SerialObj);
+
+	uint64 size;
+	if ( ! UNSERIALIZE(&size) )
+		return false;
+
+	bits.resize(static_cast<size_t>(size));
+
+	for ( size_t i = 0; i < bits.size(); ++i )
+		{
+		uint64 block;
+		if ( ! UNSERIALIZE(&block) )
+			return false;
+
+		bits[i] = static_cast<block_type>(block);
+		}
+
+	uint64 n;
+	if ( ! UNSERIALIZE(&n) )
+		return false;
+
+	num_bits = static_cast<size_type>(n);
+
+	return true;
+	}
--- a/src/probabilistic/BitVector.h
+++ b/src/probabilistic/BitVector.h
@ -0,0 +1,370 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#ifndef PROBABILISTIC_BITVECTOR_H
+#define PROBABILISTIC_BITVECTOR_H
+
+#include <iterator>
+#include <vector>
+
+#include "SerialObj.h"
+
+namespace probabilistic {
+
+/**
+ * A vector of bits.
+ */
+class BitVector : public SerialObj {
+public:
+	typedef size_t block_type;
+	typedef size_t size_type;
+	typedef bool const_reference;
+
+	static size_type npos;
+	static block_type bits_per_block;
+
+	/**
+	 * An lvalue proxy for individual bits.
+	 */
+	class Reference {
+	public:
+		/**
+		 * Inverts the bit's value.
+		 */
+		Reference& Flip();
+
+		operator bool() const;
+		bool operator~() const;
+		Reference& operator=(bool x);
+		Reference& operator=(const Reference& other);
+		Reference& operator|=(bool x);
+		Reference& operator&=(bool x);
+		Reference& operator^=(bool x);
+		Reference& operator-=(bool x);
+
+	private:
+		friend class BitVector;
+
+		Reference(block_type& block, block_type i);
+		void operator&();
+
+		block_type& block;
+		const block_type mask;
+	};
+
+	/**
+	 * Default-constructs an empty bit vector.
+	 */
+	BitVector();
+
+	/**
+	 * Constructs a bit vector of a given size.
+	 * @param size The number of bits.
+	 * @param value The value for each bit.
+	 */
+	explicit BitVector(size_type size, bool value = false);
+
+	/**
+	 * Constructs a bit vector from a sequence of blocks.
+	 *
+	 * @param first Start of range
+	 * @param last End of range.
+	 *
+	 */
+	template <typename InputIterator>
+	BitVector(InputIterator first, InputIterator last)
+		{
+		bits.insert(bits.end(), first, last);
+		num_bits = bits.size() * bits_per_block;
+		}
+
+	/**
+	 * Copy-constructs a bit vector.
+	 * @param other The bit vector to copy.
+	 */
+	BitVector(const BitVector& other);
+
+	/**
+	 * Assigns another bit vector to this instance.
+	 * @param other The RHS of the assignment.
+	 */
+	BitVector& operator=(const BitVector& other);
+
+	//
+	// Bitwise operations.
+	//
+	BitVector operator~() const;
+	BitVector operator<<(size_type n) const;
+	BitVector operator>>(size_type n) const;
+	BitVector& operator<<=(size_type n);
+	BitVector& operator>>=(size_type n);
+	BitVector& operator&=(BitVector const& other);
+	BitVector& operator|=(BitVector const& other);
+	BitVector& operator^=(BitVector const& other);
+	BitVector& operator-=(BitVector const& other);
+	friend BitVector operator&(BitVector const& x, BitVector const& y);
+	friend BitVector operator|(BitVector const& x, BitVector const& y);
+	friend BitVector operator^(BitVector const& x, BitVector const& y);
+	friend BitVector operator-(BitVector const& x, BitVector const& y);
+
+	//
+	// Relational operators
+	//
+	friend bool operator==(BitVector const& x, BitVector const& y);
+	friend bool operator!=(BitVector const& x, BitVector const& y);
+	friend bool operator<(BitVector const& x, BitVector const& y);
+
+	//
+	// Basic operations
+	//
+
+	/** Appends the bits in a sequence of values.
+	 * @tparam ForwardIterator A forward iterator over blocks.
+	 * @param first An iterator pointing to the first element of the sequence.
+	 * @param last An iterator pointing to one past the last element of the
+	 * sequence.
+	 */
+	template <typename ForwardIterator>
+	void Append(ForwardIterator first, ForwardIterator last)
+		{
+		if ( first == last )
+			return;
+
+		block_type excess = extra_bits();
+		typename std::iterator_traits<ForwardIterator>::difference_type delta =
+			std::distance(first, last);
+
+		bits.reserve(Blocks() + delta);
+
+		if ( excess )
+			{
+			bits.back() |= (*first << excess);
+
+			do {
+				block_type b = *first++ >> (bits_per_block - excess);
+				bits.push_back(b | (first == last ? 0 : *first << excess));
+			} while (first != last);
+
+			}
+
+		else
+			bits.insert(bits.end(), first, last);
+
+		num_bits += bits_per_block * delta;
+		}
+
+	/**
+	 * Appends the bits in a given block.
+	 * @param block The block containing bits to append.
+	 */
+	void Append(block_type block);
+
+	/** Appends a single bit to the end of the bit vector.
+	 * @param bit The value of the bit.
+	 */
+	void PushBack(bool bit);
+
+	/**
+	 * Removes all bits from the bit vector, leaving it with size 0.
+	 */
+	void Clear();
+
+	/**
+	 * Resizes the bit vector to a new number of bits.
+	 * @param n The new number of bits of the bit vector.
+	 * @param value The bit value of new values, if the vector expands.
+	 */
+	void Resize(size_type n, bool value = false);
+
+	/**
+	 * Sets a bit at a specific position to a given value.
+	 * @param i The bit position.
+	 * @param bit The value assigned to position *i*.
+	 * @return A reference to the bit vector instance.
+	 */
+	BitVector& Set(size_type i, bool bit = true);
+
+	/**
+	 * Sets all bits to 1.
+	 * @return A reference to the bit vector instance.
+	 */
+	BitVector& Set();
+
+	/**
+	 * Resets a bit at a specific position, i.e., sets it to 0.
+	 * @param i The bit position.
+	 * @return A reference to the bit vector instance.
+	 */
+	BitVector& Reset(size_type i);
+
+	/**
+	 * Sets all bits to 0.
+	 * @return A reference to the bit vector instance.
+	 */
+	BitVector& Reset();
+
+	/**
+	 * Toggles/flips a bit at a specific position.
+	 * @param i The bit position.
+	 * @return A reference to the bit vector instance.
+	 */
+	BitVector& Flip(size_type i);
+
+	/**
+	 * Computes the complement.
+	 * @return A reference to the bit vector instance.
+	 */
+	BitVector& Flip();
+
+	/** Retrieves a single bit.
+	 * @param i The bit position.
+	 * @return A mutable reference to the bit at position *i*.
+	 */
+	Reference operator[](size_type i);
+
+	/**
+	 * Retrieves a single bit.
+	 * @param i The bit position.
+	 * @return A const-reference to the bit at position *i*.
+	 */
+	const_reference operator[](size_type i) const;
+
+	/**
+	 * Counts the number of 1-bits in the bit vector. Also known as *population
+	 * count* or *Hamming weight*.
+	 * @return The number of bits set to 1.
+	 */
+	size_type Count() const;
+
+	/**
+	 * Retrieves the number of blocks of the underlying storage.
+	 * @return The number of blocks that represent `Size()` bits.
+	 */
+	size_type Blocks() const;
+
+	/**
+	 * Retrieves the number of bits the bit vector consists of.
+	 * @return The length of the bit vector in bits.
+	 */
+	size_type Size() const;
+
+	/**
+	 * Checks whether the bit vector is empty.
+	 * @return `true` iff the bitvector has zero length.
+	 */
+	bool Empty() const;
+
+	/**
+	 * Checks whether all bits are 0.
+	 * @return `true` iff all bits in all blocks are 0.
+	 */
+	bool AllZero() const;
+
+	/**
+	 * Finds the bit position of the first 1-bit.
+	 * @return The position of the first bit that equals 1, or `npos` if no
+	 * such bit exists.
+	 */
+	size_type FindFirst() const;
+
+	/**
+	 * Finds the next 1-bit from a given starting position.
+	 *
+	 * @param i The index where to start looking.
+	 *
+	 * @return The position of the first bit that equals 1 after position
+	 * *i*, or `npos` if no such bit exists.
+	 */
+	size_type FindNext(size_type i) const;
+
+	/**
+	 * Serializes the bit vector.
+	 *
+	 * @param info The serialization information to use.
+	 *
+	 * @return True if successful.
+	 */
+	bool Serialize(SerialInfo* info) const;
+
+	/**
+	 * Unserialize the bit vector.
+	 *
+	 * @param info The serialization information to use.
+	 *
+	 * @return The unserialized bit vector, or null if an error occurred.
+	 */
+	static BitVector* Unserialize(UnserialInfo* info);
+
+protected:
+	DECLARE_SERIAL(BitVector);
+
+private:
+	/**
+	 * Computes the number of bits in use in the last block when the size is
+	 * not a multiple of BitVector::bits_per_block, and 0 otherwise.
+	 */
+	block_type extra_bits() const;
+
+	/**
+	 * If the number of bits in the vector is not a multiple of
+	 * BitVector::bits_per_block, then the last block has unused bits, which
+	 * this function resets.
+	 */
+	void zero_unused_bits();
+
+	/**
+	 * Looks for the first 1-bit starting at a given block.
+	 * @param i The block index to start looking.
+	 * @return The bit position of the first 1-bit in block *i* or later, or
+	 * `BitVector::npos` if no 1-bit exists.
+	 */
+	size_type find_from(size_type i) const;
+
+	/**
+	 * Computes the block index for a given bit position.
+	 */
+	static size_type block_index(size_type i)
+		{
+		return i / bits_per_block;
+		}
+
+	/**
+	 * Computes the bit index within a given block for a given bit position.
+	 */
+	static block_type bit_index(size_type i)
+		{
+		return i % bits_per_block;
+		}
+
+	/**
+	 * Computes the bitmask block to extract the bit at a given bit position.
+	 */
+	static block_type bit_mask(size_type i)
+		{
+		return block_type(1) << bit_index(i);
+		}
+
+	/**
+	 * Computes the number of blocks needed to represent a given number of
+	 * bits.
+	 * @param bits the number of bits.
+	 * @return The number of blocks to represent *bits* number of bits.
+	 */
+	static size_type bits_to_blocks(size_type bits)
+		{
+		return bits / bits_per_block
+			+ static_cast<size_type>(bits % bits_per_block != 0);
+		}
+
+	/**
+	 * Computes the bit position of the first 1-bit in a given block.
+	 * @param block The block to inspect.
+	 * @return The bit position where *block* has its first bit set to 1.
+	 */
+	static size_type lowest_bit(block_type block);
+
+	std::vector<block_type> bits;
+	size_type num_bits;
+};
+
+}
+
+#endif
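The private index helpers in BitVector.h reduce every bit access to a block lookup plus a mask. A minimal standalone sketch of that arithmetic follows. The free functions `set_bit` and `test_bit` and the constant `kBitsPerBlock` are hypothetical names used only for illustration, and the sketch assumes that `bits_per_block` equals the number of value bits in `block_type`, which this part of the diff does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for BitVector's private helpers (not part of the patch).
typedef std::size_t block_type;

// Assumption: one block holds as many bits as block_type has value bits.
static const std::size_t kBitsPerBlock = std::numeric_limits<block_type>::digits;

// Index of the block that stores bit i.
inline std::size_t block_index(std::size_t i) { return i / kBitsPerBlock; }

// Offset of bit i inside its block.
inline std::size_t bit_index(std::size_t i) { return i % kBitsPerBlock; }

// Mask with only the bit for position i set.
inline block_type bit_mask(std::size_t i) { return block_type(1) << bit_index(i); }

// Number of blocks needed to hold n bits (rounds up).
inline std::size_t bits_to_blocks(std::size_t n)
	{
	return n / kBitsPerBlock + static_cast<std::size_t>(n % kBitsPerBlock != 0);
	}

// Sets bit i in a raw block vector.
inline void set_bit(std::vector<block_type>& blocks, std::size_t i)
	{
	assert(block_index(i) < blocks.size());
	blocks[block_index(i)] |= bit_mask(i);
	}

// Tests bit i in a raw block vector.
inline bool test_bit(const std::vector<block_type>& blocks, std::size_t i)
	{
	assert(block_index(i) < blocks.size());
	return (blocks[block_index(i)] & bit_mask(i)) != 0;
	}
```

With these helpers, a vector sized by `bits_to_blocks(kBitsPerBlock + 1)` has two blocks, and setting bit `kBitsPerBlock` touches only the lowest bit of the second block.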
--- a/src/probabilistic/BloomFilter.cc
+++ b/src/probabilistic/BloomFilter.cc
@ -0,0 +1,244 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#include <typeinfo>
+#include <cmath>
+#include <limits>
+
+#include "BloomFilter.h"
+
+#include "CounterVector.h"
+#include "Serializer.h"
+
+using namespace probabilistic;
+
+BloomFilter::BloomFilter()
+	{
+	hasher = 0;
+	}
+
+BloomFilter::BloomFilter(const Hasher* arg_hasher)
+	{
+	hasher = arg_hasher;
+	}
+
+BloomFilter::~BloomFilter()
+	{
+	delete hasher;
+	}
+
+bool BloomFilter::Serialize(SerialInfo* info) const
+	{
+	return SerialObj::Serialize(info);
+	}
+
+BloomFilter* BloomFilter::Unserialize(UnserialInfo* info)
+	{
+	return reinterpret_cast<BloomFilter*>(SerialObj::Unserialize(info, SER_BLOOMFILTER));
+	}
+
+bool BloomFilter::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_BLOOMFILTER, SerialObj);
+
+	return hasher->Serialize(info);
+	}
+
+bool BloomFilter::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(SerialObj);
+
+	hasher = Hasher::Unserialize(info);
+	return hasher != 0;
+	}
+
+size_t BasicBloomFilter::M(double fp, size_t capacity)
+	{
+	double ln2 = std::log(2);
+	return std::ceil(-(capacity * std::log(fp) / ln2 / ln2));
+	}
+
+size_t BasicBloomFilter::K(size_t cells, size_t capacity)
+	{
+	double frac = static_cast<double>(cells) / static_cast<double>(capacity);
+	return std::ceil(frac * std::log(2));
+	}
+
+bool BasicBloomFilter::Empty() const
+	{
+	return bits->AllZero();
+	}
+
+void BasicBloomFilter::Clear()
+	{
+	bits->Reset();
+	}
+
+bool BasicBloomFilter::Merge(const BloomFilter* other)
+	{
+	if ( typeid(*this) != typeid(*other) )
+		return false;
+
+	const BasicBloomFilter* o = static_cast<const BasicBloomFilter*>(other);
+
+	if ( ! hasher->Equals(o->hasher) )
+		{
+		reporter->Error("incompatible hashers in BasicBloomFilter merge");
+		return false;
+		}
+
+	else if ( bits->Size() != o->bits->Size() )
+		{
+		reporter->Error("different bitvector size in BasicBloomFilter merge");
+		return false;
+		}
+
+	(*bits) |= *o->bits;
+
+	return true;
+	}
+
+BasicBloomFilter* BasicBloomFilter::Clone() const
+	{
+	BasicBloomFilter* copy = new BasicBloomFilter();
+
+	copy->hasher = hasher->Clone();
+	copy->bits = new BitVector(*bits);
+
+	return copy;
+	}
+
+BasicBloomFilter::BasicBloomFilter()
+	{
+	bits = 0;
+	}
+
+BasicBloomFilter::BasicBloomFilter(const Hasher* hasher, size_t cells)
+	: BloomFilter(hasher)
+	{
+	bits = new BitVector(cells);
+	}
+
+IMPLEMENT_SERIAL(BasicBloomFilter, SER_BASICBLOOMFILTER)
+
+bool BasicBloomFilter::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_BASICBLOOMFILTER, BloomFilter);
+	return bits->Serialize(info);
+	}
+
+bool BasicBloomFilter::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(BloomFilter);
+	bits = BitVector::Unserialize(info);
+	return (bits != 0);
+	}
+
+void BasicBloomFilter::AddImpl(const Hasher::digest_vector& h)
+	{
+	for ( size_t i = 0; i < h.size(); ++i )
+		bits->Set(h[i] % bits->Size());
+	}
+
+size_t BasicBloomFilter::CountImpl(const Hasher::digest_vector& h) const
+	{
+	for ( size_t i = 0; i < h.size(); ++i )
+		{
+		if ( ! (*bits)[h[i] % bits->Size()] )
+			return 0;
+		}
+
+	return 1;
+	}
+
+CountingBloomFilter::CountingBloomFilter()
+	{
+	cells = 0;
+	}
+
+CountingBloomFilter::CountingBloomFilter(const Hasher* hasher,
+					 size_t arg_cells, size_t width)
+	: BloomFilter(hasher)
+	{
+	cells = new CounterVector(width, arg_cells);
+	}
+
+bool CountingBloomFilter::Empty() const
+	{
+	return cells->AllZero();
+	}
+
+void CountingBloomFilter::Clear()
+	{
+	cells->Clear();
+	}
+
+bool CountingBloomFilter::Merge(const BloomFilter* other)
+	{
+	if ( typeid(*this) != typeid(*other) )
+		return false;
+
+	const CountingBloomFilter* o = static_cast<const CountingBloomFilter*>(other);
+
+	if ( ! hasher->Equals(o->hasher) )
+		{
+		reporter->Error("incompatible hashers in CountingBloomFilter merge");
+		return false;
+		}
+
+	else if ( cells->Size() != o->cells->Size() )
+		{
+		reporter->Error("different bitvector size in CountingBloomFilter merge");
+		return false;
+		}
+
+	(*cells) |= *o->cells;
+
+	return true;
+	}
+
+CountingBloomFilter* CountingBloomFilter::Clone() const
+	{
+	CountingBloomFilter* copy = new CountingBloomFilter();
+
+	copy->hasher = hasher->Clone();
+	copy->cells = new CounterVector(*cells);
+
+	return copy;
+	}
+
+IMPLEMENT_SERIAL(CountingBloomFilter, SER_COUNTINGBLOOMFILTER)
+
+bool CountingBloomFilter::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_COUNTINGBLOOMFILTER, BloomFilter);
+	return cells->Serialize(info);
+	}
+
+bool CountingBloomFilter::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(BloomFilter);
+	cells = CounterVector::Unserialize(info);
+	return (cells != 0);
+	}
+
+// TODO: Use partitioning in add/count to allow for reusing CMS bounds.
+void CountingBloomFilter::AddImpl(const Hasher::digest_vector& h)
+	{
+	for ( size_t i = 0; i < h.size(); ++i )
+		cells->Increment(h[i] % cells->Size());
+	}
+
+size_t CountingBloomFilter::CountImpl(const Hasher::digest_vector& h) const
+	{
+	CounterVector::size_type min =
+		std::numeric_limits<CounterVector::size_type>::max();
+
+	for ( size_t i = 0; i < h.size(); ++i )
+		{
+		CounterVector::size_type cnt = cells->Count(h[i] % cells->Size());
+		if ( cnt < min )
+			min = cnt;
+		}
+
+	return min;
+	}
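The sizing helpers `BasicBloomFilter::M` and `BasicBloomFilter::K` follow the standard Bloom filter formulas m = ceil(-n ln p / (ln 2)^2) and k = ceil((m / n) ln 2). The sketch below restates them as free functions so they can be tried in isolation. The names `bloom_cells` and `bloom_hashes` are hypothetical and not part of the patch.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical free-function version of BasicBloomFilter::M:
// number of cells m for false-positive rate fp and n = capacity elements.
inline std::size_t bloom_cells(double fp, std::size_t capacity)
	{
	const double ln2 = std::log(2.0);
	const double m = -(static_cast<double>(capacity) * std::log(fp)) / (ln2 * ln2);
	return static_cast<std::size_t>(std::ceil(m));
	}

// Hypothetical free-function version of BasicBloomFilter::K:
// optimal number of hash functions k for m = cells and n = capacity.
inline std::size_t bloom_hashes(std::size_t cells, std::size_t capacity)
	{
	const double frac = static_cast<double>(cells) / static_cast<double>(capacity);
	return static_cast<std::size_t>(std::ceil(frac * std::log(2.0)));
	}
```

For example, a 1% false-positive rate with 1000 expected elements works out to 9586 cells and 7 hash functions under these formulas.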
--- a/src/probabilistic/BloomFilter.h
+++ b/src/probabilistic/BloomFilter.h
@ -0,0 +1,238 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#ifndef PROBABILISTIC_BLOOMFILTER_H
+#define PROBABILISTIC_BLOOMFILTER_H
+
+#include <vector>
+#include "BitVector.h"
+#include "Hasher.h"
+
+namespace probabilistic {
+
+class CounterVector;
+
+/**
+ * The abstract base class for Bloom filters.
+ */
+class BloomFilter : public SerialObj {
+public:
+	/**
+	 * Destructor.
+	 */
+	virtual ~BloomFilter();
+
+	/**
+	 * Adds an element of type T to the Bloom filter.
+	 * @param x The element to add
+	 */
+	template <typename T>
+	void Add(const T& x)
+		{
+		AddImpl((*hasher)(x));
+		}
+
+	/**
+	 * Retrieves the associated count of a given value.
+	 *
+	 * @param x The value of type `T` to check.
+	 *
+	 * @return The counter associated with *x*.
+	 */
+	template <typename T>
+	size_t Count(const T& x) const
+		{
+		return CountImpl((*hasher)(x));
+		}
+
+	/**
+	 * Checks whether the Bloom filter is empty.
+	 *
+	 * @return `true` if the Bloom filter contains no elements.
+	 */
+	virtual bool Empty() const = 0;
+
+	/**
+	 * Removes all elements, i.e., resets all bits in the underlying bit vector.
+	 */
+	virtual void Clear() = 0;
+
+	/**
+	 * Merges another Bloom filter into this one.
+	 *
+	 * @param other The other Bloom filter.
+	 *
+	 * @return `true` on success.
+	 */
+	virtual bool Merge(const BloomFilter* other) = 0;
+
+	/**
+	 * Constructs a copy of this Bloom filter.
+	 *
+	 * @return A copy of `*this`.
+	 */
+	virtual BloomFilter* Clone() const = 0;
+
+	/**
+	 * Serializes the Bloom filter.
+	 *
+	 * @param info The serialization information to use.
+	 *
+	 * @return True if successful.
+	 */
+	bool Serialize(SerialInfo* info) const;
+
+	/**
+	 * Unserializes a Bloom filter.
+	 *
+	 * @param info The serialization information to use.
+	 *
+	 * @return The unserialized Bloom filter, or null if an error
+	 * occurred.
+	 */
+	static BloomFilter* Unserialize(UnserialInfo* info);
+
+protected:
+	DECLARE_ABSTRACT_SERIAL(BloomFilter);
+
+	/**
+	 * Default constructor.
+	 */
+	BloomFilter();
+
+	/**
+	 * Constructs a Bloom filter.
+	 *
+	 * @param hasher The hasher to use for this Bloom filter.
+	 */
+	BloomFilter(const Hasher* hasher);
+
+	/**
+	 * Abstract method for implementing the *Add* operation.
+	 *
+	 * @param hashes A set of *k* hashes for the item to add, computed by
+	 * the internal hasher object.
+	 *
+	 */
+	virtual void AddImpl(const Hasher::digest_vector& hashes) = 0;
+
+	/**
+	 * Abstract method for implementing the *Count* operation.
+	 *
+	 * @param hashes A set of *k* hashes for the item to look up, computed
+	 * by the internal hasher object.
+	 *
+	 * @return Returns the counter associated with the hashed element.
+	 */
+	virtual size_t CountImpl(const Hasher::digest_vector& hashes) const = 0;
+
+	const Hasher* hasher;
+};
+
+/**
+ * A basic Bloom filter.
+ */
+class BasicBloomFilter : public BloomFilter {
+public:
+	/**
+	 * Constructs a basic Bloom filter with a given number of cells. The
+	 * ideal number of cells can be computed with *M*.
+	 *
+	 * @param hasher The hasher to use. The ideal number of hash
+	 * functions can be computed with *K*.
+	 *
+	 * @param cells The number of cells.
+	 */
+	BasicBloomFilter(const Hasher* hasher, size_t cells);
+
+	/**
+	 * Computes the number of cells based on a given false positive rate
+	 * and capacity. In the literature, this parameter often has the name
+	 * *M*.
+	 *
+	 * @param fp The false positive rate.
+	 *
+	 * @param capacity The expected number of elements that will be
+	 * stored.
+	 *
+	 * @return The number of cells needed to support a false positive rate
+	 * of *fp* with at most *capacity* elements.
+	 */
+	static size_t M(double fp, size_t capacity);
+
+	/**
+	 * Computes the optimal number of hash functions based on the number of
+	 * cells and expected number of elements.
+	 *
+	 * @param cells The number of cells (*m*).
+	 *
+	 * @param capacity The maximum number of elements.
+	 *
+	 * @return The optimal number of hash functions for *cells* cells and
+	 * at most *capacity* elements.
+	 */
+	static size_t K(size_t cells, size_t capacity);
+
+	// Overridden from BloomFilter.
+	virtual bool Empty() const;
+	virtual void Clear();
+	virtual bool Merge(const BloomFilter* other);
+	virtual BasicBloomFilter* Clone() const;
+
+protected:
+	DECLARE_SERIAL(BasicBloomFilter);
+
+	/**
+	 * Default constructor.
+	 */
+	BasicBloomFilter();
+
+	// Overridden from BloomFilter.
+	virtual void AddImpl(const Hasher::digest_vector& h);
+	virtual size_t CountImpl(const Hasher::digest_vector& h) const;
+
+private:
+	BitVector* bits;
+};
+
+/**
+ * A counting Bloom filter.
+ */
+class CountingBloomFilter : public BloomFilter {
+public:
+	/**
+	 * Constructs a counting Bloom filter.
+	 *
+	 * @param hasher The hasher to use. The ideal number of hash
+	 * functions can be computed with *K*.
+	 *
+	 * @param cells The number of cells to use.
+	 *
+	 * @param width The maximal bit-width of counter values.
+	 */
+	CountingBloomFilter(const Hasher* hasher, size_t cells, size_t width);
+
+	// Overridden from BloomFilter.
+	virtual bool Empty() const;
+	virtual void Clear();
+	virtual bool Merge(const BloomFilter* other);
+	virtual CountingBloomFilter* Clone() const;
+
+protected:
+	DECLARE_SERIAL(CountingBloomFilter);
+
+	/**
+	 * Default constructor.
+	 */
+	CountingBloomFilter();
+
+	// Overridden from BloomFilter.
+	virtual void AddImpl(const Hasher::digest_vector& h);
+	virtual size_t CountImpl(const Hasher::digest_vector& h) const;
+
+private:
+	CounterVector* cells;
+};
+
+}
+
+#endif
--- a/src/probabilistic/CMakeLists.txt
+++ b/src/probabilistic/CMakeLists.txt
@ -0,0 +1,18 @@
+
+include(BroSubdir)
+
+include_directories(BEFORE
+                    ${CMAKE_CURRENT_SOURCE_DIR}
+                    ${CMAKE_CURRENT_BINARY_DIR}
+)
+
+set(probabilistic_SRCS
+    BitVector.cc
+    BloomFilter.cc
+    CounterVector.cc
+    Hasher.cc)
+
+bif_target(bloom-filter.bif)
+bro_add_subdir_library(probabilistic ${probabilistic_SRCS})
+
+add_dependencies(bro_probabilistic generate_outputs)
--- a/src/probabilistic/CounterVector.cc
+++ b/src/probabilistic/CounterVector.cc
@ -0,0 +1,193 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#include "CounterVector.h"
+
+#include <limits>
+#include "BitVector.h"
+#include "Serializer.h"
+
+using namespace probabilistic;
+
+CounterVector::CounterVector(size_t arg_width, size_t cells)
+	{
+	bits = new BitVector(arg_width * cells);
+	width = arg_width;
+	}
+
+CounterVector::CounterVector(const CounterVector& other)
+	{
+	bits = new BitVector(*other.bits);
+	width = other.width;
+	}
+
+CounterVector::~CounterVector()
+	{
+	delete bits;
+	}
+
+bool CounterVector::Increment(size_type cell, count_type value)
+	{
+	assert(cell < Size());
+	assert(value != 0);
+
+	size_t lsb = cell * width;
+	bool carry = false;
+
+	for ( size_t i = 0; i < width; ++i )
+		{
+		bool b1 = (*bits)[lsb + i];
+		bool b2 = (value >> i) & 1;
+		(*bits)[lsb + i] = b1 ^ b2 ^ carry;
+		carry = ( b1 && b2 ) || ( carry && ( b1 != b2 ) );
+		}
+
+	if ( carry )
+		{
+		for ( size_t i = 0; i < width; ++i )
+			bits->Set(lsb + i);
+		}
+
+	return ! carry;
+	}
+
+bool CounterVector::Decrement(size_type cell, count_type value)
+	{
+	assert(cell < Size());
+	assert(value != 0);
+
+	value = ~value + 1; // A - B := A + ~B + 1
+	bool carry = false;
+	size_t lsb = cell * width;
+
+	for ( size_t i = 0; i < width; ++i )
+		{
+		bool b1 = (*bits)[lsb + i];
+		bool b2 = (value >> i) & 1;
+		(*bits)[lsb + i] = b1 ^ b2 ^ carry;
+		carry = ( b1 && b2 ) || ( carry && ( b1 != b2 ) );
+		}
+
+	return carry;
+	}
+
+bool CounterVector::AllZero() const
+	{
+	return bits->AllZero();
+	}
+
+void CounterVector::Clear()
+	{
+	bits->Reset();
+	}
+
+CounterVector::count_type CounterVector::Count(size_type cell) const
+	{
+	assert(cell < Size());
+
+	size_t cnt = 0, order = 1;
+	size_t lsb = cell * width;
+
+	for ( size_t i = lsb; i < lsb + width; ++i, order <<= 1 )
+		if ( (*bits)[i] )
+			cnt |= order;
+
+	return cnt;
+	}
+
+CounterVector::size_type CounterVector::Size() const
+	{
+	return bits->Size() / width;
+	}
+
+size_t CounterVector::Width() const
+	{
+	return width;
+	}
+
+size_t CounterVector::Max() const
+	{
+	return std::numeric_limits<size_t>::max()
+		>> (std::numeric_limits<size_t>::digits - width);
+	}
+
+CounterVector& CounterVector::Merge(const CounterVector& other)
+	{
+	assert(Size() == other.Size());
+	assert(Width() == other.Width());
+
+	for ( size_t cell = 0; cell < Size(); ++cell )
+		{
+		size_t lsb = cell * width;
+		bool carry = false;
+
+		for ( size_t i = 0; i < width; ++i )
+			{
+			bool b1 = (*bits)[lsb + i];
+			bool b2 = (*other.bits)[lsb + i];
+			(*bits)[lsb + i] = b1 ^ b2 ^ carry;
+			carry = ( b1 && b2 ) || ( carry && ( b1 != b2 ) );
+			}
+
+		if ( carry )
+			{
+			for ( size_t i = 0; i < width; ++i )
+				bits->Set(lsb + i);
+			}
+		}
+
+	return *this;
+	}
+
+namespace probabilistic {
+
+CounterVector& CounterVector::operator|=(const CounterVector& other)
+	{
+	return Merge(other);
+	}
+
+CounterVector operator|(const CounterVector& x, const CounterVector& y)
+	{
+	CounterVector cv(x);
+	return cv |= y;
+	}
+
+}
+
+bool CounterVector::Serialize(SerialInfo* info) const
+	{
+	return SerialObj::Serialize(info);
+	}
+
+CounterVector* CounterVector::Unserialize(UnserialInfo* info)
+	{
+	return reinterpret_cast<CounterVector*>(SerialObj::Unserialize(info, SER_COUNTERVECTOR));
+	}
+
+IMPLEMENT_SERIAL(CounterVector, SER_COUNTERVECTOR)
+
+bool CounterVector::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_COUNTERVECTOR, SerialObj);
+
+	if ( ! bits->Serialize(info) )
+		return false;
+
+	return SERIALIZE(static_cast<uint64>(width));
+	}
+
+bool CounterVector::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(SerialObj);
+
+	bits = BitVector::Unserialize(info);
+	if ( ! bits )
+		return false;
+
+	uint64 w;
+	if ( ! UNSERIALIZE(&w) )
+		return false;
+
+	width = static_cast<size_t>(w);
+
+	return true;
+	}
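`CounterVector::Increment` and `CounterVector::Merge` both use a bitwise ripple-carry addition over the `width` bits of a cell and saturate the cell to all ones when a carry leaves the top bit. The sketch below illustrates that scheme on a plain `std::vector<bool>` rather than on `BitVector`. The function names `saturating_add` and `read_counter` are hypothetical, and the sketch assumes `width <= 64`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds value to the width-bit counter whose least significant bit sits at
// position lsb. On overflow the counter saturates at its maximum.
// Returns true if the addition did not overflow.
// Like the patch, bits of value above position width - 1 are ignored.
inline bool saturating_add(std::vector<bool>& bits, std::size_t lsb,
                           std::size_t width, std::uint64_t value)
	{
	assert(width > 0 && width <= 64);
	assert(lsb + width <= bits.size());

	bool carry = false;

	for ( std::size_t i = 0; i < width; ++i )
		{
		bool b1 = bits[lsb + i];
		bool b2 = ((value >> i) & 1) != 0;
		bits[lsb + i] = (b1 != b2) != carry; // sum bit: b1 xor b2 xor carry
		carry = (b1 && b2) || (carry && (b1 != b2));
		}

	if ( carry )
		{
		// Overflow: clamp the counter to all ones.
		for ( std::size_t i = 0; i < width; ++i )
			bits[lsb + i] = true;
		}

	return ! carry;
	}

// Reads the width-bit counter starting at position lsb.
inline std::uint64_t read_counter(const std::vector<bool>& bits,
                                  std::size_t lsb, std::size_t width)
	{
	assert(width > 0 && width <= 64);
	assert(lsb + width <= bits.size());

	std::uint64_t cnt = 0;

	for ( std::size_t i = 0; i < width; ++i )
		if ( bits[lsb + i] )
			cnt |= std::uint64_t(1) << i;

	return cnt;
	}
```

Saturating instead of wrapping keeps a counting Bloom filter from reporting a small count for an element that was in fact added many times.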
--- a/src/probabilistic/CounterVector.h
+++ b/src/probabilistic/CounterVector.h
@ -0,0 +1,165 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#ifndef PROBABILISTIC_COUNTERVECTOR_H
+#define PROBABILISTIC_COUNTERVECTOR_H
+
+#include "SerialObj.h"
+
+namespace probabilistic {
+
+class BitVector;
+
+/**
+ * A vector of counters, each of which has a fixed number of bits.
+ */
+class CounterVector : public SerialObj {
+public:
+	typedef size_t size_type;
+	typedef uint64 count_type;
+
+	/**
+	 * Constructs a counter vector having cells of a given width.
+	 *
+	 * @param width The number of bits that each cell occupies.
+	 *
+	 * @param cells The number of cells in the bitvector.
+	 *
+	 * @pre `cells > 0 && width > 0`
+	 */
+	CounterVector(size_t width, size_t cells = 1024);
+
+	/**
+	 * Copy-constructs a counter vector.
+	 *
+	 * @param other The counter vector to copy.
+	 */
+	CounterVector(const CounterVector& other);
+
+	/**
+	 * Destructor.
+	 */
+	~CounterVector();
+
+	/**
+	 * Increments a given cell.
+	 *
+	 * @param cell The cell to increment.
+	 *
+	 * @param value The value to add to the current counter in *cell*.
+	 *
+	 * @return `true` if adding *value* to the counter in *cell* succeeded.
+	 *
+	 * @pre `cell < Size()`
+	 */
+	bool Increment(size_type cell, count_type value = 1);
+
+	/**
+	 * Decrements a given cell.
+	 *
+	 * @param cell The cell to decrement.
+	 *
+	 * @param value The value to subtract from the current counter in *cell*.
+	 *
+	 * @return `true` if subtracting *value* from the counter in *cell* succeeded.
+	 *
+	 * @pre `cell < Size()`
+	 */
+	bool Decrement(size_type cell, count_type value = 1);
+
+	/**
+	 * Retrieves the counter of a given cell.
+	 *
+	 * @param cell The cell index to retrieve the count for.
+	 *
+	 * @return The counter associated with *cell*.
+	 *
+	 * @pre `cell < Size()`
+	 */
+	count_type Count(size_type cell) const;
+
+	/**
+	 * Checks whether all counters are 0.
+	 * @return `true` iff all counters have the value 0.
+	 */
+	bool AllZero() const;
+
+	/**
+	 * Sets all counters to 0.
+	 */
+	void Clear();
+
+	/**
+	 * Retrieves the number of cells in the storage.
+	 *
+	 * @return The number of cells.
+	 */
+	size_type Size() const;
+
+	/**
+	 * Retrieves the counter width.
+	 *
+	 * @return The number of bits per counter.
+	 */
+	size_t Width() const;
+
+	/**
+	 * Computes the maximum counter value.
+	 *
+	 * @return The maximum counter value based on the width.
+	 */
+	size_t Max() const;
+
+	/**
+	 * Merges another counter vector into this instance by *adding* the
+	 * counters of each cell.
+	 *
+	 * @param other The counter vector to merge into this instance.
+	 *
+	 * @return A reference to `*this`.
+	 *
+	 * @pre `Size() == other.Size() && Width() == other.Width()`
+	 */
+	CounterVector& Merge(const CounterVector& other);
+
+	/**
+	 * An alias for ::Merge.
+	 */
+	CounterVector& operator|=(const CounterVector& other);
+
+	/**
+	 * Serializes the counter vector.
+	 *
+	 * @param info The serialization information to use.
+	 *
+	 * @return True if successful.
+	 */
+	bool Serialize(SerialInfo* info) const;
+
+	/**
+	 * Unserialize the counter vector.
+	 *
+	 * @param info The serialization information to use.
+	 *
+	 * @return The unserialized counter vector, or null if an error
+	 * occurred.
+	 */
+	static CounterVector* Unserialize(UnserialInfo* info);
+
+protected:
+	friend CounterVector operator|(const CounterVector& x,
+				       const CounterVector& y);
+
+	CounterVector() : bits(0), width(0) { }
+
+	DECLARE_SERIAL(CounterVector);
+
+private:
+	CounterVector& operator=(const CounterVector&); // Disable.
+
+	BitVector* bits;
+	size_t width;
+};
+
+}
+
+#endif
--- a/src/probabilistic/Hasher.cc
+++ b/src/probabilistic/Hasher.cc
@ -0,0 +1,194 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#include <typeinfo>
+
+#include "Hasher.h"
+#include "digest.h"
+#include "Serializer.h"
+
+using namespace probabilistic;
+
+bool Hasher::Serialize(SerialInfo* info) const
+	{
+	return SerialObj::Serialize(info);
+	}
+
+Hasher* Hasher::Unserialize(UnserialInfo* info)
+	{
+	return reinterpret_cast<Hasher*>(SerialObj::Unserialize(info, SER_HASHER));
+	}
+
+bool Hasher::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_HASHER, SerialObj);
+
+	if ( ! SERIALIZE(static_cast<uint16>(k)) )
+		return false;
+
+	return SERIALIZE_STR(name.c_str(), name.size());
+	}
+
+bool Hasher::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(SerialObj);
+
+	uint16 serial_k;
+	if ( ! UNSERIALIZE(&serial_k) )
+		return false;
+
+	k = serial_k;
+	assert(k > 0);
+
+	const char* serial_name;
+	if ( ! UNSERIALIZE_STR(&serial_name, 0) )
+		return false;
+
+	name = serial_name;
+	delete [] serial_name;
+
+	return true;
+	}
+
+Hasher::Hasher(size_t arg_k, const std::string& arg_name)
+	: k(arg_k),
+	  name(arg_name)
+	{
+	// Both members are set in the initializer list.
+	}
+
+
+UHF::UHF(size_t seed, const std::string& extra)
+	: h(compute_seed(seed, extra))
+	{
+	}
+
+Hasher::digest UHF::hash(const void* x, size_t n) const
+	{
+	assert(n <= UHASH_KEY_SIZE);
+	return n == 0 ? 0 : h(x, n);
+	}
+
+size_t UHF::compute_seed(size_t seed, const std::string& extra)
+	{
+	u_char buf[SHA256_DIGEST_LENGTH];
+	SHA256_CTX ctx;
+	sha256_init(&ctx);
+
+	if ( extra.empty() )
+		{
+		unsigned int first_seed = initial_seed();
+		sha256_update(&ctx, &first_seed, sizeof(first_seed));
+		}
+
+	else
+		sha256_update(&ctx, extra.c_str(), extra.size());
+
+	sha256_update(&ctx, &seed, sizeof(seed));
+	sha256_final(&ctx, buf);
+
+	// Take the first sizeof(size_t) bytes as seed.
+	return *reinterpret_cast<size_t*>(buf);
+	}
+
+DefaultHasher::DefaultHasher(size_t k, const std::string& name)
+	: Hasher(k, name)
+	{
+	for ( size_t i = 0; i < k; ++i )
+		hash_functions.push_back(UHF(i, name));
+	}
+
+Hasher::digest_vector DefaultHasher::Hash(const void* x, size_t n) const
+	{
+	digest_vector h(K(), 0);
+
+	for ( size_t i = 0; i < h.size(); ++i )
+		h[i] = hash_functions[i](x, n);
+
+	return h;
+	}
+
+DefaultHasher* DefaultHasher::Clone() const
+	{
+	return new DefaultHasher(*this);
+	}
+
+bool DefaultHasher::Equals(const Hasher* other) const
+	{
+	if ( typeid(*this) != typeid(*other) )
+		return false;
+
+	const DefaultHasher* o = static_cast<const DefaultHasher*>(other);
+	return hash_functions == o->hash_functions;
+	}
+
+IMPLEMENT_SERIAL(DefaultHasher, SER_DEFAULTHASHER)
+
+bool DefaultHasher::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_DEFAULTHASHER, Hasher);
+
+	// Nothing to do here, the base class has all we need serialized already.
+	return true;
+	}
+
+bool DefaultHasher::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(Hasher);
+
+	hash_functions.clear();
+	for ( size_t i = 0; i < K(); ++i )
+		hash_functions.push_back(UHF(i, Name()));
+
+	return true;
+	}
+
+DoubleHasher::DoubleHasher(size_t k, const std::string& name)
+	: Hasher(k, name), h1(1, name), h2(2, name)
+	{
+	}
+
+Hasher::digest_vector DoubleHasher::Hash(const void* x, size_t n) const
+	{
+	digest d1 = h1(x, n);
+	digest d2 = h2(x, n);
+	digest_vector h(K(), 0);
+
+	for ( size_t i = 0; i < h.size(); ++i )
+		h[i] = d1 + i * d2;
+
+	return h;
+	}
+
+DoubleHasher* DoubleHasher::Clone() const
+	{
+	return new DoubleHasher(*this);
+	}
+
+bool DoubleHasher::Equals(const Hasher* other) const
+	{
+	if ( typeid(*this) != typeid(*other) )
+		return false;
+
+	const DoubleHasher* o = static_cast<const DoubleHasher*>(other);
+	return h1 == o->h1 && h2 == o->h2;
+	}
+
+IMPLEMENT_SERIAL(DoubleHasher, SER_DOUBLEHASHER)
+
+bool DoubleHasher::DoSerialize(SerialInfo* info) const
+	{
+	DO_SERIALIZE(SER_DOUBLEHASHER, Hasher);
+
+	// Nothing to do here, the base class has all we need serialized already.
+	return true;
+	}
+
+bool DoubleHasher::DoUnserialize(UnserialInfo* info)
+	{
+	DO_UNSERIALIZE(Hasher);
+
+	h1 = UHF(1, Name());
+	h2 = UHF(2, Name());
+
+	return true;
+	}
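The double-hashing policy in DoubleHasher::Hash derives all *k* digests from two base digests as `h[i] = d1 + i * d2`. A minimal, self-contained sketch of that idea follows. The names `seeded_hash` and `double_hash` are hypothetical, and the base hash is a seeded FNV-1a-style function chosen only to keep the example short; it is not the H3 family that UHF uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for UHF: a seeded FNV-1a-style hash, not H3.
inline uint64_t seeded_hash(uint64_t seed, const std::string& x)
	{
	uint64_t h = 14695981039346656037ULL ^ seed;

	for ( size_t i = 0; i < x.size(); ++i )
		{
		h ^= static_cast<unsigned char>(x[i]);
		h *= 1099511628211ULL;
		}

	return h;
	}

// Derives k digests from two base digests: h[i] = d1 + i * d2.
inline std::vector<uint64_t> double_hash(const std::string& x, size_t k)
	{
	assert(k > 0);

	uint64_t d1 = seeded_hash(1, x);
	uint64_t d2 = seeded_hash(2, x);
	std::vector<uint64_t> h(k, 0);

	for ( size_t i = 0; i < h.size(); ++i )
		h[i] = d1 + i * d2;	// Unsigned arithmetic wraps on overflow.

	return h;
	}
```

Compared to evaluating *k* independent hash functions, as DefaultHasher does, this needs only two hash evaluations per element regardless of *k*.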
--- a/src/probabilistic/Hasher.h
+++ b/src/probabilistic/Hasher.h
@ -0,0 +1,220 @@
+// See the file "COPYING" in the main distribution directory for copyright.
+
+#ifndef PROBABILISTIC_HASHER_H
+#define PROBABILISTIC_HASHER_H
+
+#include "Hash.h"
+#include "H3.h"
+#include "SerialObj.h"
+
+namespace probabilistic {
+
+/**
+ * Abstract base class for hashers. A hasher creates a family of hash
+ * functions to hash an element *k* times.
+ */
+class Hasher : public SerialObj {
+public:
+	typedef hash_t digest;
+	typedef std::vector<digest> digest_vector;
+
+	/**
+	 * Destructor.
+	 */
+	virtual ~Hasher() { }
+
+	/**
+	 * Computes hash values for an element.
+	 *
+	 * @param x The element to hash.
+	 *
+	 * @return Vector of *k* hash values.
+	 */
+	template <typename T>
+	digest_vector operator()(const T& x) const
+		{
+		return Hash(&x, sizeof(T));
+		}
+
+	/**
+	 * Computes the hashes for a set of bytes.
+	 *
+	 * @param x Pointer to first byte to hash.
+	 *
+	 * @param n Number of bytes to hash.
+	 *
+	 * @return Vector of *k* hash values.
+	 *
+	 */
+	virtual digest_vector Hash(const void* x, size_t n) const = 0;
+
+	/**
+	 * Returns a deep copy of the hasher.
+	 */
+	virtual Hasher* Clone() const = 0;
+
+	/**
+	 * Returns true if two hashers are identical.
+	 */
+	virtual bool Equals(const Hasher* other) const = 0;
+
+	/**
+	 * Returns the number *k* of hash functions the hasher applies.
+	 */
+	size_t K() const	{ return k; }
+
+	/**
+	 * Returns the hasher's name. If not empty, the hasher uses this descriptor
+	 * to seed its *k* hash functions. Otherwise the hasher mixes in the initial
+	 * seed derived from the environment variable `$BRO_SEED`.
+	 */
+	const std::string& Name() const { return name; }
+
+	bool Serialize(SerialInfo* info) const;
+	static Hasher* Unserialize(UnserialInfo* info);
+
+protected:
+	DECLARE_ABSTRACT_SERIAL(Hasher);
+
+	Hasher() { }
+
+	/**
+	 * Constructor.
+	 *
+	 * @param k The number of hash functions.
+	 *
+	 * @param name A name for the hasher. Hashers with the same name
+	 * should provide consistent results.
+	 */
+	Hasher(size_t k, const std::string& name);
+
+private:
+	size_t k;
+	std::string name;
+};
+
+/**
+ * A universal hash function family. This is a helper class that Hasher
+ * implementations can use internally.
+ */
+class UHF {
+public:
+	/**
+	 * Constructs an H3 hash function seeded with a given seed and an
+	 * optional extra seed to replace the initial Bro seed.
+	 *
+	 * @param seed The seed to use for this instance.
+	 *
+	 * @param extra If not empty, this parameter replaces the initial
+	 * Bro seed when computing the seed for this instance; otherwise the
+	 * initial Bro seed is mixed in.
+	 */
+	UHF(size_t seed = 0, const std::string& extra = "");
+
+	template <typename T>
+	Hasher::digest operator()(const T& x) const
+		{
+		return hash(&x, sizeof(T));
+		}
+
+	/**
+	 * Computes the hash digest for a sequence of bytes.
+	 *
+	 * @param x Pointer to the first byte to hash.
+	 * @param n Number of bytes to hash.
+	 * @return The hash digest.
+	 */
+	Hasher::digest operator()(const void* x, size_t n) const
+		{
+		return hash(x, n);
+		}
+
+	/**
+	 * Computes the hash digest for a sequence of bytes.
+	 *
+	 * @param x Pointer to first byte to hash.
+	 *
+	 * @param n Number of bytes to hash.
+	 *
+	 * @return The hash digest.
+	 *
+	 */
+	Hasher::digest hash(const void* x, size_t n) const;
+
+	friend bool operator==(const UHF& x, const UHF& y)
+		{
+		return x.h == y.h;
+		}
+
+	friend bool operator!=(const UHF& x, const UHF& y)
+		{
+		return ! (x == y);
+		}
+
+private:
+	static size_t compute_seed(size_t seed, const std::string& extra);
+
+	H3<Hasher::digest, UHASH_KEY_SIZE> h;
+};
+
+
+/**
+ * A hasher implementing the default hashing policy. Uses *k* separate hash
+ * functions internally.
+ */
+class DefaultHasher : public Hasher {
+public:
+	/**
+	 * Constructor for a hasher with *k* hash functions.
+	 *
+	 * @param k The number of hash functions to use.
+	 *
+	 * @param name The name of the hasher.
+	 */
+	DefaultHasher(size_t k, const std::string& name = "");
+
+	// Overridden from Hasher.
+	virtual digest_vector Hash(const void* x, size_t n) const /* final */;
+	virtual DefaultHasher* Clone() const /* final */;
+	virtual bool Equals(const Hasher* other) const /* final */;
+
+	DECLARE_SERIAL(DefaultHasher);
+
+private:
+	DefaultHasher() { }
+
+	std::vector<UHF> hash_functions;
+};
+
+/**
+ * The *double-hashing* policy. Uses a linear combination of two hash
+ * functions.
+ */
+class DoubleHasher : public Hasher {
+public:
+	/**
+	 * Constructor for a double hasher with *k* hash functions.
+	 *
+	 * @param k The number of hash functions to use.
+	 *
+	 * @param name The name of the hasher.
+	 */
+	DoubleHasher(size_t k, const std::string& name = "");
+
+	// Overridden from Hasher.
+	virtual digest_vector Hash(const void* x, size_t n) const /* final */;
+	virtual DoubleHasher* Clone() const /* final */;
+	virtual bool Equals(const Hasher* other) const /* final */;
+
+	DECLARE_SERIAL(DoubleHasher);
+
+private:
+	DoubleHasher() { }
+
+	UHF h1;
+	UHF h2;
+};
+
+}
+
+#endif
--- a/src/probabilistic/bloom-filter.bif
+++ b/src/probabilistic/bloom-filter.bif
@ -0,0 +1,196 @@
+# ===========================================================================
+#
+#                           Bloom Filter Functions
+#
+# ===========================================================================
+
+%%{
+
+// TODO: This is currently included from the top-level src directory, hence
+// paths are relative to there. We need a better mechanism to pull in
+// BiFs defined in sub directories.
+#include "probabilistic/BloomFilter.h"
+#include "OpaqueVal.h"
+
+using namespace probabilistic;
+
+%%}
+
+module GLOBAL;
+
+## Creates a basic Bloom filter.
+##
+## .. note:: A Bloom filter can have a name associated with it. In the future,
+##    Bloom filters with the same name will be compatible across independent Bro
+##    instances, i.e., it will be possible to merge them. Currently, however, that is
+##    not yet supported.
+##
+## fp: The desired false-positive rate.
+##
+## capacity: the maximum number of elements that guarantees a false-positive
+## rate of *fp*.
+##
+## name: A name that uniquely identifies and seeds the Bloom filter. If empty,
+## the filter will remain tied to the current Bro process.
+##
+## Returns: A Bloom filter handle.
+##
+## .. bro:see:: bloomfilter_counting_init  bloomfilter_add bloomfilter_lookup
+##    bloomfilter_clear bloomfilter_merge
+function bloomfilter_basic_init%(fp: double, capacity: count,
+                                 name: string &default=""%): opaque of bloomfilter
+	%{
+	if ( fp < 0.0 || fp > 1.0 )
+		{
+		reporter->Error("false-positive rate must take value between 0 and 1");
+		return 0;
+		}
+
+	size_t cells = BasicBloomFilter::M(fp, capacity);
+	size_t optimal_k = BasicBloomFilter::K(cells, capacity);
+	const Hasher* h = new DefaultHasher(optimal_k, name->CheckString());
+
+	return new BloomFilterVal(new BasicBloomFilter(h, cells));
+	%}
+
+## Creates a counting Bloom filter.
+##
+## .. note:: A Bloom filter can have a name associated with it. In the future,
+##    Bloom filters with the same name will be compatible across independent Bro
+##    instances, i.e., it will be possible to merge them. Currently, however, that is
+##    not yet supported.
+##
+## k: The number of hash functions to use.
+##
+## cells: The number of cells of the underlying counter vector. As there's no
+## single answer to what's the best parameterization for a counting Bloom filter,
+## we refer to the Bloom filter literature here for choosing an appropriate value.
+##
+## max: The maximum counter value associated with each element, stored in
+## *w = floor(log_2(max)) + 1* bits. Each bit in the underlying counter vector
+## becomes a cell of size *w* bits.
+##
+## name: A name that uniquely identifies and seeds the Bloom filter. If empty,
+## the filter will remain tied to the current Bro process.
+##
+## Returns: A Bloom filter handle.
+##
+## .. bro:see:: bloomfilter_basic_init bloomfilter_add bloomfilter_lookup
+##    bloomfilter_clear bloomfilter_merge
+function bloomfilter_counting_init%(k: count, cells: count, max: count,
+				    name: string &default=""%): opaque of bloomfilter
+	%{
+	if ( max == 0 )
+		{
+		reporter->Error("max counter value must be greater than 0");
+		return 0;
+		}
+
+	const Hasher* h = new DefaultHasher(k, name->CheckString());
+
+	uint16 width = 1;
+	while ( max >>= 1 )
+		++width;
+
+	return new BloomFilterVal(new CountingBloomFilter(h, cells, width));
+	%}
+
+## Adds an element to a Bloom filter.
+##
+## bf: The Bloom filter handle.
+##
+## x: The element to add.
+##
+## .. bro:see:: bloomfilter_counting_init bloomfilter_basic_init bloomfilter_lookup
+##    bloomfilter_clear bloomfilter_merge
+function bloomfilter_add%(bf: opaque of bloomfilter, x: any%): any
+	%{
+	BloomFilterVal* bfv = static_cast<BloomFilterVal*>(bf);
+
+	if ( ! bfv->Type() && ! bfv->Typify(x->Type()) )
+		reporter->Error("failed to set Bloom filter type");
+
+	else if ( ! same_type(bfv->Type(), x->Type()) )
+		reporter->Error("incompatible Bloom filter types");
+
+	else
+		bfv->Add(x);
+
+	return 0;
+	%}
+
+## Retrieves the counter for a given element in a Bloom filter.
+##
+## bf: The Bloom filter handle.
+##
+## x: The element to count.
+##
+## Returns: the counter associated with *x* in *bf*.
+##
+## .. bro:see:: bloomfilter_counting_init bloomfilter_basic_init
+##    bloomfilter_add bloomfilter_clear bloomfilter_merge
+function bloomfilter_lookup%(bf: opaque of bloomfilter, x: any%): count
+	%{
+	const BloomFilterVal* bfv = static_cast<const BloomFilterVal*>(bf);
+
+	if ( bfv->Empty() )
+		return new Val(0, TYPE_COUNT);
+
+	if ( ! bfv->Type() )
+		reporter->Error("cannot perform lookup on untyped Bloom filter");
+
+	else if ( ! same_type(bfv->Type(), x->Type()) )
+		reporter->Error("incompatible Bloom filter types");
+
+	else
+		return new Val(static_cast<uint64>(bfv->Count(x)), TYPE_COUNT);
+
+	return new Val(0, TYPE_COUNT);
+	%}
+
+## Removes all elements from a Bloom filter. This function resets all bits in the
+## underlying bitvector back to 0 but does not change the parameterization of the
+## Bloom filter, such as the element type and the hasher seed.
+##
+## bf: The Bloom filter handle.
+##
+## .. bro:see:: bloomfilter_counting_init bloomfilter_basic_init
+##    bloomfilter_add bloomfilter_lookup bloomfilter_merge
+function bloomfilter_clear%(bf: opaque of bloomfilter%): any
+	%{
+	BloomFilterVal* bfv = static_cast<BloomFilterVal*>(bf);
+
+	if ( bfv->Type() ) // Untyped Bloom filters are already empty.
+		bfv->Clear();
+
+	return 0;
+	%}
+
+## Merges two Bloom filters.
+##
+## .. note:: Currently Bloom filters created by different Bro instances cannot
+##    be merged. In the future, this will be supported as long as both filters
+##    are created with the same name.
+##
+## bf1: The first Bloom filter handle.
+##
+## bf2: The second Bloom filter handle.
+##
+## Returns: The union of *bf1* and *bf2*.
+##
+## .. bro:see:: bloomfilter_counting_init bloomfilter_basic_init
+##    bloomfilter_add bloomfilter_lookup bloomfilter_clear
+function bloomfilter_merge%(bf1: opaque of bloomfilter,
+			    bf2: opaque of bloomfilter%): opaque of bloomfilter
+	%{
+	const BloomFilterVal* bfv1 = static_cast<const BloomFilterVal*>(bf1);
+	const BloomFilterVal* bfv2 = static_cast<const BloomFilterVal*>(bf2);
+
+	if ( ! same_type(bfv1->Type(), bfv2->Type()) )
+		{
+		reporter->Error("incompatible Bloom filter types");
+		return 0;
+		}
+
+	return BloomFilterVal::Merge(bfv1, bfv2);
+	%}
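The shift loop in bloomfilter_counting_init turns the requested maximum counter value into a cell width in bits. The sketch below isolates that computation; `counter_width` and `largest_counter` are hypothetical helper names, and the assumption that a cell of *w* bits can hold values up to 2^w - 1 follows the description of CounterVector::Max rather than its implementation.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to store values up to `max`, computed with the same
// loop as in bloomfilter_counting_init: floor(log2(max)) + 1.
inline uint16_t counter_width(uint64_t max)
	{
	assert(max > 0);

	uint16_t width = 1;
	while ( max >>= 1 )
		++width;

	return width;
	}

// Largest value representable in `width` bits (assumed cell capacity).
inline uint64_t largest_counter(uint16_t width)
	{
	assert(width > 0 && width < 64);
	return (uint64_t(1) << width) - 1;
	}
```

For `max = 3` this yields 2 bits, so a cell holds at most 3. For `max = 4` it yields 3 bits, so under the stated assumption a cell can actually count up to 7 rather than stopping at 4.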
--- a/src/util.cc
+++ b/src/util.cc
@ -716,6 +716,8 @@ static bool write_random_seeds(const char* write_file, uint32 seed,

 static bool bro_rand_determistic = false;
 static unsigned int bro_rand_state = 0;
+static bool first_seed_saved = false;
+static unsigned int first_seed = 0;

 static void bro_srandom(unsigned int seed, bool deterministic)
 	{
@ -800,6 +802,12 @@ void init_random_seed(uint32 seed, const char* read_file, const char* write_file

 	bro_srandom(seed, seeds_done);

+	if ( ! first_seed_saved )
+		{
+		first_seed = seed;
+		first_seed_saved = true;
+		}
+
 	if ( ! hmac_key_set )
 		{
 		MD5((const u_char*) buf, sizeof(buf), shared_hmac_md5_key);
@ -811,27 +819,39 @@ void init_random_seed(uint32 seed, const char* read_file, const char* write_file
 				write_file);
 	}

+unsigned int initial_seed()
+	{
+	return first_seed;
+	}
+
 bool have_random_seed()
 	{
 	return bro_rand_determistic;
 	}

+unsigned int bro_prng(unsigned int state)
+	{
+	// Use our own simple linear congruential PRNG to make sure we are
+	// predictable across platforms.
+	static const long int m = 2147483647;
+	static const long int a = 16807;
+	const long int q = m / a;
+	const long int r = m % a;
+
+	state = a * ( state % q ) - r * ( state / q );
+
+	if ( state <= 0 )
+		state += m;
+
+	return state;
+	}
+
 long int bro_random()
 	{
 	if ( ! bro_rand_determistic )
 		return random(); // Use system PRNG.

-	// Use our own simple linear congruence PRNG to make sure we are
-	// predictable across platforms.
-	const long int m = 2147483647;
-	const long int a = 16807;
-	const long int q = m / a;
-	const long int r = m % a;
-
-	bro_rand_state = a * ( bro_rand_state % q ) - r * ( bro_rand_state / q );
-
-	if ( bro_rand_state <= 0 )
-		bro_rand_state += m;
+	bro_rand_state = bro_prng(bro_rand_state);

 	return bro_rand_state;
 	}
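The constants in bro_prng (m = 2147483647, a = 16807) are those of the Park-Miller "minimal standard" generator, and the q/r decomposition is Schrage's method for computing `a * x mod m` without overflow. The following is a standalone sketch of one step of that generator, not a drop-in replacement: `minstd_next` is a hypothetical name, and the sketch deliberately keeps the intermediate result in a signed 64-bit integer so that the `<= 0` correction can see negative values, whereas the function in the diff stores it in an `unsigned int`.

```cpp
#include <cassert>
#include <cstdint>

// One step of the Park-Miller "minimal standard" generator,
// x' = 16807 * x mod (2^31 - 1), computed via Schrage's method.
inline uint32_t minstd_next(uint32_t state)
	{
	const int64_t m = 2147483647;
	const int64_t a = 16807;
	const int64_t q = m / a;	// 127773
	const int64_t r = m % a;	// 2836

	// The generator is defined for states in [1, m - 1].
	assert(state > 0 && state < m);

	int64_t x = state;
	int64_t next = a * (x % q) - r * (x / q);

	if ( next <= 0 )
		next += m;

	return static_cast<uint32_t>(next);
	}
```

Starting from state 1, the first two values are 16807 and 282475249, and the 10000th value is 1043618065, which is the commonly cited check value for this generator.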
--- a/src/util.h
+++ b/src/util.h
@ -165,12 +165,20 @@ extern void hmac_md5(size_t size, const unsigned char* bytes,
 extern void init_random_seed(uint32 seed, const char* load_file,
 				const char* write_file);

+// Retrieves the initial seed computed after the very first call to
+// init_random_seed(). Repeated calls to init_random_seed() will not affect
+// the return value of this function.
+unsigned int initial_seed();
+
 // Returns true if the user explicitly set a seed via init_random_seed();
 extern bool have_random_seed();

+// A simple linear congruential PRNG. It takes its state as argument and
+// returns a new random value, which can serve as state for subsequent calls.
+unsigned int bro_prng(unsigned int state);
+
 // Replacement for the system random(), to which is normally falls back
-// except when a seed has been given. In that case, we use our own
-// predictable PRNG.
+// except when a seed has been given. In that case, it uses bro_prng().
 long int bro_random();

 // Calls the system srandom() function with the given seed if not running
--- a/testing/btest/Baseline/bifs.bloomfilter/output
+++ b/testing/btest/Baseline/bifs.bloomfilter/output
@ -0,0 +1,27 @@
+error: incompatible Bloom filter types
+error: incompatible Bloom filter types
+error: incompatible Bloom filter types
+error: incompatible Bloom filter types
+error: false-positive rate must take value between 0 and 1
+error: false-positive rate must take value between 0 and 1
+0
+1
+1
+0
+1
+1
+1
+1
+1
+1
+1
+1
+1
+2
+3
+3
+2
+3
+3
+3
+2
--- a/testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
+++ b/testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
@ -3,7 +3,7 @@
 #empty_field	(empty)
 #unset_field	-
 #path	loaded_scripts
-#open	2013-07-25-19-59-47
+#open	2013-07-29-22-37-52
 #fields	name
 #types	string
 scripts/base/init-bare.bro
@ -12,6 +12,7 @@ scripts/base/init-bare.bro
  build/scripts/base/bif/strings.bif.bro
  build/scripts/base/bif/bro.bif.bro
  build/scripts/base/bif/reporter.bif.bro
+  build/scripts/base/bif/bloom-filter.bif.bro
  build/scripts/base/bif/event.bif.bro
  build/scripts/base/bif/plugins/__load__.bro
    build/scripts/base/bif/plugins/Bro_ARP.events.bif.bro
@ -89,6 +90,7 @@ scripts/base/init-bare.bro
      build/scripts/base/bif/file_analysis.bif.bro
      scripts/base/utils/site.bro
        scripts/base/utils/patterns.bro
+  build/scripts/base/bif/__load__.bro
 scripts/policy/misc/loaded-scripts.bro
  scripts/base/utils/paths.bro
-#close	2013-07-25-19-59-47
+#close	2013-07-29-22-37-52
--- a/testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
+++ b/testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
@ -3,7 +3,7 @@
 #empty_field	(empty)
 #unset_field	-
 #path	loaded_scripts
-#open	2013-07-29-20-08-38
+#open	2013-07-29-22-37-53
 #fields	name
 #types	string
 scripts/base/init-bare.bro
@ -12,6 +12,7 @@ scripts/base/init-bare.bro
  build/scripts/base/bif/strings.bif.bro
  build/scripts/base/bif/bro.bif.bro
  build/scripts/base/bif/reporter.bif.bro
+  build/scripts/base/bif/bloom-filter.bif.bro
  build/scripts/base/bif/event.bif.bro
  build/scripts/base/bif/plugins/__load__.bro
    build/scripts/base/bif/plugins/Bro_ARP.events.bif.bro
@ -89,13 +90,19 @@ scripts/base/init-bare.bro
      build/scripts/base/bif/file_analysis.bif.bro
      scripts/base/utils/site.bro
        scripts/base/utils/patterns.bro
+  build/scripts/base/bif/__load__.bro
 scripts/base/init-default.bro
+  scripts/base/utils/active-http.bro
+    scripts/base/utils/exec.bro
  scripts/base/utils/addrs.bro
  scripts/base/utils/conn-ids.bro
+  scripts/base/utils/dir.bro
+    scripts/base/frameworks/reporter/__load__.bro
+      scripts/base/frameworks/reporter/main.bro
+    scripts/base/utils/paths.bro
  scripts/base/utils/directions-and-hosts.bro
  scripts/base/utils/files.bro
  scripts/base/utils/numbers.bro
-  scripts/base/utils/paths.bro
  scripts/base/utils/queue.bro
  scripts/base/utils/strings.bro
  scripts/base/utils/thresholds.bro
@ -129,8 +136,6 @@ scripts/base/init-default.bro
  scripts/base/frameworks/intel/__load__.bro
    scripts/base/frameworks/intel/main.bro
    scripts/base/frameworks/intel/input.bro
-  scripts/base/frameworks/reporter/__load__.bro
-    scripts/base/frameworks/reporter/main.bro
  scripts/base/frameworks/sumstats/__load__.bro
    scripts/base/frameworks/sumstats/main.bro
    scripts/base/frameworks/sumstats/plugins/__load__.bro
@ -197,4 +202,4 @@ scripts/base/init-default.bro
    scripts/base/files/extract/main.bro
  scripts/base/misc/find-checksum-offloading.bro
 scripts/policy/misc/loaded-scripts.bro
-#close	2013-07-29-20-08-38
+#close	2013-07-29-22-37-53
--- a/testing/btest/Baseline/scripts.base.frameworks.intel.cluster-transparency/manager-1.intel.log
+++ b/testing/btest/Baseline/scripts.base.frameworks.intel.cluster-transparency/manager-1.intel.log
@ -3,8 +3,8 @@
 #empty_field	(empty)
 #unset_field	-
 #path	intel
-#open	2012-10-03-20-20-39
-#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	seen.host	seen.str	seen.str_type	seen.where	sources
-#types	time	string	addr	port	addr	port	addr	string	enum	enum	table[string]
-1349295639.424940	-	-	-	-	-	123.123.123.123	-	-	Intel::IN_ANYWHERE	worker-1
-#close	2012-10-03-20-20-49
+#open	2013-07-19-17-05-48
+#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	seen.indicator	seen.indicator_type	seen.where	sources
+#types	time	string	addr	port	addr	port	string	enum	enum	table[string]
+1374253548.038580	-	-	-	-	-	123.123.123.123	Intel::ADDR	Intel::IN_ANYWHERE	worker-1
+#close	2013-07-19-17-05-57
--- a/testing/btest/Baseline/scripts.base.frameworks.intel.input-and-match/broproc.intel.log
+++ b/testing/btest/Baseline/scripts.base.frameworks.intel.input-and-match/broproc.intel.log
@ -3,9 +3,9 @@
 #empty_field	(empty)
 #unset_field	-
 #path	intel
-#open	2012-10-03-20-18-05
-#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	seen.host	seen.str	seen.str_type	seen.where	sources
-#types	time	string	addr	port	addr	port	addr	string	enum	enum	table[string]
-1349295485.114156	-	-	-	-	-	-	e@mail.com	Intel::EMAIL	SOMEWHERE	source1
-1349295485.114156	-	-	-	-	-	1.2.3.4	-	-	SOMEWHERE	source1
-#close	2012-10-03-20-18-05
+#open	2013-07-19-17-04-26
+#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	seen.indicator	seen.indicator_type	seen.where	sources
+#types	time	string	addr	port	addr	port	string	enum	enum	table[string]
+1374253466.857185	-	-	-	-	-	e@mail.com	Intel::EMAIL	SOMEWHERE	source1
+1374253466.857185	-	-	-	-	-	1.2.3.4	Intel::ADDR	SOMEWHERE	source1
+#close	2013-07-19-17-04-26
--- a/testing/btest/Baseline/scripts.base.frameworks.intel.read-file-dist-cluster/manager-1.intel.log
+++ b/testing/btest/Baseline/scripts.base.frameworks.intel.read-file-dist-cluster/manager-1.intel.log
@ -3,11 +3,11 @@
 #empty_field	(empty)
 #unset_field	-
 #path	intel
-#open	2012-10-10-15-05-23
-#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	seen.host	seen.str	seen.str_type	seen.where	sources
-#types	time	string	addr	port	addr	port	addr	string	enum	enum	table[string]
-1349881523.548946	-	-	-	-	-	1.2.3.4	-	-	Intel::IN_A_TEST	source1
-1349881523.548946	-	-	-	-	-	-	e@mail.com	Intel::EMAIL	Intel::IN_A_TEST	source1
-1349881524.567896	-	-	-	-	-	1.2.3.4	-	-	Intel::IN_A_TEST	source1
-1349881524.567896	-	-	-	-	-	-	e@mail.com	Intel::EMAIL	Intel::IN_A_TEST	source1
-#close	2012-10-10-15-05-24
+#open	2013-07-19-17-06-57
+#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	seen.indicator	seen.indicator_type	seen.where	sources
+#types	time	string	addr	port	addr	port	string	enum	enum	table[string]
+1374253617.312158	-	-	-	-	-	1.2.3.4	Intel::ADDR	Intel::IN_A_TEST	source1
+1374253617.312158	-	-	-	-	-	e@mail.com	Intel::EMAIL	Intel::IN_A_TEST	source1
+1374253618.332565	-	-	-	-	-	1.2.3.4	Intel::ADDR	Intel::IN_A_TEST	source1
+1374253618.332565	-	-	-	-	-	e@mail.com	Intel::EMAIL	Intel::IN_A_TEST	source1
+#close	2013-07-19-17-07-06
--- a/testing/btest/Baseline/scripts.base.frameworks.logging.dataseries.wikipedia/http.ds.txt
+++ b/testing/btest/Baseline/scripts.base.frameworks.logging.dataseries.wikipedia/http.ds.txt
@ -32,10 +32,10 @@
 	<field type="variable32" name="username" pack_unique="yes"/>
 	<field type="variable32" name="password" pack_unique="yes"/>
 	<field type="variable32" name="proxied" pack_unique="yes"/>
-	<field type="variable32" name="mime_type" pack_unique="yes"/>
-	<field type="variable32" name="md5" pack_unique="yes"/>
-	<field type="variable32" name="extracted_request_files" pack_unique="yes"/>
-	<field type="variable32" name="extracted_response_files" pack_unique="yes"/>
+	<field type="variable32" name="orig_fuids" pack_unique="yes"/>
+	<field type="variable32" name="orig_mime_types" pack_unique="yes"/>
+	<field type="variable32" name="resp_fuids" pack_unique="yes"/>
+	<field type="variable32" name="resp_mime_types" pack_unique="yes"/>
 </ExtentType>
 <!-- ts : time -->
 <!-- uid : string -->
@ -60,13 +60,13 @@
 <!-- username : string -->
 <!-- password : string -->
 <!-- proxied : table[string] -->
-<!-- mime_type : string -->
-<!-- md5 : string -->
-<!-- extracted_request_files : vector[string] -->
-<!-- extracted_response_files : vector[string] -->
+<!-- orig_fuids : vector[string] -->
+<!-- orig_mime_types : vector[string] -->
+<!-- resp_fuids : vector[string] -->
+<!-- resp_mime_types : vector[string] -->

 # Extent, type='http'
-ts uid id.orig_h id.orig_p id.resp_h id.resp_p trans_depth method host uri referrer user_agent request_body_len response_body_len status_code status_msg info_code info_msg filename tags username password proxied mime_type md5 extracted_request_files extracted_response_files
+ts uid id.orig_h id.orig_p id.resp_h id.resp_p trans_depth method host uri referrer user_agent request_body_len response_body_len status_code status_msg info_code info_msg filename tags username password proxied orig_fuids orig_mime_types resp_fuids resp_mime_types
 1300475168.784020 j4u32Pc5bif 141.142.220.118 48649 208.80.152.118 80 1 GET bits.wikimedia.org /skins-1.5/monobook/main.css http://www.wikipedia.org/ Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110303 Ubuntu/10.04 (lucid) Firefox/3.6.15 0 0 304 Not Modified 0          
 1300475168.916018 VW0XPVINV8a 141.142.220.118 49997 208.80.152.3 80 1 GET upload.wikimedia.org /wikipedia/commons/6/63/Wikipedia-logo.png http://www.wikipedia.org/ Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110303 Ubuntu/10.04 (lucid) Firefox/3.6.15 0 0 304 Not Modified 0          
 1300475168.916183 3PKsZ2Uye21 141.142.220.118 49996 208.80.152.3 80 1 GET upload.wikimedia.org /wikipedia/commons/thumb/b/bb/Wikipedia_wordmark.svg/174px-Wikipedia_wordmark.svg.png http://www.wikipedia.org/ Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110303 Ubuntu/10.04 (lucid) Firefox/3.6.15 0 0 304 Not Modified 0          
--- a/testing/btest/Baseline/scripts.base.protocols.dns.dns-key/dns.log
+++ b/testing/btest/Baseline/scripts.base.protocols.dns.dns-key/dns.log
@ -0,0 +1,10 @@
+#separator \x09
+#set_separator	,
+#empty_field	(empty)
+#unset_field	-
+#path	dns
+#open	2013-07-25-20-29-44
+#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	proto	trans_id	query	qclass	qclass_name	qtype	qtype_name	rcode	rcode_name	AA	TC	RD	RA	Z	answers	TTLs	rejected
+#types	time	string	addr	port	addr	port	enum	count	string	count	string	count	string	count	string	bool	bool	bool	bool	count	vector[string]	vector[interval]	bool
+1359565680.761790	UWkUyAuUGXf	192.168.6.10	53209	192.168.129.36	53	udp	41477	paypal.com	1	C_INTERNET	48	DNSKEY	0	NOERROR	F	F	T	F	1	-	-	F
+#close	2013-07-25-20-29-44
--- a/testing/btest/Baseline/scripts.base.utils.active-http/bro..stdout
+++ b/testing/btest/Baseline/scripts.base.utils.active-http/bro..stdout
@ -0,0 +1,5 @@
+[code=200, msg=OK^M, body=It works!, headers={
+[Server] =  1.0,
+[Content-type] =  text/plain,
+[Date] =  July 22, 2013
+}]
--- a/testing/btest/Baseline/scripts.base.utils.dir/bro..stdout
+++ b/testing/btest/Baseline/scripts.base.utils.dir/bro..stdout
@ -0,0 +1,10 @@
+new_file1, ../testdir/bye
+new_file1, ../testdir/hi
+new_file1, ../testdir/howsitgoing
+new_file2, ../testdir/bye
+new_file2, ../testdir/hi
+new_file2, ../testdir/howsitgoing
+new_file1, ../testdir/bye
+new_file1, ../testdir/newone
+new_file2, ../testdir/bye
+new_file2, ../testdir/newone
--- a/testing/btest/Baseline/scripts.base.utils.exec/bro..stdout
+++ b/testing/btest/Baseline/scripts.base.utils.exec/bro..stdout
@ -0,0 +1,7 @@
+test1, [exit_code=0, signal_exit=F, stdout=[done, exit, stop], stderr=<uninitialized>, files={
+[out1] = [insert text here, and here],
+[out2] = [insert more text here, and there]
+}]
+test2, [exit_code=1, signal_exit=F, stdout=[here's something on stdout, some more stdout, last stdout], stderr=[and some stderr, more stderr, last stderr], files=<uninitialized>]
+test3, [exit_code=9, signal_exit=F, stdout=[FML], stderr=<uninitialized>, files=<uninitialized>]
+test4, [exit_code=0, signal_exit=F, stdout=[hibye], stderr=<uninitialized>, files=<uninitialized>]
--- a/testing/btest/Makefile
+++ b/testing/btest/Makefile
@ -24,4 +24,11 @@ cleanup:
 update-doc-sources:
 	../../doc/scripts/genDocSourcesList.sh ../../doc/scripts/DocSourcesList.cmake

+# Updates the three coverage tests that usually need tweaking when
+# scripts get added/removed.
+update-coverage-tests: update-doc-sources
+	btest -qU coverage.bare-load-baseline
+	btest -qU coverage.default-load-baseline
+	@echo "Use 'git diff' to check updates look right."
+
 .PHONY: all btest-verbose brief btest-brief coverage cleanup
--- a/testing/btest/Traces/dns-dnskey.trace
+++ b/testing/btest/Traces/dns-dnskey.trace
--- a/testing/btest/bifs/bloomfilter.bro
+++ b/testing/btest/bifs/bloomfilter.bro
@ -0,0 +1,83 @@
+# @TEST-EXEC: bro -b %INPUT >output 2>&1
+# @TEST-EXEC: btest-diff output
+
+function test_basic_bloom_filter()
+  {
+  # Basic usage with counts.
+  local bf_cnt = bloomfilter_basic_init(0.1, 1000);
+  bloomfilter_add(bf_cnt, 42);
+  bloomfilter_add(bf_cnt, 84);
+  bloomfilter_add(bf_cnt, 168);
+  print bloomfilter_lookup(bf_cnt, 0);
+  print bloomfilter_lookup(bf_cnt, 42);
+  print bloomfilter_lookup(bf_cnt, 168);
+  print bloomfilter_lookup(bf_cnt, 336);
+  bloomfilter_add(bf_cnt, 0.5); # Type mismatch
+  bloomfilter_add(bf_cnt, "foo"); # Type mismatch
+
+  # Basic usage with strings.
+  local bf_str = bloomfilter_basic_init(0.9, 10);
+  bloomfilter_add(bf_str, "foo");
+  bloomfilter_add(bf_str, "bar");
+  print bloomfilter_lookup(bf_str, "foo");
+  print bloomfilter_lookup(bf_str, "bar");
+  print bloomfilter_lookup(bf_str, "b4z"); # FP
+  print bloomfilter_lookup(bf_str, "quux"); # FP
+  bloomfilter_add(bf_str, 0.5); # Type mismatch
+  bloomfilter_add(bf_str, 100); # Type mismatch
+
+  # Edge cases.
+  local bf_edge0 = bloomfilter_basic_init(0.000000000001, 1);
+  local bf_edge1 = bloomfilter_basic_init(0.00000001, 100000000);
+  local bf_edge2 = bloomfilter_basic_init(0.9999999, 1);
+  local bf_edge3 = bloomfilter_basic_init(0.9999999, 100000000000);
+
+  # Invalid parameters.
+  local bf_bug0 = bloomfilter_basic_init(-0.5, 42);
+  local bf_bug1 = bloomfilter_basic_init(1.1, 42);
+
+  # Merging
+  local bf_cnt2 = bloomfilter_basic_init(0.1, 1000);
+  bloomfilter_add(bf_cnt2, 42);
+  bloomfilter_add(bf_cnt, 100);
+  local bf_merged = bloomfilter_merge(bf_cnt, bf_cnt2);
+  print bloomfilter_lookup(bf_merged, 42);
+  print bloomfilter_lookup(bf_merged, 84);
+  print bloomfilter_lookup(bf_merged, 100);
+  print bloomfilter_lookup(bf_merged, 168);
+  }
+
+function test_counting_bloom_filter()
+  {
+  local bf = bloomfilter_counting_init(3, 32, 3);
+  bloomfilter_add(bf, "foo");
+  print bloomfilter_lookup(bf, "foo");    # 1
+  bloomfilter_add(bf, "foo");
+  print bloomfilter_lookup(bf, "foo");    # 2
+  bloomfilter_add(bf, "foo");
+  print bloomfilter_lookup(bf, "foo");    # 3
+  bloomfilter_add(bf, "foo");
+  print bloomfilter_lookup(bf, "foo");    # still 3
+
+
+  bloomfilter_add(bf, "bar");
+  bloomfilter_add(bf, "bar");
+  print bloomfilter_lookup(bf, "bar");    # 2
+  print bloomfilter_lookup(bf, "foo");    # still 3
+
+  # Merging
+  local bf2 = bloomfilter_counting_init(3, 32, 3);
+  bloomfilter_add(bf2, "baz");
+  bloomfilter_add(bf2, "baz");
+  bloomfilter_add(bf2, "bar");
+  local bf_merged = bloomfilter_merge(bf, bf2);
+  print bloomfilter_lookup(bf_merged, "foo");
+  print bloomfilter_lookup(bf_merged, "bar");
+  print bloomfilter_lookup(bf_merged, "baz");
+  }
+
+event bro_init()
+  {
+  test_basic_bloom_filter();
+  test_counting_bloom_filter();
+  }
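
The counting test above relies on counters that saturate at `max` (the "still 3" cases). A minimal Python sketch of that behaviour follows. It is not Bro's implementation: the class name, the SHA-256 based hashing, and the choice to bump each distinct cell only once per add are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter with saturating counters (illustrative only)."""

    def __init__(self, k, cells, max_count):
        self.k = k                    # number of hash functions
        self.cells = [0] * cells      # one small counter per cell
        self.max_count = max_count    # counters never exceed this value

    def _indices(self, item):
        # Derive k cell indices by salting SHA-256 with the hash number.
        # A set is used so one add never bumps the same cell twice.
        found = set()
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            found.add(int.from_bytes(digest[:8], "big") % len(self.cells))
        return found

    def add(self, item):
        for idx in self._indices(item):
            self.cells[idx] = min(self.cells[idx] + 1, self.max_count)

    def lookup(self, item):
        # The smallest counter is the estimate. Collisions can inflate it;
        # it falls short of the true count only once capped at max_count.
        return min(self.cells[idx] for idx in self._indices(item))


bf = CountingBloomFilter(k=3, cells=32, max_count=3)
counts = []
for _ in range(4):
    bf.add("foo")
    counts.append(bf.lookup("foo"))
print(counts)  # [1, 2, 3, 3]
```

With only 32 cells, a second element such as "bar" may share cells with "foo", so its reported count can be higher than the number of times it was added.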
--- a/testing/btest/coverage/bare-mode-errors.test
+++ b/testing/btest/coverage/bare-mode-errors.test
@ -10,5 +10,8 @@
 #
 # @TEST-EXEC: test -d $DIST/scripts
 # @TEST-EXEC: for script in `find $DIST/scripts/ -name \*\.bro -not -path '*/site/*'`; do echo "=== $script" >>allerrors; if echo "$script" | egrep -q 'communication/listen|controllee'; then rm -rf load_attempt .bgprocs; btest-bg-run load_attempt bro -b $script; btest-bg-wait -k 2; cat load_attempt/.stderr >>allerrors; else bro -b $script 2>>allerrors; fi done || exit 0
-# @TEST-EXEC: cat allerrors | grep -v "received termination signal" | grep -v '===' | sort | uniq > unique_errors
+# @TEST-EXEC: cat allerrors | grep -v "received termination signal" | fgrep -v -f %INPUT | grep -v '===' | sort | uniq > unique_errors
 # @TEST-EXEC: btest-diff unique_errors
+
+# Whitelist of scripts to exclude because of cyclic load dependencies.
+scripts/base/protocols/ftp/utils.bro
--- a/testing/btest/istate/opaque.bro
+++ b/testing/btest/istate/opaque.bro
@ -12,6 +12,9 @@ global sha1_handle: opaque of sha1 &persistent &synchronized;
 global sha256_handle: opaque of sha256 &persistent &synchronized;
 global entropy_handle: opaque of entropy &persistent &synchronized;

+global bloomfilter_elements: set[string] &persistent &synchronized;
+global bloomfilter_handle: opaque of bloomfilter &persistent &synchronized;
+
 event bro_done()
  {
  local out = open("output.log");
@ -36,6 +39,9 @@ event bro_done()
    print out, entropy_test_finish(entropy_handle);
  else
    print out, "entropy_test_add() failed";
+
+  for ( e in bloomfilter_elements )
+    print bloomfilter_lookup(bloomfilter_handle, e);
  }

@TEST-END-FILE
@ -47,6 +53,9 @@ global sha1_handle: opaque of sha1 &persistent &synchronized;
 global sha256_handle: opaque of sha256 &persistent &synchronized;
 global entropy_handle: opaque of entropy &persistent &synchronized;

+global bloomfilter_elements = { "foo", "bar", "baz" } &persistent &synchronized;
+global bloomfilter_handle: opaque of bloomfilter &persistent &synchronized;
+
 event bro_init()
  {
 	local out = open("expected.log");
@ -72,6 +81,10 @@ event bro_init()
  entropy_handle = entropy_test_init();
  if ( ! entropy_test_add(entropy_handle, "f") )
    print out, "entropy_test_add() failed";
+
+  bloomfilter_handle = bloomfilter_basic_init(0.1, 100);
+  for ( e in bloomfilter_elements )
+    bloomfilter_add(bloomfilter_handle, e);
  }

@TEST-END-FILE
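
The `bloomfilter_basic_init(0.1, 100)` call above takes a target false-positive rate and an expected capacity. The textbook way to turn those two numbers into a cell count and a number of hash functions is sketched below in Python. Whether Bro sizes its filters exactly this way is an assumption, as are the function name and the use of `ValueError` for out-of-range input.

```python
import math


def bloom_parameters(fp, capacity):
    """Return (cells, hashes) for a basic Bloom filter (textbook formulas).

    cells  = ceil(-n * ln(p) / (ln 2)^2)
    hashes = ceil(ln(2) * cells / n)
    """
    if not 0.0 < fp < 1.0:
        # Hypothetical error handling: rates such as -0.5 or 1.1 are meaningless.
        raise ValueError("false-positive rate must lie strictly between 0 and 1")
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cells = math.ceil(-capacity * math.log(fp) / (math.log(2) ** 2))
    hashes = max(1, math.ceil(math.log(2) * cells / capacity))
    return cells, hashes


print(bloom_parameters(0.1, 100))  # (480, 4)
```

A lower false-positive rate or a larger capacity both increase the number of cells, which is why extreme combinations are worth exercising as edge cases.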
--- a/testing/btest/scripts/base/frameworks/intel/cluster-transparency.bro
+++ b/testing/btest/scripts/base/frameworks/intel/cluster-transparency.bro
@ -28,7 +28,7 @@ event remote_connection_handshake_done(p: event_peer)
 	# Insert the data once both workers are connected.
 	if ( Cluster::local_node_type() == Cluster::MANAGER && Cluster::worker_count == 2 )
 		{
-		Intel::insert([$host=1.2.3.4,$meta=[$source="manager"]]);
+		Intel::insert([$indicator="1.2.3.4", $indicator_type=Intel::ADDR, $meta=[$source="manager"]]);
 		}
 	}

@ -39,7 +39,7 @@ event Intel::cluster_new_item(item: Intel::Item)
 	if ( ! is_remote_event() )
 		return;

-	print fmt("cluster_new_item: %s inserted by %s (from peer: %s)", item$host, item$meta$source, get_event_peer()$descr);
+	print fmt("cluster_new_item: %s inserted by %s (from peer: %s)", item$indicator, item$meta$source, get_event_peer()$descr);

 	if ( ! sent_data )
 		{
@ -47,9 +47,9 @@ event Intel::cluster_new_item(item: Intel::Item)
 		# full cluster is constructed.
 		sent_data = T;
 		if ( Cluster::node == "worker-1" )
-			Intel::insert([$host=123.123.123.123,$meta=[$source="worker-1"]]);
+			Intel::insert([$indicator="123.123.123.123", $indicator_type=Intel::ADDR, $meta=[$source="worker-1"]]);
 		if ( Cluster::node == "worker-2" )
-			Intel::insert([$host=4.3.2.1,$meta=[$source="worker-2"]]);
+			Intel::insert([$indicator="4.3.2.1", $indicator_type=Intel::ADDR, $meta=[$source="worker-2"]]);
 		}

 	# We're forcing worker-2 to do a lookup when it has three intelligence items
--- a/testing/btest/scripts/base/frameworks/intel/input-and-match.bro
+++ b/testing/btest/scripts/base/frameworks/intel/input-and-match.bro
@ -5,10 +5,10 @@
 # @TEST-EXEC: btest-diff broproc/intel.log

@TEST-START-FILE intel.dat
-#fields	host	net	str	str_type	meta.source	meta.desc	meta.url
-1.2.3.4	-	-	-	source1	this host is just plain baaad	http://some-data-distributor.com/1234
-1.2.3.4	-	-	-	source1	this host is just plain baaad	http://some-data-distributor.com/1234
-	-	e@mail.com	Intel::EMAIL	source1	Phishing email source	http://some-data-distributor.com/100000
+#fields	indicator	indicator_type	meta.source	meta.desc	meta.url
+1.2.3.4	Intel::ADDR	source1	this host is just plain baaad	http://some-data-distributor.com/1234
+1.2.3.4	Intel::ADDR	source1	this host is just plain baaad	http://some-data-distributor.com/1234
+e@mail.com	Intel::EMAIL	source1	Phishing email source	http://some-data-distributor.com/100000
@TEST-END-FILE

@load frameworks/communication/listen
@ -18,8 +18,8 @@ redef enum Intel::Where += { SOMEWHERE };

 event do_it()
 	{
-	Intel::seen([$str="e@mail.com",
-	             $str_type=Intel::EMAIL,
+	Intel::seen([$indicator="e@mail.com",
+	             $indicator_type=Intel::EMAIL,
 	             $where=SOMEWHERE]);

 	Intel::seen([$host=1.2.3.4,
--- a/testing/btest/scripts/base/frameworks/intel/read-file-dist-cluster.bro
+++ b/testing/btest/scripts/base/frameworks/intel/read-file-dist-cluster.bro
@ -19,10 +19,10 @@ redef Cluster::nodes = {
@TEST-END-FILE

@TEST-START-FILE intel.dat
-#fields	host	net	str	str_type	meta.source	meta.desc	meta.url
-1.2.3.4	-	-	-	source1	this host is just plain baaad	http://some-data-distributor.com/1234
-1.2.3.4	-	-	-	source1	this host is just plain baaad	http://some-data-distributor.com/1234
-	-	e@mail.com	Intel::EMAIL	source1	Phishing email source	http://some-data-distributor.com/100000
+#fields	indicator	indicator_type	meta.source	meta.desc	meta.url
+1.2.3.4	Intel::ADDR	source1	this host is just plain baaad	http://some-data-distributor.com/1234
+1.2.3.4	Intel::ADDR	source1	this host is just plain baaad	http://some-data-distributor.com/1234
+e@mail.com	Intel::EMAIL	source1	Phishing email source	http://some-data-distributor.com/100000
@TEST-END-FILE

@load base/frameworks/control
@ -41,7 +41,7 @@ redef enum Intel::Where += {
 event do_it()
 	{
 	Intel::seen([$host=1.2.3.4, $where=Intel::IN_A_TEST]);
-	Intel::seen([$str="e@mail.com", $str_type=Intel::EMAIL, $where=Intel::IN_A_TEST]);
+	Intel::seen([$indicator="e@mail.com", $indicator_type=Intel::EMAIL, $where=Intel::IN_A_TEST]);
 	}

 event bro_init()
--- a/testing/btest/scripts/base/protocols/dns/dns-key.bro
+++ b/testing/btest/scripts/base/protocols/dns/dns-key.bro
@ -0,0 +1,4 @@
+# Making sure DNSKEY gets logged as such.
+#
+# @TEST-EXEC: bro -r $TRACES/dns-dnskey.trace
+# @TEST-EXEC: btest-diff dns.log
--- a/testing/btest/scripts/base/utils/active-http.test
+++ b/testing/btest/scripts/base/utils/active-http.test
@ -0,0 +1,28 @@
+# @TEST-REQUIRES: which httpd
+# @TEST-REQUIRES: which python
+#
+# @TEST-EXEC: btest-bg-run httpd python $SCRIPTS/httpd.py --max 1
+# @TEST-EXEC: sleep 3
+# @TEST-EXEC: btest-bg-run bro bro -b %INPUT
+# @TEST-EXEC: btest-bg-wait 15
+# @TEST-EXEC: btest-diff bro/.stdout
+
+@load base/utils/active-http
+
+redef exit_only_after_terminate = T;
+
+event bro_init()
+	{
+	local req = ActiveHTTP::Request($url="localhost:32123");
+
+	when ( local resp = ActiveHTTP::request(req) )
+		{
+		print resp;
+		terminate();
+		}
+	timeout 1min
+		{
+		print "HTTP request timeout";
+		terminate();
+		}
+	}
--- a/testing/btest/scripts/base/utils/dir.test
+++ b/testing/btest/scripts/base/utils/dir.test
@ -0,0 +1,58 @@
+# @TEST-EXEC: btest-bg-run bro bro -b ../dirtest.bro
+# @TEST-EXEC: btest-bg-wait 10
+# @TEST-EXEC: TEST_DIFF_CANONIFIER=$SCRIPTS/diff-sort btest-diff bro/.stdout
+
+@TEST-START-FILE dirtest.bro
+
+@load base/utils/dir
+
+redef exit_only_after_terminate = T;
+
+global c: count = 0;
+
+function check_terminate_condition()
+	{
+	c += 1;
+
+	if ( c == 10 )
+		terminate();
+	}
+
+function new_file1(fname: string)
+	{
+	print "new_file1", fname;
+	check_terminate_condition();
+	}
+
+function new_file2(fname: string)
+	{
+	print "new_file2", fname;
+	check_terminate_condition();
+	}
+
+event change_things()
+	{
+	system("touch ../testdir/newone");
+	system("rm ../testdir/bye && touch ../testdir/bye");
+	}
+
+event bro_init()
+	{
+	Dir::monitor("../testdir", new_file1, .5sec);
+	Dir::monitor("../testdir", new_file2, 1sec);
+	schedule 1sec { change_things() };
+	}
+
+@TEST-END-FILE
+
+@TEST-START-FILE testdir/hi
+123
+@TEST-END-FILE
+
+@TEST-START-FILE testdir/howsitgoing
+abc
+@TEST-END-FILE
+
+@TEST-START-FILE testdir/bye
+!@#
+@TEST-END-FILE
--- a/testing/btest/scripts/base/utils/exec.test
+++ b/testing/btest/scripts/base/utils/exec.test
@ -0,0 +1,74 @@
+# @TEST-EXEC: btest-bg-run bro bro -b ../exectest.bro
+# @TEST-EXEC: btest-bg-wait 10
+# @TEST-EXEC: TEST_DIFF_CANONIFIER=$SCRIPTS/diff-sort btest-diff bro/.stdout
+
+@TEST-START-FILE exectest.bro
+
+@load base/utils/exec
+
+redef exit_only_after_terminate = T;
+
+global c: count = 0;
+
+function check_exit_condition()
+	{
+	c += 1;
+
+	if ( c == 4 )
+		terminate();
+	}
+
+function test_cmd(label: string, cmd: Exec::Command)
+	{
+	when ( local result = Exec::run(cmd) )
+		{
+		print label, result;
+		check_exit_condition();
+		}
+	}
+
+event bro_init()
+	{
+	test_cmd("test1", [$cmd="bash ../somescript.sh",
+	                   $read_files=set("out1", "out2")]);
+	test_cmd("test2", [$cmd="bash ../nofiles.sh"]);
+	test_cmd("test3", [$cmd="bash ../suicide.sh"]);
+	test_cmd("test4", [$cmd="bash ../stdin.sh", $stdin="hibye"]);
+	}
+
+@TEST-END-FILE
+
+@TEST-START-FILE somescript.sh
+#! /usr/bin/env bash
+echo "insert text here" > out1
+echo "and here" >> out1
+echo "insert more text here" > out2
+echo "and there" >> out2
+echo "done"
+echo "exit"
+echo "stop"
+@TEST-END-FILE
+
+@TEST-START-FILE nofiles.sh
+#! /usr/bin/env bash
+echo "here's something on stdout"
+echo "some more stdout"
+echo "last stdout"
+echo "and some stderr" 1>&2
+echo "more stderr" 1>&2
+echo "last stderr" 1>&2
+exit 1
+@TEST-END-FILE
+
+@TEST-START-FILE suicide.sh
+#! /usr/bin/env bash
+echo "FML"
+kill -9 $$
+echo "nope"
+@TEST-END-FILE
+
+@TEST-START-FILE stdin.sh
+#! /usr/bin/env bash
+read -r line
+echo "$line"
+@TEST-END-FILE
--- a/testing/scripts/httpd.py
+++ b/testing/scripts/httpd.py
@ -0,0 +1,40 @@
+#! /usr/bin/env python
+
+import BaseHTTPServer
+
+class MyRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
+
+    def do_GET(self):
+        self.send_response(200)
+        self.send_header("Content-type", "text/plain")
+        self.end_headers()
+        self.wfile.write("It works!")
+
+    def version_string(self):
+        return "1.0"
+
+    def date_time_string(self):
+        return "July 22, 2013"
+
+
+if __name__ == "__main__":
+    from optparse import OptionParser
+    p = OptionParser()
+    p.add_option("-a", "--addr", type="string", default="localhost",
+                 help=("listen on given address (numeric IP or host name), "
+                       "an empty string means INADDR_ANY (default: localhost)"))
+    p.add_option("-p", "--port", type="int", default=32123,
+                 help="listen on given TCP port number")
+    p.add_option("-m", "--max", type="int", default=-1,
+                 help="max number of requests to respond to, -1 means no max")
+    options, args = p.parse_args()
+
+    httpd = BaseHTTPServer.HTTPServer((options.addr, options.port),
+                                      MyRequestHandler)
+    if options.max == -1:
+        httpd.serve_forever()
+    else:
+        served_count = 0
+        while served_count != options.max:
+            httpd.handle_request()
+            served_count += 1
 @ -1 +1 @@
 .1-824
 .1-945