Factor implementation and change interface.

Constructing a Bloom filter now requires passing it a HashPolicy instance,
which separates the concerns of hashing and Bloom filter management more
clearly.
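
To illustrate the separation, here is a minimal Python sketch, not the C++
code from this commit. The class names DoubleHashPolicy and BasicBloomFilter
and the method names hash, add, and lookup are assumptions made for the
example; only the idea of handing a hash policy to the filter comes from the
commit message.

```python
import hashlib


class DoubleHashPolicy:
    """Hypothetical hash policy: derives k hash values from two base hashes."""

    def __init__(self, k):
        self.k = k

    def hash(self, item: bytes):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        return [h1 + i * h2 for i in range(self.k)]


class BasicBloomFilter:
    """Manages the bit vector only; all hashing is delegated to the policy."""

    def __init__(self, cells, hash_policy):
        self.bits = [False] * cells
        self.hash_policy = hash_policy

    def add(self, item: bytes):
        for h in self.hash_policy.hash(item):
            self.bits[h % len(self.bits)] = True

    def lookup(self, item: bytes):
        return all(self.bits[h % len(self.bits)]
                   for h in self.hash_policy.hash(item))


bf = BasicBloomFilter(cells=1024, hash_policy=DoubleHashPolicy(k=3))
bf.add(b"foo")
print(bf.lookup(b"foo"))
```

Swapping in a different policy object changes how elements are hashed without
touching the filter's storage logic.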

This commit also changes the interface for initializing Bloom filters: there
are now two initialization functions, one for each type:

  (1) bloomfilter_basic_init(fp: double,
                             capacity: count,
                             name: string &default=""): opaque of bloomfilter

  (2) bloomfilter_counting_init(k: count,
                                cells: count,
                                max: count,
                                name: string &default=""): opaque of bloomfilter

The BiFs for adding elements and performing lookups remain the same. This
essentially gives us "BiF polymorphism" in script land: the initialization
BiF constructs the most derived type, while subsequent BiFs adhere to the
same interface.
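
The pattern can be sketched in Python as follows. This is an assumed stand-in,
not the actual BiF implementation: the Python functions only mirror the BiF
names, the class hierarchy and the salted SHA-256 hashing are invented for the
example, and the basic filter is modeled as a counter capped at 1.

```python
import hashlib
import math


class BloomFilter:
    """Shared interface: every filter supports add() and lookup()."""

    def __init__(self, k, cells, max_count):
        self.k = k
        self.max_count = max_count
        self.cells = [0] * cells

    def _indices(self, item):
        # Illustrative hashing only: one salted SHA-256 per hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.cells)

    def add(self, item):
        for i in self._indices(item):
            self.cells[i] = min(self.cells[i] + 1, self.max_count)

    def lookup(self, item):
        # 0 means "definitely absent"; a positive value may be a false positive.
        return min(self.cells[i] for i in self._indices(item))


class BasicBloomFilter(BloomFilter):
    """One bit per cell, modeled here as a counter capped at 1."""


class CountingBloomFilter(BloomFilter):
    """Cells count up to a user-chosen maximum."""


def bloomfilter_basic_init(fp, capacity):
    # Sized with the standard formulas for a basic Bloom filter.
    cells = math.ceil(-capacity * math.log(fp) / math.log(2) ** 2)
    k = max(1, round(cells / capacity * math.log(2)))
    return BasicBloomFilter(k, cells, max_count=1)


def bloomfilter_counting_init(k, cells, max_count):
    # No derivation here: the caller supplies every parameter.
    return CountingBloomFilter(k, cells, max_count)


def bloomfilter_add(bf, item):
    bf.add(item)


def bloomfilter_lookup(bf, item):
    return bf.lookup(item)


# Only the initialization call differs; add and lookup are type-agnostic.
for bf in (bloomfilter_basic_init(0.1, 1000),
           bloomfilter_counting_init(3, 128, 15)):
    bloomfilter_add(bf, "foo")
    print(type(bf).__name__, bloomfilter_lookup(bf, "foo"))
```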

We split up the constructor because we have not yet derived the math that
computes the optimal number of hash functions for counting Bloom filters, so
users have to parameterize them explicitly for now.
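
For comparison, the basic case can derive its parameters from (fp, capacity)
with the standard Bloom filter formulas m = -n ln(p) / (ln 2)^2 and
k = (m / n) ln 2. The Python sketch below shows that calculation; the function
name basic_parameters is hypothetical, and the rounding and validation choices
are assumptions that may differ from what the commit's C++ code does.

```python
import math


def basic_parameters(fp, capacity):
    """Return (cells, k) for a basic Bloom filter.

    fp is the target false-positive rate, capacity the expected number of
    elements. Rounding choices here are assumptions of this sketch.
    """
    if not 0.0 < fp < 1.0:
        raise ValueError("false-positive rate must lie in (0, 1)")
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of cells.
    cells = math.ceil(-capacity * math.log(fp) / math.log(2) ** 2)
    # k = (m / n) * ln 2, with at least one hash function.
    k = max(1, round(cells / capacity * math.log(2)))
    return cells, k


print(basic_parameters(0.1, 1000))  # (4793, 3)
```

A counting filter has no comparable closed form in this commit, which is why
bloomfilter_counting_init takes k, cells, and max directly.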
Matthias Vallentin 2013-06-17 16:06:02 -07:00
parent 9f74064289
commit 532fbfb4d2
11 changed files with 409 additions and 319 deletions

@@ -4,7 +4,7 @@
 event bro_init()
 {
 # Basic usage with counts.
-local bf_cnt = bloomfilter_init(0.1, 1000);
+local bf_cnt = bloomfilter_basic_init(0.1, 1000);
 bloomfilter_add(bf_cnt, 42);
 bloomfilter_add(bf_cnt, 84);
 bloomfilter_add(bf_cnt, 168);
@@ -16,23 +16,23 @@ event bro_init()
 bloomfilter_add(bf_cnt, "foo"); # Type mismatch
 # Basic usage with strings.
-local bf_str = bloomfilter_init(0.9, 10);
+local bf_str = bloomfilter_basic_init(0.9, 10);
 bloomfilter_add(bf_str, "foo");
 bloomfilter_add(bf_str, "bar");
 print bloomfilter_lookup(bf_str, "foo");
 print bloomfilter_lookup(bf_str, "bar");
-print bloomfilter_lookup(bf_str, "baz"); # FP
-print bloomfilter_lookup(bf_str, "qux"); # FP
+print bloomfilter_lookup(bf_str, "b4z"); # FP
+print bloomfilter_lookup(bf_str, "quux"); # FP
 bloomfilter_add(bf_str, 0.5); # Type mismatch
 bloomfilter_add(bf_str, 100); # Type mismatch
 # Edge cases.
-local bf_edge0 = bloomfilter_init(0.000000000001, 1);
-local bf_edge1 = bloomfilter_init(0.00000001, 100000000);
-local bf_edge2 = bloomfilter_init(0.9999999, 1);
-local bf_edge3 = bloomfilter_init(0.9999999, 100000000000);
+local bf_edge0 = bloomfilter_basic_init(0.000000000001, 1);
+local bf_edge1 = bloomfilter_basic_init(0.00000001, 100000000);
+local bf_edge2 = bloomfilter_basic_init(0.9999999, 1);
+local bf_edge3 = bloomfilter_basic_init(0.9999999, 100000000000);
 # Invalid parameters.
-local bf_bug0 = bloomfilter_init(-0.5, 42);
-local bf_bug1 = bloomfilter_init(1.1, 42);
+local bf_bug0 = bloomfilter_basic_init(-0.5, 42);
+local bf_bug1 = bloomfilter_basic_init(1.1, 42);
 }

@@ -82,7 +82,7 @@ event bro_init()
 if ( ! entropy_test_add(entropy_handle, "f") )
 print out, "entropy_test_add() failed";
-bloomfilter_handle = bloomfilter_init(0.1, 100);
+bloomfilter_handle = bloomfilter_basic_init(0.1, 100);
 for ( e in bloomfilter_elements )
 bloomfilter_add(bloomfilter_handle, e);
 }