Commit graph

177 commits

Author SHA1 Message Date
Robin Sommer
86dcea3b35 Merge remote-tracking branch 'origin/fastpath'
Slightly adapted after discussing with Bernhard. I also added one
further check.

* origin/fastpath:
  fix segfault that could be caused by merging an empty bloom-filter with a bloom-filter already containing values.
2013-07-31 20:09:37 -07:00
Robin Sommer
2a0790c231 Changing the Bloom filter hashing so that it's independent of
CompositeHash.

We do this by hashing values added to a BloomFilter another time more
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).

I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
2013-07-31 19:56:34 -07:00
Bernhard Amann
39c0f5abad make gcc happy 2013-07-31 12:43:33 -07:00
Bernhard Amann
5122bf4a7c adapt to new folder structure 2013-07-31 12:06:59 -07:00
Robin Sommer
6c197fbebf Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Add new BiF for low-level Bloom filter initialization.
  Introduce global_hash_seed script variable.
2013-07-31 11:41:08 -07:00
Matthias Vallentin
d50b8a147d Add new BiF for low-level Bloom filter initialization.
For symmetry reasons, the new Bif bloomfilter_basic_init2 also allows users to
manually specify the memory bounds and number of hash functions to use.
2013-07-31 18:21:37 +02:00
Matthias Vallentin
8ca76dd4ee Introduce global_hash_seed script variable.
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script and use it instead. If that script
variable does not exist, then we fall back to the initial seed computed a
Bro startup (which is affected ultimately by $BRO_SEED).

See Hasher::MakeSeed for details.
2013-07-31 17:59:08 +02:00
Bernhard Amann
83ce77e575 re-use same hash class for all add operations 2013-07-30 18:48:05 -07:00
Bernhard Amann
18c10f3cb5 get hll ready for merging 2013-07-30 16:47:26 -07:00
Bernhard Amann
edb04e6d8b fix segfault that could be caused by merging an empty bloom-filter
with a bloom-filter already containing values.

I assume that it is ok to merge an empty bloom-filter with any bloom-filter -
if not we have to change the patch to return an error in this case.
2013-07-30 16:10:06 -07:00
Bernhard Amann
5b9d80e50d Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog 2013-07-30 14:31:09 -07:00
Robin Sommer
629c331ca0 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Update submodules.
  Make hashers serializable.
  Add docs and use default value for hasher names.
2013-07-30 10:06:44 -07:00
Matthias Vallentin
9ad7121fed Merge remote-tracking branch 'origin/master' into topic/matthias/bloom-filter
Conflicts:
	src/probabilistic/Hasher.h
2013-07-30 12:12:27 +02:00
Bernhard Amann
32c2885742 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/Func.cc
	src/probabilistic/CMakeLists.txt
2013-07-25 14:46:38 -07:00
Robin Sommer
c11bf3d922 Fixing serialization bug introduced during earlier merge. 2013-07-25 11:29:13 -07:00
Robin Sommer
b97e045c9a Merge branch 'master' into topic/robin/bloom-filter-merge 2013-07-25 10:18:46 -07:00
Matthias Vallentin
2fc5ca53ff Make hashers serializable.
There exists still a small bug that I could not find; the unit test
istate/opaque.bro fails. If someone sees why, please chime in.
2013-07-25 17:35:35 +02:00
Matthias Vallentin
e482897f88 Add docs and use default value for hasher names. 2013-07-25 15:16:53 +02:00
Robin Sommer
599dadf30b Merge branch 'topic/robin/bloom-filter-merge'
* topic/robin/bloom-filter-merge: (50 commits)
  Support emptiness check on Bloom filters.
  Refactor Bloom filter merging.
  Add bloomfilter_clear() BiF.
  Updating NEWS.
  Broifying the code.
  Implement and test Bloom filter merging.
  Make hash functions equality comparable.
  Make counter vectors mergeable.
  Use half adder for bitwise addition and subtraction.
  Fix and test counting Bloom filter.
  Implement missing CounterVector functions.
  Tweak hasher interface.
  Add missing include for GCC.
  Fixing for unserializion error.
  Small fixes and style tweaks.
  Only serialize Bloom filter type if available.
  Create hash policies through factory.
  Remove lingering debug code.
  Factor implementation and change interface.
  Expose Bro's linear congruence PRNG as utility function.
  ...
2013-07-24 15:51:10 -07:00
Robin Sommer
23b352b702 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter' into topic/robin/bloom-filter-merge
* origin/topic/matthias/bloom-filter:
  Support emptiness check on Bloom filters.
  Refactor Bloom filter merging.
  Add bloomfilter_clear() BiF.
2013-07-24 15:39:50 -07:00
Bernhard Amann
efdffaec9e and forgot a file... 2013-07-24 12:51:31 -07:00
Bernhard Amann
b7cdfc0e6e adapt to new structure 2013-07-24 12:50:01 -07:00
Matthias Vallentin
5769c32f1e Support emptiness check on Bloom filters. 2013-07-24 13:18:19 +02:00
Matthias Vallentin
5736aef440 Refactor Bloom filter merging. 2013-07-24 13:05:38 +02:00
Matthias Vallentin
5383e8f75b Add bloomfilter_clear() BiF. 2013-07-24 11:21:10 +02:00
Robin Sommer
474107fe40 Broifying the code.
Also extending API documentation a bit more and fixing a memory leak.
2013-07-23 20:10:32 -07:00
Robin Sommer
21685d2529 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
I'm moving the new files into a subdirectory probabilistic, and into a
corresponding namespace. We can later put code for the other
probabilistic data structures there as well.

* origin/topic/matthias/bloom-filter: (45 commits)
  Implement and test Bloom filter merging.
  Make hash functions equality comparable.
  Make counter vectors mergeable.
  Use half adder for bitwise addition and subtraction.
  Fix and test counting Bloom filter.
  Implement missing CounterVector functions.
  Tweak hasher interface.
  Add missing include for GCC.
  Fixing for unserializion error.
  Small fixes and style tweaks.
  Only serialize Bloom filter type if available.
  Create hash policies through factory.
  Remove lingering debug code.
  Factor implementation and change interface.
  Expose Bro's linear congruence PRNG as utility function.
  H3 does not check for zero length input.
  Support seeding for hashers.
  Add utility function to access first random seed.
  Update H3 documentation (and minor style nits.)
  Make H3 seed configurable.
  ...
2013-07-23 16:40:56 -07:00