Commit graph

16 commits

Author SHA1 Message Date
Robin Sommer
4d84ee82da Merge remote-tracking branch 'origin/topic/johanna/bit-1612'
Adding a new random seed for external tests.

I added a wrapper around the siphash() function to make calling it a
little bit safer at least.

BIT-1612 #merged

* origin/topic/johanna/bit-1612:
  HLL: Fix missing typecast in test case.
  Remove the -K/-J options for setting keys.
  Add test checking the quality of HLL by adding a lot of elements.
  Fix serializing probabilistic hashers.
  Baseline updates after hash function change.
  Also switch BloomFilters from H3 to siphash.
  Change Hashing from H3 to Siphash.
  HLL: Remove unnecessary comparison.
  Hyperloglog: change calculation of Rho
2016-07-14 16:26:17 -07:00
Johanna Amann
4a14fd4688 Fix serializing probabilistic hashers. 2016-07-13 10:12:17 -07:00
Johanna Amann
f1bae871e9 Also switch BloomFilters from H3 to siphash.
This removes all dependencies on H3 in our source tree.
2016-07-13 09:04:10 -07:00
Matthias Vallentin
1d50874256 Use full digest length instead of just one byte.
When our universal hash function fell back to MD5 for inputs larger than
supported by H3, the computation only returned the first byte of the MD5 result
instead of as many bytes as needed to cover sizeof(Hasher::digest).
2014-06-05 16:01:20 +02:00
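
The truncation described in 1d50874256 can be illustrated with a minimal sketch. This is not the code from the Bro tree: the `digest` alias, its 64-bit width, and both function names are assumptions made for illustration, and the 16-byte buffer simply stands in for an MD5 result.

```cpp
#include <cstdint>
#include <cstring>

// Assumed stand-in for Hasher::digest; the real width may differ.
using digest = std::uint64_t;

// Behaviour before the fix, as described in the commit message:
// only the first byte of the 16-byte MD5 result reaches the digest.
digest digest_from_first_byte(const unsigned char* md5_result)
{
    return md5_result[0];
}

// Behaviour after the fix: copy as many bytes as the digest type holds.
// The resulting value depends on host byte order.
digest digest_from_full_width(const unsigned char* md5_result)
{
    digest d = 0;
    std::memcpy(&d, md5_result, sizeof(d));
    return d;
}
```

With the first variant, any two inputs whose MD5 results share a leading byte collide, so at most 256 distinct digests are possible on the fallback path.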
Daniel Thayer
1d33883dfc Fix compiler warnings 2013-09-13 00:30:18 -05:00
Matthias Vallentin
516e044e34 Use Bro-style platform-independent integer types. 2013-08-16 13:29:52 -07:00
Jon Siwek
774dadfe9a Change bloom filter's dependence on size_t.
That type can vary across platforms, but it is factored into a bloom
filter's internal state, e.g. the size of the seed.
2013-08-16 12:39:21 -05:00
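
A rough sketch of why a fixed-width type helps here: if a seed is folded into the filter's state as raw bytes, a `size_t` contributes 4 bytes on some platforms and 8 on others, while a fixed-width integer always contributes the same bytes. The helper name `encode_seed` and the little-endian layout are assumptions for illustration, not taken from the commit.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode a seed as exactly 8 bytes, least significant
// byte first, regardless of the platform's size_t width or byte order.
std::string encode_seed(std::uint64_t seed)
{
    std::string out;
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<char>((seed >> (8 * i)) & 0xff));
    return out;
}
```

Because the loop shifts the value instead of copying memory, the encoded bytes are the same on 32-bit and 64-bit hosts.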
Robin Sommer
00e4369eae Merge remote-tracking branch 'origin/topic/matthias/bloom-filter' into topic/robin/bloom-filter-merge
* origin/topic/matthias/bloom-filter:
  Support UHF hashing for >= UHASH_KEY_SIZE bytes.
2013-08-01 10:38:33 -07:00
Matthias Vallentin
34965b4e77 Support UHF hashing for >= UHASH_KEY_SIZE bytes. 2013-08-01 19:15:28 +02:00
Robin Sommer
2a0790c231 Changing the Bloom filter hashing so that it's independent of
CompositeHash.

We do this by hashing values added to a BloomFilter one more time
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).

I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
2013-07-31 19:56:34 -07:00
Robin Sommer
6c197fbebf Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Add new BiF for low-level Bloom filter initialization.
  Introduce global_hash_seed script variable.
2013-07-31 11:41:08 -07:00
Matthias Vallentin
8ca76dd4ee Introduce global_hash_seed script variable.
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script level and use it instead. If that script
variable does not exist, then we fall back to the initial seed computed at
Bro startup (which is affected ultimately by $BRO_SEED).

See Hasher::MakeSeed for details.
2013-07-31 17:59:08 +02:00
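
The precedence described in 8ca76dd4ee (filter name first, then global_hash_seed, then the startup seed) can be sketched as follows. This is not Hasher::MakeSeed itself: the function name `pick_seed`, its parameters, and the use of `std::hash` as a placeholder hash are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical sketch of the seed precedence from the commit message.
// std::hash is only a placeholder; it is not stable across standard
// library implementations and is not what Bro uses.
std::uint64_t pick_seed(const std::string& filter_name,
                        const std::string& global_hash_seed,
                        std::uint64_t startup_seed)
{
    std::hash<std::string> h;

    // 1. A non-empty filter name alone determines the seed.
    if (!filter_name.empty())
        return h(filter_name);

    // 2. Otherwise use the script-level global_hash_seed, if set.
    if (!global_hash_seed.empty())
        return h(global_hash_seed);

    // 3. Otherwise fall back to the seed computed at startup.
    return startup_seed;
}
```

Under this sketch, two filters with the same non-empty name get the same seed no matter what global_hash_seed or the startup seed are, which is the property the consistency tests rely on.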
Robin Sommer
629c331ca0 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Update submodules.
  Make hashers serializable.
  Add docs and use default value for hasher names.
2013-07-30 10:06:44 -07:00
Matthias Vallentin
2fc5ca53ff Make hashers serializable.
There is still a small bug that I could not find; the unit test
istate/opaque.bro fails. If someone sees why, please chime in.
2013-07-25 17:35:35 +02:00
Robin Sommer
474107fe40 Broifying the code.
Also extending API documentation a bit more and fixing a memory leak.
2013-07-23 20:10:32 -07:00
Robin Sommer
21685d2529 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
I'm moving the new files into a subdirectory probabilistic, and into a
corresponding namespace. We can later put code for the other
probabilistic data structures there as well.

* origin/topic/matthias/bloom-filter: (45 commits)
  Implement and test Bloom filter merging.
  Make hash functions equality comparable.
  Make counter vectors mergeable.
  Use half adder for bitwise addition and subtraction.
  Fix and test counting Bloom filter.
  Implement missing CounterVector functions.
  Tweak hasher interface.
  Add missing include for GCC.
  Fixing for unserialization error.
  Small fixes and style tweaks.
  Only serialize Bloom filter type if available.
  Create hash policies through factory.
  Remove lingering debug code.
  Factor implementation and change interface.
  Expose Bro's linear congruence PRNG as utility function.
  H3 does not check for zero length input.
  Support seeding for hashers.
  Add utility function to access first random seed.
  Update H3 documentation (and minor style nits.)
  Make H3 seed configurable.
  ...
2013-07-23 16:40:56 -07:00