
64 commits

Author SHA1 Message Date
Matthias Vallentin
673607f9a7 Switch to double hashing.
For large k, standard hashing imposes an unnecessary overhead. By switching to
double hashing, we invoke the hash function code at most two times.
2014-06-05 16:02:25 +02:00
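The idea behind this change is usually credited to Kirsch and Mitzenmacher: compute two base hashes h1 and h2 once, then derive the i-th of k cell indices as g_i(x) = h1(x) + i * h2(x). The following is a minimal sketch, not the patch itself. The seeded FNV-1a base hash and the names `fnv1a` and `double_hash_indices` are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Seeded FNV-1a, used here only as a stand-in base hash (assumption; the
// actual patch uses Bro's own hasher).
uint64_t fnv1a(const std::string& s, uint64_t seed) {
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Double hashing: two base hash calls, then k cell indices derived as
// g_i(x) = h1(x) + i * h2(x) (mod num_cells).
std::vector<uint64_t> double_hash_indices(const std::string& x, size_t k,
                                          uint64_t num_cells) {
    uint64_t h1 = fnv1a(x, 0);
    // Keep h2 odd, which avoids a zero stride when num_cells is a power of two.
    uint64_t h2 = fnv1a(x, 1) | 1;
    std::vector<uint64_t> idx;
    idx.reserve(k);
    for (size_t i = 0; i < k; ++i)
        idx.push_back((h1 + i * h2) % num_cells);  // no further hash calls
    return idx;
}
```

However large k gets, the hash function runs exactly twice per element; the loop only does arithmetic.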
Matthias Vallentin
1d50874256 Use full digest length instead of just one byte.
When our universal hash function fell back to MD5 for inputs larger than
supported by H3, the computation only returned the first byte of the MD5 result
instead of as many bytes as needed to cover sizeof(Hasher::digest).
2014-06-05 16:01:20 +02:00
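The bug and the fix can be sketched as follows. This assumes that `Hasher::digest` is a 64-bit integer, and it uses a fixed 16-byte array in place of a real MD5 result. The function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstring>

using digest = uint64_t;  // assumption: stand-in for Hasher::digest

// Buggy variant: only the first byte of the MD5 result reaches the digest.
digest digest_from_md5_first_byte(const std::array<uint8_t, 16>& md5) {
    return md5[0];
}

// Fixed variant: copy as many MD5 bytes as fit into the digest type.
digest digest_from_md5(const std::array<uint8_t, 16>& md5) {
    digest d = 0;
    std::memcpy(&d, md5.data(), std::min(sizeof(d), md5.size()));
    return d;
}
```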
Matthias Vallentin
cb4eaf762c Fix bug when clearing Bloom filter contents.
This patch fixes a bug that occurred when calling the BiF bloomfilter_clear,
which used to not only clear the underlying bit vector but also set its size to
zero. As a result, subsequent element access or computations using the bit
vector size caused erroneous behavior.

Reported by @colonelxc.
2014-04-15 12:48:56 +02:00
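The distinction behind this fix is that clearing a filter should zero its bits, not shrink its storage. Here is a minimal illustration using a hypothetical `SimpleBitVector` wrapper, not Bro's actual BitVector class.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal bit vector used only to illustrate the fix.
class SimpleBitVector {
public:
    explicit SimpleBitVector(size_t n) : bits_(n, false) {}
    void Set(size_t i) { bits_.at(i) = true; }
    bool Get(size_t i) const { return bits_.at(i); }
    size_t Size() const { return bits_.size(); }

    // Buggy reset: std::vector::clear() also drops the size to zero, so
    // later index computations based on Size() break.
    void ResetBuggy() { bits_.clear(); }

    // Correct reset: zero every bit but keep the vector's size.
    void Reset() { std::fill(bits_.begin(), bits_.end(), false); }

private:
    std::vector<bool> bits_;
};
```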
Bernhard Amann
b3bd509b3f Allow iterating over bif functions with result type vector of any.
This changes the internal type that is used to signal that a vector
is unspecified from any to void.

I tried to verify that the behavior of Bro is still the same. After
a lot of playing around, I think everything still should work as before.

However, it might be good for someone to take a look at this.

addresses BIT-1144
2014-02-25 15:30:29 -08:00
Daniel Thayer
6f06705c23 Fix typos in BIF documentation
Fixed typos in documentation of hexstr_to_bytestring.
Also added documentation that was missing for function parameters
and return values of other BIFs.
2013-11-22 14:49:16 -06:00
Daniel Thayer
8c3adc9df6 Fix typos and formatting in the BiFs docs 2013-10-17 01:04:20 -05:00
Robin Sommer
0fe474e232 Polishing the reference section of the manual.
Mostly resorting and renaming a few things.
2013-10-07 15:53:46 -07:00
Robin Sommer
c6de23ebe1 Merge remote-tracking branch 'origin/topic/bernhard/ticket1072'
* origin/topic/bernhard/ticket1072:
  and const 2 more functions
  update hll documentation, make a few functions private and create a new copy constructor.
  fix case where hll_error_margin could be undefined (thanks John)

BIT-1072 #merged
2013-09-18 15:00:06 -07:00
Bernhard Amann
ecc20b932a and const 2 more functions 2013-09-16 11:00:54 -07:00
Bernhard Amann
c0f780c728 update hll documentation, make a few functions private and create a new copy constructor.
2013-09-16 10:40:25 -07:00
Daniel Thayer
1d33883dfc Fix compiler warnings 2013-09-13 00:30:18 -05:00
Jon Siwek
0b97343ff7 Fix various potential memory leaks.
Though I expect most not to be exercised in practice.
2013-09-12 15:23:52 -05:00
Daniel Thayer
ee1312f2ad Fix an error seen when building documentation 2013-09-10 11:22:14 -05:00
Jon Siwek
db470a637a Documentation fixes.
This cleans up most of the warnings from sphinx (broken :doc: links,
broxygen role misuses, etc.).  The remaining ones should be harmless,
but not quick to silence.

I found that the README for each component was a copy from the actual
repo, so I turned those into symlinks so they don't get out of date.
2013-09-03 15:59:40 -05:00
Robin Sommer
de5bb65ff7 Removing the "uint8*" methods from SerializationFormat.
They conflict with the "char" version, so that other classes would now
pick the wrong one. Added a bit of casting to HLL to use the "char"
versions instead.
2013-08-31 11:17:49 -07:00
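The kind of caller-side cast this describes can be sketched like this, assuming a serializer that exposes only a `char`-based write. The `Writer` class and `write_registers` are hypothetical and do not reflect the real SerializationFormat interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical serializer that only offers the char-based overload.
class Writer {
public:
    void Write(const char* buf, size_t len) { out_.append(buf, len); }
    const std::string& Data() const { return out_; }

private:
    std::string out_;
};

// A caller holding uint8_t data (such as HLL registers) casts explicitly
// instead of relying on a separate uint8_t overload.
void write_registers(Writer& w, const uint8_t* regs, size_t n) {
    w.Write(reinterpret_cast<const char*>(regs), n);
}
```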
Robin Sommer
6f9d28cc18 Merge branch 'topic/robin/hyperloglog-merge'
* topic/robin/hyperloglog-merge: (35 commits)
  Making the confidence configurable.
  Renaming HyperLogLog->CardinalityCounter.
  Fixing bug introduced during merging.
  add clustered leak test for hll. No issues.
  make gcc happy
  (hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
  re-use same hash class for all add operations
  get hll ready for merging
  and forgot a file...
  adapt to new structure
  fix opaqueval-related memleak.
  make it compile on case-sensitive file systems and fix warnings
  make error rate configurable
  add persistence test not using predetermined random seeds.
  update cluster test to also use hll
  persistence really works.
  well, with this commit synchronizing the data structure should work.. ...if we had consistent hashing.
  and also serialize the other things we need
  ok, this bug was hard to find.
  serialization compiles.
  ...
2013-08-31 10:42:42 -07:00
Robin Sommer
295987c8d0 Making the confidence configurable. 2013-08-31 10:34:50 -07:00
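One common way to account for both an error margin and a confidence level in HyperLogLog starts from the approximate standard error 1.04 / sqrt(m) for m registers. For a z-score z, it solves error = z * 1.04 / sqrt(m) for m. This is a sketch under that assumption, not necessarily the formula CardinalityCounter uses. To avoid an inverse normal CDF, it takes z directly (for example 1.96 for roughly 95%).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of HLL registers so the estimate stays within relative error
// `error` at the confidence implied by z-score `z`, rounded up to a
// power of two: m = (z * 1.04 / error)^2.
size_t hll_registers(double error, double z) {
    assert(error > 0.0 && z > 0.0);
    double m = std::pow(z * 1.04 / error, 2.0);
    size_t regs = 1;
    while (static_cast<double>(regs) < m)
        regs <<= 1;
    return regs;
}
```

For example, a 5% margin at z = 1 needs about 433 registers, which rounds up to 512.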
Robin Sommer
fb3ceae6d5 Renaming HyperLogLog->CardinalityCounter.
For consistency with the class' name.
2013-08-31 10:22:27 -07:00
Robin Sommer
ef04ce809b Fixing bug introduced during merging. 2013-08-31 10:17:13 -07:00
Robin Sommer
4dcf8fc0db Merge remote-tracking branch 'origin/topic/bernhard/hyperloglog'
* origin/topic/bernhard/hyperloglog: (32 commits)
  add clustered leak test for hll. No issues.
  make gcc happy
  (hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
  re-use same hash class for all add operations
  get hll ready for merging
  and forgot a file...
  adapt to new structure
  fix opaqueval-related memleak.
  make it compile on case-sensitive file systems and fix warnings
  make error rate configurable
  add persistence test not using predetermined random seeds.
  update cluster test to also use hll
  persistence really works.
  well, with this commit synchronizing the data structure should work.. ...if we had consistent hashing.
  and also serialize the other things we need
  ok, this bug was hard to find.
  serialization compiles.
  change plugin after feedback of seth
  Forgot a file. Again. Like always. Basically.
  do away with old file.
  ...
2013-08-30 11:30:05 -07:00
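The note that synchronizing the data structure only works "if we had consistent hashing" follows from how HyperLogLog merges. Two counters combine by taking an element-wise max of their registers, which is only meaningful when both sides map the same element to the same register. The following is a minimal textbook HyperLogLog (Flajolet et al.), not Bro's CardinalityCounter. The splitmix64 hash, the 8-bit registers, and the class itself are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal HyperLogLog over 64-bit keys with 2^b registers.
class HyperLogLog {
public:
    explicit HyperLogLog(unsigned b) : b_(b), regs_(RegisterCount(b), 0) {}

    // Top b hash bits select a register; store the max rank
    // (1 + leading zeros) seen in the remaining bits.
    void Add(uint64_t key) {
        uint64_t h = Mix(key);
        size_t idx = static_cast<size_t>(h >> (64 - b_));
        uint64_t rest = h << b_;
        unsigned rank = 1;
        while (rank <= 64 - b_ && !(rest >> 63)) {
            rest <<= 1;
            ++rank;
        }
        regs_[idx] = std::max(regs_[idx], static_cast<uint8_t>(rank));
    }

    // Raw estimate with the usual small-range (linear counting) correction.
    double Estimate() const {
        double m = static_cast<double>(regs_.size());
        double sum = 0.0;
        size_t zeros = 0;
        for (uint8_t r : regs_) {
            sum += std::ldexp(1.0, -static_cast<int>(r));
            if (r == 0) ++zeros;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double e = alpha * m * m / sum;
        if (e <= 2.5 * m && zeros > 0)
            e = m * std::log(m / static_cast<double>(zeros));
        return e;
    }

    // Merge = element-wise max; requires the same b and the same hash.
    void Merge(const HyperLogLog& other) {
        assert(other.b_ == b_);
        for (size_t i = 0; i < regs_.size(); ++i)
            regs_[i] = std::max(regs_[i], other.regs_[i]);
    }

private:
    static size_t RegisterCount(unsigned b) {
        assert(b >= 4 && b <= 16);
        return size_t{1} << b;
    }
    // splitmix64 finalizer as a stand-in 64-bit hash.
    static uint64_t Mix(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    unsigned b_;
    std::vector<uint8_t> regs_;
};
```

If two nodes counted disjoint halves of a stream with the same hash, merging their counters yields exactly the registers of one counter that saw everything.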
Bernhard Amann
2dd0d057e6 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/NetVar.cc
	src/NetVar.h
2013-08-30 08:43:47 -07:00
Jon Siwek
fb8b78840b Fix bloom filter memory leaks. 2013-08-29 11:24:24 -05:00
Bernhard Amann
74f96d22ef Merge remote branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/3rdparty
2013-08-26 12:53:13 -07:00
Matthias Vallentin
516e044e34 Use Bro-style platform-independent integer types. 2013-08-16 13:29:52 -07:00
Jon Siwek
774dadfe9a Change bloom filter's dependence on size_t.
That type can vary across platforms, but it was factored into a bloom
filter's internal state, e.g. the size of the seed.
2013-08-16 12:39:21 -05:00
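To see why size_t is a problem, consider serialized state such as a seed. Its size and layout must not depend on the host. A minimal sketch stores the seed as a fixed-width 64-bit value in an explicit byte order. The little-endian choice and the function names are assumptions, not what Bro's serializer does.

```cpp
#include <array>
#include <cstdint>

// Encode a 64-bit seed in little-endian order, independent of the host's
// size_t width or native byte order.
std::array<uint8_t, 8> encode_seed(uint64_t seed) {
    std::array<uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<uint8_t>(seed >> (8 * i));
    return out;
}

// Inverse of encode_seed.
uint64_t decode_seed(const std::array<uint8_t, 8>& in) {
    uint64_t seed = 0;
    for (int i = 0; i < 8; ++i)
        seed |= static_cast<uint64_t>(in[i]) << (8 * i);
    return seed;
}
```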
Bernhard Amann
baef38976d Merge remote-tracking branch 'origin/topic/bernhard/hyperloglog' into topic/bernhard/hyperloglog 2013-08-12 09:50:43 -07:00
Bernhard Amann
d83edf8068 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/NetVar.cc
	src/NetVar.h
	src/SerialTypes.h
	src/probabilistic/CMakeLists.txt
	testing/btest/scripts/base/frameworks/sumstats/basic-cluster.bro
	testing/btest/scripts/base/frameworks/sumstats/basic.bro
2013-08-12 09:47:53 -07:00
Robin Sommer
1b40412818 Merge remote-tracking branch 'origin/topic/bernhard/topk'
* origin/topic/bernhard/topk:
  3 more functions to document.

Conflicts:
	src/probabilistic/Topk.h
2013-08-01 15:43:33 -07:00
Robin Sommer
04ccb12183 Merge branch 'topic/robin/topk-merge'
BIT-1048 #merged

I'm reverting the serializer version update for now as that breaks
Broccoli. Let's do that later for 2.2.

* topic/robin/topk-merge:
  update documentation, rename get* to Get* and make hasher persistent
  adapt to new folder structure
  fix opaqueval-related memleak
  synchronize pruned attribute
  potentially found wrong Ref.
  add sum function that can be used to get the number of total observed elements.
  in cluster settings, the resultvals can apparently be uninitialized in some special cases
  fix memory leaks
  fix warnings
  add topk cluster test
  make size of topk-list configurable when using sumstats
  implement merging for top-k.
  add serialization for topk
  make the get function const
  topk for sumstats
  well, a test that works..
  implement topk.
2013-08-01 14:39:16 -07:00
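A common algorithm for approximate top-k over a stream is Space-Saving (Metwally et al.). It keeps at most a fixed number of counters. When a new element arrives and all slots are taken, the element with the smallest count is evicted, and the newcomer inherits that count plus one. The following is a minimal sketch of that algorithm, including a running total like the "sum" function listed above. We assume, but do not claim, that Topk.cc follows the same scheme; the class and method names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal Space-Saving top-k sketch with at most `capacity` counters.
class SpaceSaving {
public:
    explicit SpaceSaving(size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void Add(const std::string& x) {
        ++sum_;
        auto it = counts_.find(x);
        if (it != counts_.end()) { ++it->second; return; }
        if (counts_.size() < capacity_) { counts_[x] = 1; return; }
        // Table full: evict the minimum and let x inherit its count + 1.
        auto min_it = std::min_element(counts_.begin(), counts_.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        uint64_t inherited = min_it->second;
        counts_.erase(min_it);
        counts_[x] = inherited + 1;
    }

    // Total number of observed elements.
    uint64_t Sum() const { return sum_; }

    // Up to k elements with the highest (estimated) counts, highest first.
    std::vector<std::pair<std::string, uint64_t>> GetTopK(size_t k) const {
        std::vector<std::pair<std::string, uint64_t>> v(counts_.begin(), counts_.end());
        std::sort(v.begin(), v.end(),
            [](const auto& a, const auto& b) { return a.second > b.second; });
        if (v.size() > k) v.resize(k);
        return v;
    }

private:
    size_t capacity_;
    uint64_t sum_ = 0;
    std::unordered_map<std::string, uint64_t> counts_;
};
```

Counts of evicted-and-replaced elements are overestimates, which is the usual Space-Saving trade-off.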
Robin Sommer
f6e5de91fa Merge remote-tracking branch 'origin/topic/bernhard/topk' into topic/robin/topk-merge
* origin/topic/bernhard/topk:
  update documentation, rename get* to Get* and make hasher persistent

Conflicts:
	src/probabilistic/Topk.cc
	src/probabilistic/Topk.h
	src/probabilistic/top-k.bif
2013-08-01 14:13:25 -07:00
Bernhard Amann
3c0be74759 3 more functions to document. 2013-08-01 14:13:20 -07:00
Bernhard Amann
6a45a67eb5 update documentation, rename get* to Get* and make hasher persistent
2013-08-01 14:07:39 -07:00
Robin Sommer
32a403cdaf Merge branch 'topic/robin/bloom-filter-merge'
* topic/robin/bloom-filter-merge:
  Using a real hash function for hashing a BitVector's internal state.
  Support UHF hashing for >= UHASH_KEY_SIZE bytes.
  Changing the Bloom filter hashing so that it's independent of CompositeHash.
  Add new BiF for low-level Bloom filter initialization.
  Introduce global_hash_seed script variable.

Conflicts:
	testing/btest/Baseline/bifs.bloomfilter/output
2013-08-01 10:52:08 -07:00
Robin Sommer
7ab2170641 Using a real hash function for hashing a BitVector's internal state. 2013-08-01 10:46:05 -07:00
Robin Sommer
00e4369eae Merge remote-tracking branch 'origin/topic/matthias/bloom-filter' into topic/robin/bloom-filter-merge
* origin/topic/matthias/bloom-filter:
  Support UHF hashing for >= UHASH_KEY_SIZE bytes.
2013-08-01 10:38:33 -07:00
Robin Sommer
81dcda3eb4 Merge remote-tracking branch 'origin/topic/bernhard/topk'
* origin/topic/bernhard/topk:
  adapt to new folder structure
  fix opaqueval-related memleak
  synchronize pruned attribute
  potentially found wrong Ref.
  add sum function that can be used to get the number of total observed elements.
  in cluster settings, the resultvals can apparently be uninitialized in some special cases
  fix memory leaks
  fix warnings
  add topk cluster test
  make size of topk-list configurable when using sumstats
  implement merging for top-k.
  add serialization for topk
  make the get function const
  topk for sumstats
  well, a test that works..
  implement topk.
2013-08-01 10:27:18 -07:00
Matthias Vallentin
34965b4e77 Support UHF hashing for >= UHASH_KEY_SIZE bytes. 2013-08-01 19:15:28 +02:00
Robin Sommer
86dcea3b35 Merge remote-tracking branch 'origin/fastpath'
Slightly adapted after discussing with Bernhard. I also added one
further check.

* origin/fastpath:
  fix segfault that could be caused by merging an empty bloom-filter with a bloom-filter already containing values.
2013-07-31 20:09:37 -07:00
Robin Sommer
2a0790c231 Changing the Bloom filter hashing so that it's independent of CompositeHash.

We do this by hashing values added to a BloomFilter once more
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).

I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
2013-07-31 19:56:34 -07:00
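The property such tests check is that two filters built with the same seed end up with identical internal state. Here is a minimal sketch of that property. The seeded FNV-1a hash, the per-function seed offsets, and the `SeededBloomFilter` class are all hypothetical. `InternalState()` is only analogous in spirit to bloomfilter_internal_state().

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical Bloom filter whose cell positions come from a stable hash
// seeded explicitly, so equal seeds give equal bit patterns.
class SeededBloomFilter {
public:
    SeededBloomFilter(size_t cells, size_t k, uint64_t seed)
        : bits_(cells, false), k_(k), seed_(seed) {}

    void Add(const std::string& key) {
        for (size_t i = 0; i < k_; ++i)
            bits_[Hash(key, seed_ + i) % bits_.size()] = true;
    }

    // String dump of the bits, usable for comparing filters in tests.
    std::string InternalState() const {
        std::string s;
        for (bool bit : bits_) s += bit ? '1' : '0';
        return s;
    }

private:
    // Seeded FNV-1a: deterministic across runs and processes.
    static uint64_t Hash(const std::string& s, uint64_t seed) {
        uint64_t h = 14695981039346656037ULL ^ seed;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return h;
    }

    std::vector<bool> bits_;
    size_t k_;
    uint64_t seed_;
};
```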
Bernhard Amann
39c0f5abad make gcc happy 2013-07-31 12:43:33 -07:00
Bernhard Amann
5122bf4a7c adapt to new folder structure 2013-07-31 12:06:59 -07:00
Robin Sommer
6c197fbebf Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Add new BiF for low-level Bloom filter initialization.
  Introduce global_hash_seed script variable.
2013-07-31 11:41:08 -07:00
Matthias Vallentin
d50b8a147d Add new BiF for low-level Bloom filter initialization.
For symmetry reasons, the new Bif bloomfilter_basic_init2 also allows users to
manually specify the memory bounds and number of hash functions to use.
2013-07-31 18:21:37 +02:00
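For comparison with specifying the bounds manually, the textbook formulas derive the number of cells m and hash functions k from a target false-positive rate p and an expected capacity n: m = ceil(-n ln p / (ln 2)^2) and k = ceil(m / n * ln 2). We assume, but have not verified, that the rate-based initializer computes something close to this. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of cells for capacity n and false-positive rate p.
size_t optimal_cells(double p, size_t n) {
    assert(p > 0.0 && p < 1.0 && n > 0);
    double ln2 = std::log(2.0);
    return static_cast<size_t>(std::ceil(-(n * std::log(p)) / (ln2 * ln2)));
}

// Number of hash functions for a given cell count and capacity.
size_t optimal_k(size_t cells, size_t n) {
    assert(n > 0);
    return static_cast<size_t>(
        std::ceil(static_cast<double>(cells) / n * std::log(2.0)));
}
```

For p = 0.01 and n = 1000, this gives roughly 9,586 cells and 7 hash functions. A basic_init2-style call would accept those two numbers directly.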
Matthias Vallentin
8ca76dd4ee Introduce global_hash_seed script variable.
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script level and use it instead. If that script
variable does not exist, then we fall back to the initial seed computed at
Bro startup (which is ultimately affected by $BRO_SEED).

See Hasher::MakeSeed for details.
2013-07-31 17:59:08 +02:00
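The seed precedence described above can be sketched as follows: filter name first, then global_hash_seed, then the startup seed. `hash_string` (seeded FNV-1a here) and `make_seed` are hypothetical stand-ins; Hasher::MakeSeed remains the authoritative version.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stable string hash (FNV-1a) used to turn names into seeds.
uint64_t hash_string(const std::string& s) {
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Name wins, then the script-level global_hash_seed, then the seed
// computed at startup (`initial_seed`).
uint64_t make_seed(const std::string& filter_name,
                   const std::string& global_hash_seed,
                   uint64_t initial_seed) {
    if (!filter_name.empty())
        return hash_string(filter_name);
    if (!global_hash_seed.empty())
        return hash_string(global_hash_seed);
    return initial_seed;
}
```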
Bernhard Amann
83ce77e575 re-use same hash class for all add operations 2013-07-30 18:48:05 -07:00
Bernhard Amann
18c10f3cb5 get hll ready for merging 2013-07-30 16:47:26 -07:00
Bernhard Amann
edb04e6d8b fix segfault that could be caused by merging an empty bloom-filter with a bloom-filter already containing values.

I assume that it is ok to merge an empty bloom-filter with any bloom-filter -
if not we have to change the patch to return an error in this case.
2013-07-30 16:10:06 -07:00
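Suppose an "empty" filter means one with no allocated bit vector. Then the behavior assumed in this commit message is that the empty side acts as the identity of the merge. Here is a minimal sketch in which merging is a bitwise OR and mismatched non-empty sizes are rejected. `merge_bits` is hypothetical and requires C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

using Bits = std::vector<bool>;

// Merge two Bloom filter bit vectors by bitwise OR. An empty side is the
// identity; differently sized non-empty filters cannot be merged.
std::optional<Bits> merge_bits(const Bits& a, const Bits& b) {
    if (a.empty()) return b;
    if (b.empty()) return a;
    if (a.size() != b.size()) return std::nullopt;
    Bits out(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] || b[i];
    return out;
}
```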
Bernhard Amann
5b9d80e50d Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog 2013-07-30 14:31:09 -07:00
Robin Sommer
629c331ca0 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Update submodules.
  Make hashers serializable.
  Add docs and use default value for hasher names.
2013-07-30 10:06:44 -07:00
Matthias Vallentin
9ad7121fed Merge remote-tracking branch 'origin/master' into topic/matthias/bloom-filter
Conflicts:
	src/probabilistic/Hasher.h
2013-07-30 12:12:27 +02:00