Commit graph

177 commits

Author SHA1 Message Date
Johanna Amann
ffa6756255 Merge branch 'master' of https://github.com/rbclark/bro into topic/johanna/md5-fips
* 'master' of https://github.com/rbclark/bro:
  Tell OpenSSL that MD5 is not used for security in order to allow bro to work properly on a FIPS system
2019-01-18 15:34:06 -08:00
Robert Clark
a72e9a8126
Tell OpenSSL that MD5 is not used for security in order to allow bro to work properly on a FIPS system 2019-01-11 16:09:42 -05:00
Jon Siwek
2982765128 Pre-allocate and re-use Vals for bool, int, count, enum and empty string 2019-01-09 18:29:23 -06:00
Matthias Vallentin
74c6b9f54c Remove unnessary check
The call to Empty() was originally meant as an optimization in the
lookup phase. However, the performance implications are substantial:
this check operates in O(f(m/8)) where m is the number of bits in the
Bloom filters and f a function that looks for the first non-empty block
of bits.

As the Bloom filter fills up, the check for Empty() becomes no longer
negligible and can lead to serious performance degradations when Bloom
filters are used frequently.
2018-11-07 13:11:15 -08:00
Jon Siwek
0c9878f136 Fix strict-aliasing compiler warning 2018-08-29 17:18:56 -05:00
Johanna Amann
6d612ced3d Mark one-parameter constructors as explicit & use override where possible
This commit marks (hopefully) ever one-parameter constructor as explicit.

It also uses override in (hopefully) all circumstances where a virtual
method is overridden.

There are a very few other minor changes - most of them were necessary
to get everything to compile (like one additional constructor). In one
case I changed an implicit operation to an explicit string conversion -
I think the automatically chosen conversion was much more convoluted.

This took longer than I want to admit but not as long as I feared :)
2018-03-27 07:17:32 -07:00
Robin Sommer
4d84ee82da Merge remote-tracking branch 'origin/topic/johanna/bit-1612'
Addig a new random seed for external tests.

I added a wrapper around the siphash() function to make calling it a
little bit safer at least.

BIT-1612 #merged

* origin/topic/johanna/bit-1612:
  HLL: Fix missing typecast in test case.
  Remove the -K/-J options for setting keys.
  Add test checking the quality of HLL by adding a lot of elements.
  Fix serializing probabilistic hashers.
  Baseline updates after hash function change.
  Also switch BloomFilters from H3 to siphash.
  Change Hashing from H3 to Siphash.
  HLL: Remove unnecessary comparison.
  Hyperloglog: change calculation of Rho
2016-07-14 16:26:17 -07:00
Johanna Amann
4a14fd4688 Fix serializing probabilistic hashers. 2016-07-13 10:12:17 -07:00
Johanna Amann
f1bae871e9 Also switch BloomFilters from H3 to siphash.
This removes all dependencies on H3 in our source tree.
2016-07-13 09:04:10 -07:00
Johanna Amann
e1218cc7fa Change Hashing from H3 to Siphash.
This commit mostly changes the hash function that is used for Internal
hashing of data < 36 bytes from H3 to Siphash. This change is motivated
by the fact that it turns out that H3 apparently does not deliver a very
good source of data uniqueness; running HLL with H3 as a hashing
function results in quite poor results (up to of 75% off in my tests).
In difference, running HLL with Siphash (or HMAC-MD5) changes this
factor to ~2%.

This also fixes a long-standing bug in Hash.h which truncated our hash
values to 32 bit on most machines.

Furthermore, it once again fixes a problem with the Rank function in
HLL.
2016-07-13 06:44:51 -07:00
Johanna Amann
b7c64c4522 HLL: Remove unnecessary comparison.
Rank always returns at least 1, hence this check is not necessary.
2016-06-15 11:33:37 -07:00
Johanna Amann
3aabe83ec6 Hyperloglog: change calculation of Rho
This commit changes the calculation of the rho-value to be in line with
the implementation of the original research paper, counting the number
of zero bits before the data.

This also fixes an infinite loop in case the hash value is 0.

I also cleaned up the code a bit, converting the raw pointers that were
used to a STL vector.

Addresses BIT-1612
2016-06-13 15:18:44 -07:00
Seth Hall
a58c308427 Adding override/final to overridden virtual methods.
C++11 compilers complain about overridden virtual methods
not being specified as either final or overridden.
2016-01-16 23:35:31 -05:00
Matthias Vallentin
673607f9a7 Switch to double hashing.
For large k, standard hashing imposes an unnecessary overhead. By switchting to
double hashing, we invoke the hash function code at most two times.
2014-06-05 16:02:25 +02:00
Matthias Vallentin
1d50874256 Use full digest length instead of just one byte.
When our universal hash function fell back to MD5 for inputs larger than
supported by H3, the computation only returned the first byte of the MD5 result
instead of as many bytes as needed to cover sizeof(Hasher::digest).
2014-06-05 16:01:20 +02:00
Matthias Vallentin
cb4eaf762c Fix bug when clearing Bloom filter contents.
This patch fixes a bug that occurred when calling the BiF bloomfilter_clear,
which used to not only clear the underlying bit vector but also set its size to
zero. As a result, subsequent element access or computations using the bit
vector size caused erroneous behavior.

Reported by @colonelxc.
2014-04-15 12:48:56 +02:00
Bernhard Amann
b3bd509b3f Allow iterating over bif functions with result type vector of any.
This changes the internal type that is used to signal that a vector
is unspecified from any to void.

I tried to verify that the behavior of Bro is still the same. After
a lot of playing around, I think everything still should worl as before.

However, it might be good for someone to take a look at this.

addresses BIT-1144
2014-02-25 15:30:29 -08:00
Daniel Thayer
6f06705c23 Fix typos in BIF documentation
Fixed typos in documentation of hexstr_to_bytestring.
Also added documentation that was missing for function parameters
and return values of other BIFs.
2013-11-22 14:49:16 -06:00
Daniel Thayer
8c3adc9df6 Fix typos and formatting in the BiFs docs 2013-10-17 01:04:20 -05:00
Robin Sommer
0fe474e232 Polishing the reference section of the manual.
Mostly resorting and renaming a few things.
2013-10-07 15:53:46 -07:00
Robin Sommer
c6de23ebe1 Merge remote-tracking branch 'origin/topic/bernhard/ticket1072'
* origin/topic/bernhard/ticket1072:
  and const 2 more functions
  update hll documentation, make a few functions private and create a new copy constructor.
  fix case where hll_error_margin could be undefined (thanks John)

BIT-1072 #merged
2013-09-18 15:00:06 -07:00
Bernhard Amann
ecc20b932a and const 2 more functions 2013-09-16 11:00:54 -07:00
Bernhard Amann
c0f780c728 update hll documentation, make a few functions private and create
a new copy constructor.
2013-09-16 10:40:25 -07:00
Daniel Thayer
1d33883dfc Fix compiler warnings 2013-09-13 00:30:18 -05:00
Jon Siwek
0b97343ff7 Fix various potential memory leaks.
Though I expect most not to be exercised in practice.
2013-09-12 15:23:52 -05:00
Daniel Thayer
ee1312f2ad Fix an error seen when building documentation 2013-09-10 11:22:14 -05:00
Jon Siwek
db470a637a Documentation fixes.
This cleans up most of the warnings from sphinx (broken :doc: links,
broxygen role misuses, etc.).  The remaining ones should be harmless,
but not quick to silence.

I found that the README for each component was a copy from the actual
repo, so I turned those in to symlinks so they don't get out of date.
2013-09-03 15:59:40 -05:00
Robin Sommer
de5bb65ff7 Removing the "uint8*" methods from SerializationFormat.
They conflict with the "char" version, so that other classes would now
pick the wrong one. Added a bit of casting to HLL to use the "char"
versions instead.
2013-08-31 11:17:49 -07:00
Robin Sommer
6f9d28cc18 Merge branch 'topic/robin/hyperloglog-merge'
* topic/robin/hyperloglog-merge: (35 commits)
  Making the confidence configurable.
  Renaming HyperLogLog->CardinalityCounter.
  Fixing bug introduced during merging.
  add clustered leak test for hll. No issues.
  make gcc happy
  (hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
  re-use same hash class for all add operations
  get hll ready for merging
  and forgot a file...
  adapt to new structure
  fix opaqueval-related memleak.
  make it compile on case-sensitive file systems and fix warnings
  make error rate configureable
  add persistence test not using predetermined random seeds.
  update cluster test to also use hll
  persistence really works.
  well, with this commit synchronizing the data structure should work.. ...if we had consistent hashing.
  and also serialize the other things we need
  ok, this bug was hard to find.
  serialization compiles.
  ...
2013-08-31 10:42:42 -07:00
Robin Sommer
295987c8d0 Making the confidence configurable. 2013-08-31 10:34:50 -07:00
Robin Sommer
fb3ceae6d5 Renaming HyperLogLog->CardinalityCounter.
For consistency with the class' name.
2013-08-31 10:22:27 -07:00
Robin Sommer
ef04ce809b Fixing bug introduced during merging. 2013-08-31 10:17:13 -07:00
Robin Sommer
4dcf8fc0db Merge remote-tracking branch 'origin/topic/bernhard/hyperloglog'
* origin/topic/bernhard/hyperloglog: (32 commits)
  add clustered leak test for hll. No issues.
  make gcc happy
  (hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
  re-use same hash class for all add operations
  get hll ready for merging
  and forgot a file...
  adapt to new structure
  fix opaqueval-related memleak.
  make it compile on case-sensitive file systems and fix warnings
  make error rate configureable
  add persistence test not using predetermined random seeds.
  update cluster test to also use hll
  persistence really works.
  well, with this commit synchronizing the data structure should work.. ...if we had consistent hashing.
  and also serialize the other things we need
  ok, this bug was hard to find.
  serialization compiles.
  change plugin after feedback of seth
  Forgot a file. Again. Like always. Basically.
  do away with old file.
  ...
2013-08-30 11:30:05 -07:00
Bernhard Amann
2dd0d057e6 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/NetVar.cc
	src/NetVar.h
2013-08-30 08:43:47 -07:00
Jon Siwek
fb8b78840b Fix bloom filter memory leaks. 2013-08-29 11:24:24 -05:00
Bernhard Amann
74f96d22ef Merge remote branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/3rdparty
2013-08-26 12:53:13 -07:00
Matthias Vallentin
516e044e34 Use Bro-style platform-independent integer types. 2013-08-16 13:29:52 -07:00
Jon Siwek
774dadfe9a Change bloom filter's dependence on size_t.
That type can vary across platforms, but factored in to a bloom
filter's internal state, e.g. size of the seed.
2013-08-16 12:39:21 -05:00
Bernhard Amann
baef38976d Merge remote-tracking branch 'origin/topic/bernhard/hyperloglog' into topic/bernhard/hyperloglog 2013-08-12 09:50:43 -07:00
Bernhard Amann
d83edf8068 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/NetVar.cc
	src/NetVar.h
	src/SerialTypes.h
	src/probabilistic/CMakeLists.txt
	testing/btest/scripts/base/frameworks/sumstats/basic-cluster.bro
	testing/btest/scripts/base/frameworks/sumstats/basic.bro
2013-08-12 09:47:53 -07:00
Robin Sommer
1b40412818 Merge remote-tracking branch 'origin/topic/bernhard/topk'
* origin/topic/bernhard/topk:
  3 more functions to document.

Conflicts:
	src/probabilistic/Topk.h
2013-08-01 15:43:33 -07:00
Robin Sommer
04ccb12183 Merge branch 'topic/robin/topk-merge'
BIT-1048 #merged

I'm reverting the serializer version update for now as that breaks
Broccoli. Let's do that later for 2.2.

* topic/robin/topk-merge:
  update documentation, rename get* to Get* and make hasher persistent
  adapt to new folder structure
  fix opaqueval-related memleak
  synchronize pruned attribute
  potentially found wrong Ref.
  add sum function that can be used to get the number of total observed elements.
  in cluster settings, the resultvals can apparently been uninitialized in some special cases
  fix memory leaks
  fix warnings
  add topk cluster test
  make size of topk-list configureable when using sumstats
  implement merging for top-k.
  add serialization for topk
  make the get function const
  topk for sumstats
  well, a test that works..
  implement topk.
2013-08-01 14:39:16 -07:00
Robin Sommer
f6e5de91fa Merge remote-tracking branch 'origin/topic/bernhard/topk' into topic/robin/topk-merge
* origin/topic/bernhard/topk:
  update documentation, rename get* to Get* and make hasher persistent

Conflicts:
	src/probabilistic/Topk.cc
	src/probabilistic/Topk.h
	src/probabilistic/top-k.bif
2013-08-01 14:13:25 -07:00
Bernhard Amann
3c0be74759 3 more functions to document. 2013-08-01 14:13:20 -07:00
Bernhard Amann
6a45a67eb5 update documentation, rename get* to Get* and make hasher
persistent
2013-08-01 14:07:39 -07:00
Robin Sommer
32a403cdaf Merge branch 'topic/robin/bloom-filter-merge'
* topic/robin/bloom-filter-merge:
  Using a real hash function for hashing a BitVector's internal state.
  Support UHF hashing for >= UHASH_KEY_SIZE bytes.
  Changing the Bloom filter hashing so that it's independent of CompositeHash.
  Add new BiF for low-level Bloom filter initialization.
  Introduce global_hash_seed script variable.

Conflicts:
	testing/btest/Baseline/bifs.bloomfilter/output
2013-08-01 10:52:08 -07:00
Robin Sommer
7ab2170641 Using a real hash function for hashing a BitVector's internal state. 2013-08-01 10:46:05 -07:00
Robin Sommer
00e4369eae Merge remote-tracking branch 'origin/topic/matthias/bloom-filter' into topic/robin/bloom-filter-merge
* origin/topic/matthias/bloom-filter:
  Support UHF hashing for >= UHASH_KEY_SIZE bytes.
2013-08-01 10:38:33 -07:00
Robin Sommer
81dcda3eb4 Merge remote-tracking branch 'origin/topic/bernhard/topk'
* origin/topic/bernhard/topk:
  adapt to new folder structure
  fix opaqueval-related memleak
  synchronize pruned attribute
  potentially found wrong Ref.
  add sum function that can be used to get the number of total observed elements.
  in cluster settings, the resultvals can apparently been uninitialized in some special cases
  fix memory leaks
  fix warnings
  add topk cluster test
  make size of topk-list configureable when using sumstats
  implement merging for top-k.
  add serialization for topk
  make the get function const
  topk for sumstats
  well, a test that works..
  implement topk.
2013-08-01 10:27:18 -07:00
Matthias Vallentin
34965b4e77 Support UHF hashing for >= UHASH_KEY_SIZE bytes. 2013-08-01 19:15:28 +02:00