When our universal hash function fell back to MD5 for inputs larger than
supported by H3, the computation only returned the first byte of the MD5 result
instead of as many bytes as needed to cover sizeof(Hasher::digest).
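A minimal sketch of the truncation, not the actual Bro code: it assumes an 8-byte digest type (DIGEST_SIZE stands in for sizeof(Hasher::digest)) and a little-endian byte order. Both function names are hypothetical.

```python
import hashlib

# Assumption: the digest type is 8 bytes wide; the real value is
# whatever sizeof(Hasher::digest) evaluates to.
DIGEST_SIZE = 8


def md5_fallback_buggy(data: bytes) -> int:
    """Old behavior as described: only the first MD5 byte is used."""
    return hashlib.md5(data).digest()[0]


def md5_fallback_fixed(data: bytes) -> int:
    """Fixed behavior: use as many MD5 bytes as the digest type holds."""
    md5 = hashlib.md5(data).digest()  # always 16 bytes
    return int.from_bytes(md5[:DIGEST_SIZE], "little")
```

With the buggy variant, every large input maps to one of only 256 values, so collisions are far more likely than the digest width suggests.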
This patch fixes a bug that occurred when calling the BiF bloomfilter_clear,
which used to not only clear the underlying bit vector but also set its size to
zero. As a result, subsequent element access or computations using the bit
vector size caused erroneous behavior.
Reported by @colonelxc.
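The difference between the two clearing behaviors can be sketched as follows. This is a hypothetical stand-in class, not Bro's BitVector; the method names are made up for illustration.

```python
class BitVectorSketch:
    """Hypothetical stand-in for the bit vector behind a Bloom filter."""

    def __init__(self, size: int) -> None:
        self.bits = [False] * size

    def size(self) -> int:
        return len(self.bits)

    def set(self, index: int) -> None:
        # Positions are reduced modulo the size, as a Bloom filter would do.
        self.bits[index % self.size()] = True

    def clear_buggy(self) -> None:
        # Old behavior as described: the bits are dropped AND the size
        # becomes zero, so any later index computation breaks.
        self.bits = []

    def clear(self) -> None:
        # Intended behavior: reset every bit but keep the size.
        self.bits = [False] * len(self.bits)
```

After clear_buggy(), a call such as set(3) fails because the position is computed modulo a size of zero; after clear(), the vector is reusable.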
This changes the internal type that is used to signal that a vector
is unspecified from any to void.
I tried to verify that the behavior of Bro is still the same. After
a lot of playing around, I think everything should still work as before.
However, it might be good for someone to take a look at this.
addresses BIT-1144
Fixed typos in documentation of hexstr_to_bytestring.
Also added documentation that was missing for function parameters
and return values of other BIFs.
* origin/topic/bernhard/ticket1072:
and const 2 more functions
update hll documentation, make a few functions private and create a new copy constructor.
fix case where hll_error_margin could be undefined (thanks John)
BIT-1072 #merged
This cleans up most of the warnings from sphinx (broken :doc: links,
broxygen role misuses, etc.). The remaining ones should be harmless,
but not quick to silence.
I found that the README for each component was a copy from the actual
repo, so I turned those into symlinks so they don't get out of date.
They conflict with the "char" version, so that other classes would now
pick the wrong one. Added a bit of casting to HLL to use the "char"
versions instead.
* topic/robin/hyperloglog-merge: (35 commits)
Making the confidence configurable.
Renaming HyperLogLog->CardinalityCounter.
Fixing bug introduced during merging.
add clustered leak test for hll. No issues.
make gcc happy
(hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
re-use same hash class for all add operations
get hll ready for merging
and forgot a file...
adapt to new structure
fix opaqueval-related memleak.
make it compile on case-sensitive file systems and fix warnings
make error rate configurable
add persistence test not using predetermined random seeds.
update cluster test to also use hll
persistence really works.
well, with this commit synchronizing the data structure should work.. ...if we had consistent hashing.
and also serialize the other things we need
ok, this bug was hard to find.
serialization compiles.
...
* origin/topic/bernhard/hyperloglog: (32 commits)
add clustered leak test for hll. No issues.
make gcc happy
(hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
re-use same hash class for all add operations
get hll ready for merging
and forgot a file...
adapt to new structure
fix opaqueval-related memleak.
make it compile on case-sensitive file systems and fix warnings
make error rate configurable
add persistence test not using predetermined random seeds.
update cluster test to also use hll
persistence really works.
well, with this commit synchronizing the data structure should work.. ...if we had consistent hashing.
and also serialize the other things we need
ok, this bug was hard to find.
serialization compiles.
change plugin after feedback of seth
Forgot a file. Again. Like always. Basically.
do away with old file.
...
BIT-1048 #merged
I'm reverting the serializer version update for now as that breaks
Broccoli. Let's do that later for 2.2.
* topic/robin/topk-merge:
update documentation, rename get* to Get* and make hasher persistent
adapt to new folder structure
fix opaqueval-related memleak
synchronize pruned attribute
potentially found wrong Ref.
add sum function that can be used to get the number of total observed elements.
in cluster settings, the resultvals can apparently be uninitialized in some special cases
fix memory leaks
fix warnings
add topk cluster test
make size of topk-list configurable when using sumstats
implement merging for top-k.
add serialization for topk
make the get function const
topk for sumstats
well, a test that works..
implement topk.
* topic/robin/bloom-filter-merge:
Using a real hash function for hashing a BitVector's internal state.
Support UHF hashing for >= UHASH_KEY_SIZE bytes.
Changing the Bloom filter hashing so that it's independent of CompositeHash.
Add new BiF for low-level Bloom filter initialization.
Introduce global_hash_seed script variable.
Conflicts:
testing/btest/Baseline/bifs.bloomfilter/output
* origin/topic/bernhard/topk:
adapt to new folder structure
fix opaqueval-related memleak
synchronize pruned attribute
potentially found wrong Ref.
add sum function that can be used to get the number of total observed elements.
in cluster settings, the resultvals can apparently be uninitialized in some special cases
fix memory leaks
fix warnings
add topk cluster test
make size of topk-list configurable when using sumstats
implement merging for top-k.
add serialization for topk
make the get function const
topk for sumstats
well, a test that works..
implement topk.
Slightly adapted after discussing with Bernhard. I also added one
further check.
* origin/fastpath:
fix segfault that could be caused by merging an empty bloom-filter with a bloom-filter already containing values.
CompositeHash.
We do this by hashing values added to a BloomFilter one more time
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).
I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script level and use it instead. If that
script variable does not exist, then we fall back to the initial seed computed
at Bro startup (which is ultimately affected by $BRO_SEED).
See Hasher::MakeSeed for details.
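The selection order described above can be sketched as follows. This is an illustration only: the function name, its parameters, and the use of SHA-256 to turn a string into a numeric seed are assumptions, and Hasher::MakeSeed may derive the value differently.

```python
import hashlib
from typing import Optional


def make_seed_sketch(filter_name: str,
                     global_hash_seed: Optional[str],
                     startup_seed: int) -> int:
    """Pick seed material in the order the commit message describes."""
    if filter_name:
        # A named filter depends only on its name.
        material = filter_name.encode()
    elif global_hash_seed:
        # Otherwise use the script-level seed string, if non-empty.
        material = global_hash_seed.encode()
    else:
        # Otherwise fall back to the seed computed at startup.
        return startup_seed
    # Assumption: any stable hash works to derive a 64-bit seed.
    return int.from_bytes(hashlib.sha256(material).digest()[:8], "little")
```

Two filters created with the same name get the same seed regardless of the startup seed, which is what makes their internal state comparable across runs.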
with a bloom-filter already containing values.
I assume that it is ok to merge an empty bloom-filter with any bloom-filter -
if not we have to change the patch to return an error in this case.
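Under that assumption, merging behaves like a bitwise OR in which an empty filter is the identity element. A minimal sketch follows; the BloomBits type and merge function are hypothetical, and a filter is modeled here as just a size plus a bitmask.

```python
from dataclasses import dataclass


@dataclass
class BloomBits:
    size: int      # number of bits in the filter
    bits: int = 0  # bitmask; 0 means no element has been added yet


def merge(a: BloomBits, b: BloomBits) -> BloomBits:
    """Union of two filters of the same size via bitwise OR."""
    if a.size != b.size:
        raise ValueError("cannot merge filters of different sizes")
    return BloomBits(a.size, a.bits | b.bits)
```

Merging an empty filter with a populated one returns the populated filter's bits unchanged, in either argument order.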