* origin/topic/bernhard/topk:
adapt to new folder structure
fix opaqueval-related memleak
synchronize pruned attribute
potentially found a wrong Ref.
add sum function that can be used to get the total number of observed elements (see the sketch after this list).
in cluster settings, the resultvals can apparently be uninitialized in some special cases
fix memory leaks
fix warnings
add topk cluster test
make size of topk-list configurable when using sumstats
implement merging for top-k.
add serialization for topk
make the get function const
topk for sumstats
well, a test that works...
implement topk.
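A minimal sketch of how these pieces fit together, assuming the topk_* BiF names (topk_init, topk_add, topk_get_top, topk_sum):

    event bro_init()
        {
        # Track up to 5 elements; topk_init's size bounds the list.
        local tk = topk_init(5);
        topk_add(tk, "a");
        topk_add(tk, "a");
        topk_add(tk, "b");

        # The two most frequent elements observed so far.
        print topk_get_top(tk, 2);

        # Total number of observed elements via the new sum function.
        print topk_sum(tk);
        }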
CompositeHash.
We do this by hashing values added to a BloomFilter a second time
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).
I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
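A sketch of the kind of test this enables, assuming two filters that share a name (and hence a seed); bloomfilter_add is assumed as the element-insertion BiF:

    event bro_init()
        {
        # Same name, same stable seed: identical contents should
        # yield identical internal state.
        local bf1 = bloomfilter_basic_init(0.01, 1000, "my-filter");
        local bf2 = bloomfilter_basic_init(0.01, 1000, "my-filter");
        bloomfilter_add(bf1, "foo");
        bloomfilter_add(bf2, "foo");

        print bloomfilter_internal_state(bf1) == bloomfilter_internal_state(bf2);
        }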
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script level and use that instead. If this
script variable does not exist, then we fall back to the initial seed computed
at Bro startup (which is ultimately affected by $BRO_SEED).
See Hasher::MakeSeed for details.
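A sketch of the two script-level ways to pin the seed (the concrete parameter values here are illustrative only):

    # With an empty filter name, a non-empty global_hash_seed wins;
    # otherwise the startup seed ($BRO_SEED) applies.
    redef global_hash_seed = "my cluster-wide seed";

    event bro_init()
        {
        # A non-empty name seeds the hasher by itself.
        local named = bloomfilter_basic_init(0.01, 1000, "stable-name");

        # No name: seeded from global_hash_seed.
        local unnamed = bloomfilter_basic_init(0.01, 1000);
        }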
with a bloom-filter already containing values.
I assume that it is ok to merge an empty bloom-filter with any bloom-filter;
if not, we have to change the patch to return an error in this case.
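A sketch of merging under that assumption, taking bloomfilter_merge as the merge BiF and bloomfilter_add/bloomfilter_lookup as the element BiFs (identical parameters and name keep the two filters compatible):

    event bro_init()
        {
        local filled = bloomfilter_basic_init(0.01, 1000, "merge-demo");
        local empty  = bloomfilter_basic_init(0.01, 1000, "merge-demo");
        bloomfilter_add(filled, "foo");

        # Merging with an empty filter should leave the contents intact.
        local merged = bloomfilter_merge(filled, empty);
        print bloomfilter_lookup(merged, "foo");
        }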
When constructing a Bloom filter, one now has to pass a HashPolicy instance to
it. This separates more clearly the concerns of hashing and Bloom filter
management.
This commit also changes the interface to initialize Bloom filters: there now
exist two initialization functions, one for each type:
  (1) bloomfilter_basic_init(fp: double,
                             capacity: count,
                             name: string &default=""): opaque of bloomfilter

  (2) bloomfilter_counting_init(k: count,
                                cells: count,
                                max: count,
                                name: string &default=""): opaque of bloomfilter
The BiFs for adding elements and performing lookups remain the same. This
essentially gives us "BiF polymorphism" in script-land, where the
initialization BiF constructs the most derived type while subsequent BiFs
adhere to the same interface.
The reason why we split up the constructor in this case is that we have not yet
derived the math that computes the optimal number of hash functions for
counting Bloom filters---users have to explicitly parameterize them for now.
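A sketch of that polymorphism, assuming bloomfilter_add/bloomfilter_lookup as the shared element BiFs: each constructor builds its own derived type, while the later calls stay type-agnostic:

    event bro_init()
        {
        # Basic filter: derived from a false-positive rate and capacity.
        local basic = bloomfilter_basic_init(0.01, 10000);

        # Counting filter: hash functions (k), cells, and counter
        # maximum must be given explicitly for now.
        local counting = bloomfilter_counting_init(3, 4096, 255);

        # The same BiFs operate on both.
        bloomfilter_add(basic, "foo");
        bloomfilter_add(counting, "foo");
        print bloomfilter_lookup(basic, "foo");
        print bloomfilter_lookup(counting, "foo");
        }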
Thanks to git, this merge was less troublesome than I was afraid it
would be. Not all tests pass yet, though (and file hashes have changed,
unfortunately).
Conflicts:
cmake
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/base/protocols/ftp/main.bro
scripts/base/protocols/irc/dcc-send.bro
scripts/test-all-policy.bro
src/AnalyzerTags.h
src/CMakeLists.txt
src/analyzer/Analyzer.cc
src/analyzer/protocol/file/File.cc
src/analyzer/protocol/file/File.h
src/analyzer/protocol/http/HTTP.cc
src/analyzer/protocol/http/HTTP.h
src/analyzer/protocol/mime/MIME.cc
src/event.bif
src/main.cc
src/util-config.h.in
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/istate.events-ssl/receiver.http.log
testing/btest/Baseline/istate.events-ssl/sender.http.log
testing/btest/Baseline/istate.events/receiver.http.log
testing/btest/Baseline/istate.events/sender.http.log
And changed the endianness parameter of the bytestring_to_count() BiF to
default to false (big endian), mostly just to prove that the BiF parser
doesn't choke on default parameters.
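With the new default, the first two calls below are equivalent big-endian reads; passing T switches to little endian (the byte values are illustrative):

    event bro_init()
        {
        # Default is now big endian: 0x0102 == 258.
        print bytestring_to_count("\x01\x02");
        print bytestring_to_count("\x01\x02", F);

        # Little endian on request: 0x0201 == 513.
        print bytestring_to_count("\x01\x02", T);
        }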
observed elements.
Add methods to merge with and without pruning (before, the only merge
method was with pruning, which invalidates the total number of
observed elements).
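A sketch of the two merge flavors, assuming the topk_merge/topk_merge_prune BiF names:

    event bro_init()
        {
        local t1 = topk_init(5);
        local t2 = topk_init(5);
        topk_add(t1, "a");
        topk_add(t2, "b");

        # Without pruning: t2 is folded into t1 and topk_sum(t1)
        # still reflects all observed elements.
        topk_merge(t1, t2);
        print topk_sum(t1);

        # With pruning: the result is cut back to the configured size,
        # so the total element count becomes unreliable afterwards.
        topk_merge_prune(t1, t2);
        }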
I am not (entirely) sure that this is mathematically correct, but
I am (more and more) getting the feeling that it... might be.
In any case - this was the last step and now it should work
in cluster settings.
Note: merging top-k data structures is not yet possible (and is
actually quite awkward/expensive). I will have to think about
how to do that for a bit...
All tests pass with one exception: some Broxygen tests are broken
because dpd_config doesn't exist anymore. Need to update the mechanism
for auto-documenting well-known ports.
* origin/topic/matthias/opaque:
Add new unit test for opaque serialization.
Migrate entropy testing to opaque.
C++ify RandTest.*
Fix a hard-to-spot bug.
Use more descriptive error message.
Fix the fix :-/.
Fix initialization of hash values.
Be clearer about delegation.
Implement serialization of opaque types.
Update hash BiF documentation.
Migrate free SHA* functions to SHA*Val::digest().
Add missing type name that caused failing tests.
Update base scripts and unit tests.
Simplify hash function BiFs.
Add support for opaque hash values (see the sketch after this list).
Adapt BiF & Bro parser to handle opaque types.
More lexer/parser work.
Implement equivalence relation for opaque types.
Support basic serialization of opaque.
Add opaque type to lexer, parser, and BroType.
Closes #925
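A sketch of the opaque hash values in action, assuming the md5_hash_* BiF names (the SHA* variants work the same way):

    event bro_init()
        {
        # The handle is an "opaque of md5" value; state lives inside it.
        local h = md5_hash_init();
        md5_hash_update(h, "foo");
        md5_hash_update(h, "bar");

        # Same digest as hashing "foobar" in one shot.
        print md5_hash_finish(h);
        }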
Conflicts:
aux/broccoli