Commit graph

206 commits

Author SHA1 Message Date
Robin Sommer
81dcda3eb4 Merge remote-tracking branch 'origin/topic/bernhard/topk'
* origin/topic/bernhard/topk:
  adapt to new folder structure
  fix opaqueval-related memleak
  synchronize pruned attribute
  potentially found wrong Ref.
  add sum function that can be used to get the number of total observed elements.
  in cluster settings, the resultvals can apparently been uninitialized in some special cases
  fix memory leaks
  fix warnings
  add topk cluster test
  make size of topk-list configureable when using sumstats
  implement merging for top-k.
  add serialization for topk
  make the get function const
  topk for sumstats
  well, a test that works..
  implement topk.
2013-08-01 10:27:18 -07:00
Robin Sommer
2a0790c231 Changing the Bloom filter hashing so that it's independent of
CompositeHash.

We do this by hashing values added to a BloomFilter another time more
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).

I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
2013-07-31 19:56:34 -07:00
Bernhard Amann
5122bf4a7c adapt to new folder structure 2013-07-31 12:06:59 -07:00
Bernhard Amann
daaf091bc3 Merge remote-tracking branch 'origin/master' into topic/bernhard/topk
Conflicts:
	src/NetVar.cc
	src/NetVar.h
	src/SerialTypes.h
	src/bro.bif
2013-07-31 11:52:39 -07:00
Matthias Vallentin
d50b8a147d Add new BiF for low-level Bloom filter initialization.
For symmetry reasons, the new Bif bloomfilter_basic_init2 also allows users to
manually specify the memory bounds and number of hash functions to use.
2013-07-31 18:21:37 +02:00
Matthias Vallentin
8ca76dd4ee Introduce global_hash_seed script variable.
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script and use it instead. If that script
variable does not exist, then we fall back to the initial seed computed a
Bro startup (which is affected ultimately by $BRO_SEED).

See Hasher::MakeSeed for details.
2013-07-31 17:59:08 +02:00
Bernhard Amann
18c10f3cb5 get hll ready for merging 2013-07-30 16:47:26 -07:00
Bernhard Amann
edb04e6d8b fix segfault that could be caused by merging an empty bloom-filter
with a bloom-filter already containing values.

I assume that it is ok to merge an empty bloom-filter with any bloom-filter -
if not we have to change the patch to return an error in this case.
2013-07-30 16:10:06 -07:00
Bernhard Amann
9e0fd963e0 Merge remote-tracking branch 'origin/topic/robin/bloom-filter-merge' into topic/bernhard/hyperloglog
Conflicts:
	scripts/base/frameworks/sumstats/plugins/__load__.bro
	src/CMakeLists.txt
	src/NetVar.cc
	src/NetVar.h
	src/OpaqueVal.h
	src/SerialTypes.h
	src/bro.bif
2013-07-23 21:31:05 -07:00
Robin Sommer
474107fe40 Broifying the code.
Also extending API documentation a bit more and fixing a memory leak.
2013-07-23 20:10:32 -07:00
Matthias Vallentin
a39f980cd4 Implement and test Bloom filter merging. 2013-07-22 18:11:12 +02:00
Matthias Vallentin
7a0240694e Fix and test counting Bloom filter. 2013-07-22 14:09:32 +02:00
Bernhard Amann
03b584c34a Merge remote-tracking branch 'origin/master' into topic/bernhard/topk 2013-07-09 14:56:05 -07:00
Matthias Vallentin
532fbfb4d2 Factor implementation and change interface.
When constructing a Bloom filter, one now has to pass a HashPolicy instance to
it. This separates more clearly the concerns of hashing and Bloom filter
management.

This commit also changes the interface to initialize Bloom filters: there exist
now two initialization functions, one for each type:

  (1) bloomfilter_basic_init(fp: double,
                             capacity: count,
                             name: string &default=""): opaque of bloomfilter

  (2) bloomfilter_counting_init(k: count,
                                cells: count,
                                max: count,
                                name: string &default=""): opaque of bloomfilter

The BiFs for adding elements and performing lookups remain the same. This
essentially gives us "BiF polymorphism" at script land, where the
initialization BiF constructs the most derived type while subsequent BiFs
adhere to the same interface.

The reason why we split up the constructor in this case is that we have not yet
derived the math that computes the optimal number of hash functions for
counting Bloom filters---users have to explicitly parameterize them for now.
2013-06-17 16:14:11 -07:00
Matthias Vallentin
d25984ba45 Update baseline for unit tests. 2013-06-10 12:55:03 -07:00
Matthias Vallentin
86becdd6e4 Add tests. 2013-06-06 15:08:24 -07:00
Robin Sommer
eb637f9f3e Merge remote-tracking branch 'origin/master' into topic/robin/plugins
Thanks to git this merge was less troublesome that I was afraid it
would be. Not all tests pass yet though (and file hashes have changed
unfortunately).

Conflicts:
	cmake
	doc/scripts/DocSourcesList.cmake
	scripts/base/init-bare.bro
	scripts/base/protocols/ftp/main.bro
	scripts/base/protocols/irc/dcc-send.bro
	scripts/test-all-policy.bro
	src/AnalyzerTags.h
	src/CMakeLists.txt
	src/analyzer/Analyzer.cc
	src/analyzer/protocol/file/File.cc
	src/analyzer/protocol/file/File.h
	src/analyzer/protocol/http/HTTP.cc
	src/analyzer/protocol/http/HTTP.h
	src/analyzer/protocol/mime/MIME.cc
	src/event.bif
	src/main.cc
	src/util-config.h.in
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/istate.events-ssl/receiver.http.log
	testing/btest/Baseline/istate.events-ssl/sender.http.log
	testing/btest/Baseline/istate.events/receiver.http.log
	testing/btest/Baseline/istate.events/sender.http.log
2013-05-16 17:58:48 -07:00
Bernhard Amann
56ab9285a4 Merge remote-tracking branch 'origin/master' into topic/bernhard/topk 2013-05-13 21:03:23 -07:00
Jon Siwek
e2a1d4a233 Allow default function/hook/event parameters. Addresses #972.
And changed the endianness parameter of bytestring_to_count() BIF to
default to false (big endian), mostly just to prove that the BIF parser
doesn't choke on default parameters.
2013-05-07 14:32:22 -05:00
Bernhard Amann
160da6f1a6 add sum function that can be used to get the number of total
observed elements.

Add methods to merge with and without pruning (before only merge
method was with pruning, which invalidates the number of total
observed elements)
2013-04-28 21:55:06 -07:00
Bernhard Amann
f2967f485b add persistence test not using predetermined random seeds.
This is failing at the moment.
2013-04-24 16:03:40 -07:00
Bernhard Amann
f69db71f57 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog 2013-04-24 16:01:05 -07:00
Bernhard Amann
dbd53a09a6 Merge remote-tracking branch 'origin/master' into topic/bernhard/topk 2013-04-24 15:02:19 -07:00
Bernhard Amann
2f48008c42 implement merging for top-k.
I am not (entirely) sure that this is mathematically correct, but
I am (more and more) getting the feeling that it... might be.

In any case - this was the last step and now it should work
in cluster settings.
2013-04-24 06:17:51 -07:00
Bernhard Amann
6f863d2259 add serialization for topk 2013-04-23 23:24:02 -07:00
Yun Zheng Hu
3fff71b37a Add bytestring_to_count function to bro.bif 2013-04-23 20:18:38 -07:00
Bernhard Amann
de5769a88f topk for sumstats 2013-04-23 15:19:01 -07:00
Bernhard Amann
ce7ad003f2 well, a test that works..
Note: merging top-k data structures is not yet possible (and is
actually quite awkward/expensive). I will have to think about
how to do that for a bit...
2013-04-22 02:40:42 -07:00
Bernhard Amann
8340af55d1 persistence really works.
It took me way too long to find this - I got the uint8 serialize/deserialize
wrong :/
2013-04-19 09:52:45 -07:00
Bernhard Amann
53d6f3aae7 rework cardinality interface to use opaque.
I like it better...
2013-04-07 23:05:14 +02:00
Robin Sommer
2be985433c Test-suite passes.
All tests pass with one exception: some Broxygen tests are broken
because dpd_config doesn't exist anymore. Need to update the mechanism
for auto-documenting well-known ports.
2013-03-26 15:40:23 -07:00
Bernhard Amann
b05eef6541 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/bro.bif
2013-03-25 08:39:52 -07:00
Yun Zheng Hu
9a88dc500a Added reverse() function to strings.bif.
Closes #969.
2013-03-23 08:39:04 -07:00
Bernhard Amann
a5161783ef and add bae64 bif tests. 2013-03-12 09:33:49 -07:00
Bernhard Amann
986b346e3f remove the byte_len and length bifs 2013-03-06 13:45:42 -08:00
Robin Sommer
da90976170 Merge remote-tracking branch 'origin/topic/matthias/opaque'
* origin/topic/matthias/opaque:
  Add new unit test for opaque serialization.
  Migrate entropy testing to opaque.
  C++ify RandTest.*
  Fix a hard-to-spot bug.
  Use more descriptive error message.
  Fix the fix :-/.
  Fix initialization of hash values.
  Be clearer about delegation.
  Implement serialization of opaque types.
  Update hash BiF documentation.
  Migrate free SHA* functions to SHA*Val::digest().
  Add missing type name that caused failing tests.
  Update base scripts and unit tests.
  Simplify hash function BiFs.
  Add support for opaque hash values.
  Adapt BiF & Bro parser to handle opaque types.
  More lexer/parser work.
  Implement equivalence relation for opaque types.
  Support basic serialization of opaque.
  Add opaque type to lexer, parser, and BroType.

Closes #925

Conflicts:
	aux/broccoli
2012-12-20 16:30:22 -08:00
Jon Siwek
4a09c12882 Fix to_port() BIF for port strings with a port number of zero. 2012-12-18 15:08:18 -06:00
Matthias Vallentin
b9d05f56d0 Migrate entropy testing to opaque. 2012-12-13 19:28:19 -08:00
Matthias Vallentin
816965f3c7 Merge remote-tracking branch 'origin/master' into topic/matthias/opaque 2012-12-11 16:32:01 -08:00
Matthias Vallentin
30bab14dbf Update base scripts and unit tests. 2012-12-11 16:26:17 -08:00
Jon Siwek
95ffb1cf27 Quick pass over unit tests, adding -b flag to bro so they run faster.
Doing this made bifs/ ~3x faster and language/ ~2x faster.
2012-11-30 17:44:36 -06:00
Soumya Basu
80cdfbcab4 Moved the testing file to the correct directory 2012-11-15 13:04:48 -08:00
Daniel Thayer
48c4487378 Add test cases for the bytestring_to_double BIF 2012-10-25 17:10:51 -05:00
Seth Hall
d157759ff2 Added a BiF to wrap the strptime function. 2012-10-19 02:07:34 -04:00
Robin Sommer
22cf75dae5 Two fixes.
- Typo in recent scanner fix.

    - Make bif.identify_magic robust against FreeBSD's libmagic config.
2012-08-29 08:11:16 -07:00
Daniel Thayer
bda8631f32 Add more BIF tests 2012-08-07 14:10:55 -05:00
Daniel Thayer
10b671a638 Add tests for untested BIFs 2012-08-03 17:24:04 -05:00
Daniel Thayer
91522e7836 Fix tests and error message for to_double BIF 2012-07-25 12:10:47 -05:00
Robin Sommer
c36a449c76 New built-in function to_double(s: string).
Closes #859.
2012-07-24 15:05:13 -07:00
Daniel Thayer
1b8673f4b2 Remove a non-portable test case 2012-07-05 17:58:44 -05:00