Commit graph

227 commits

Author SHA1 Message Date
Christian Struck
b36d5fc81b [ADD] builtin function enum_to_int()
[ADD] added tests for the new enum_to_int function
2014-11-10 18:24:27 -08:00
Robin Sommer
9efb549236 Merge remote-tracking branch 'origin/topic/jsiwek/file-signatures'
* origin/topic/jsiwek/file-signatures:
  File type detection changes and fix https.log {orig,resp}_fuids fields.
  Various minor changes related to file mime type detection.
  Refactor common MIME magic matching code.
  Replace libmagic w/ Bro signatures for file MIME type identification.

Conflicts:
	scripts/base/init-default.bro
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log

BIT-1143 #merged
2014-03-30 22:51:05 +02:00
Bernhard Amann
4da0718511 Finishing touches of the x509 file analyzer.
Mostly baseline updates and new tests.

addresses BIT-953, BIT-760, BIT-1150
2014-03-13 15:21:30 -07:00
Jon Siwek
9ac8110416 Merge branch 'master' into topic/jsiwek/file-signatures 2014-03-04 15:36:49 -06:00
Jon Siwek
b22ca5d0a3 Replace libmagic w/ Bro signatures for file MIME type identification.
Notable changes:

- libmagic is no longer used at all.  All MIME type detection is
  done through new Bro signatures, and there's no longer a means to get
  verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
  The majority of the default file magic signatures are derived
  from the default magic database of libmagic ~5.17.

- File magic signatures consist of two new constructs in the
  signature rule parsing grammar: "file-magic" gives a regular
  expression to match against, and "file-mime" gives the MIME type
  string of content that matches the magic and an optional strength
  value for the match.

- Modified signature/rule syntax for identifiers: they can no longer
  start with a '-', which made for ambiguous syntax when doing negative
  strength values in "file-mime".  Also brought syntax for Bro script
  identifiers in line with reality (they can't start with numbers or
  include '-' at all).

- A new Built-In Function, "file_magic", can be used to get all
  file magic matches and their corresponding strength against a given
  chunk of data

- The second parameter of the "identify_data" Built-In Function
  can no longer be used to get verbose file type descriptions, though it
  can still be used to get the strongest matching file magic signature.

- The "file_transferred" event's "descr" parameter no longer
  contains verbose file type descriptions.

- The BROMAGIC environment variable no longer changes any behavior
  in Bro as magic databases are no longer used/installed.

- Reverted back to minimum requirement of CMake 2.6.3 from 2.8.0
  (it's back to being the same requirement as the Bro v2.2 release).
  The bump was to accomodate building libmagic as an external project,
  which is no longer needed.

Addresses BIT-1143.
2014-03-04 11:12:06 -06:00
Bernhard Amann
b3bd509b3f Allow iterating over bif functions with result type vector of any.
This changes the internal type that is used to signal that a vector
is unspecified from any to void.

I tried to verify that the behavior of Bro is still the same. After
a lot of playing around, I think everything still should worl as before.

However, it might be good for someone to take a look at this.

addresses BIT-1144
2014-02-25 15:30:29 -08:00
Jon Siwek
90026f7196 Update to libmagic version 5.17, address BIT-1136. 2014-02-19 10:32:27 -06:00
Jon Siwek
eab886fb84 Change test of identify_data BIF to ignore charset.
It may vary with libmagic version.
2013-10-23 16:51:55 -05:00
Robin Sommer
295987c8d0 Making the confidence configurable. 2013-08-31 10:34:50 -07:00
Robin Sommer
4dcf8fc0db Merge remote-tracking branch 'origin/topic/bernhard/hyperloglog'
* origin/topic/bernhard/hyperloglog: (32 commits)
  add clustered leak test for hll. No issues.
  make gcc happy
  (hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
  re-use same hash class for all add operations
  get hll ready for merging
  and forgot a file...
  adapt to new structure
  fix opaqueval-related memleak.
  make it compile on case-sensitive file systems and fix warnings
  make error rate configureable
  add persistence test not using predetermined random seeds.
  update cluster test to also use hll
  persistence really works.
  well, with this commit synchronizing the data structure should work.. ...if we had consistent hashing.
  and also serialize the other things we need
  ok, this bug was hard to find.
  serialization compiles.
  change plugin after feedback of seth
  Forgot a file. Again. Like always. Basically.
  do away with old file.
  ...
2013-08-30 11:30:05 -07:00
Bernhard Amann
dc9fd36497 Merge remote branch 'origin/master' into topic/bernhard/hyperloglog 2013-08-28 17:48:59 -07:00
Bernhard Amann
8a5a2b5b39 add hexstr_to_bytestring bif that does exactly the opposite of
bytestring_to_hexstr.
2013-08-27 12:20:03 -07:00
Bernhard Amann
74f96d22ef Merge remote branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/3rdparty
2013-08-26 12:53:13 -07:00
Robin Sommer
ab8d13889e Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Use Bro-style platform-independent integer types.
  Change bloom filter's dependence on size_t.
  Remove debugging code.
  Update baseline with now correct FP tests.
  Add debugging code to find FP inconsistency.

Conflicts:
	src/3rdparty
2013-08-19 11:26:29 -07:00
Robin Sommer
95f74313d0 Merge branch 'master' of https://github.com/anthonykasza/bro
* 'master' of https://github.com/anthonykasza/bro:
  levenshtein distance function unit test
  levenshtein distance

Conflicts:
	src/3rdparty
2013-08-19 11:20:50 -07:00
anthonykasza
c9313df382 levenshtein distance function unit test 2013-08-12 21:29:57 -05:00
Bernhard Amann
d83edf8068 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog
Conflicts:
	src/NetVar.cc
	src/NetVar.h
	src/SerialTypes.h
	src/probabilistic/CMakeLists.txt
	testing/btest/scripts/base/frameworks/sumstats/basic-cluster.bro
	testing/btest/scripts/base/frameworks/sumstats/basic.bro
2013-08-12 09:47:53 -07:00
Matthias Vallentin
c526ebcfeb Update baseline with now correct FP tests. 2013-08-03 16:54:47 +02:00
Robin Sommer
04ccb12183 Merge branch 'topic/robin/topk-merge'
BIT-1048 #merged

I'm reverting the serializer version update for now as that breaks
Broccoli. Let's do that later for 2.2.

* topic/robin/topk-merge:
  update documentation, rename get* to Get* and make hasher persistent
  adapt to new folder structure
  fix opaqueval-related memleak
  synchronize pruned attribute
  potentially found wrong Ref.
  add sum function that can be used to get the number of total observed elements.
  in cluster settings, the resultvals can apparently been uninitialized in some special cases
  fix memory leaks
  fix warnings
  add topk cluster test
  make size of topk-list configureable when using sumstats
  implement merging for top-k.
  add serialization for topk
  make the get function const
  topk for sumstats
  well, a test that works..
  implement topk.
2013-08-01 14:39:16 -07:00
Robin Sommer
948441e176 Test expected false positive, but it isn't one any more.
Matthias, please check if this is correct.
2013-08-01 10:52:15 -07:00
Robin Sommer
32a403cdaf Merge branch 'topic/robin/bloom-filter-merge'
* topic/robin/bloom-filter-merge:
  Using a real hash function for hashing a BitVector's internal state.
  Support UHF hashing for >= UHASH_KEY_SIZE bytes.
  Changing the Bloom filter hashing so that it's independent of CompositeHash.
  Add new BiF for low-level Bloom filter initialization.
  Introduce global_hash_seed script variable.

Conflicts:
	testing/btest/Baseline/bifs.bloomfilter/output
2013-08-01 10:52:08 -07:00
Robin Sommer
81dcda3eb4 Merge remote-tracking branch 'origin/topic/bernhard/topk'
* origin/topic/bernhard/topk:
  adapt to new folder structure
  fix opaqueval-related memleak
  synchronize pruned attribute
  potentially found wrong Ref.
  add sum function that can be used to get the number of total observed elements.
  in cluster settings, the resultvals can apparently been uninitialized in some special cases
  fix memory leaks
  fix warnings
  add topk cluster test
  make size of topk-list configureable when using sumstats
  implement merging for top-k.
  add serialization for topk
  make the get function const
  topk for sumstats
  well, a test that works..
  implement topk.
2013-08-01 10:27:18 -07:00
Robin Sommer
2a0790c231 Changing the Bloom filter hashing so that it's independent of
CompositeHash.

We do this by hashing values added to a BloomFilter another time more
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).

I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
2013-07-31 19:56:34 -07:00
Bernhard Amann
5122bf4a7c adapt to new folder structure 2013-07-31 12:06:59 -07:00
Bernhard Amann
daaf091bc3 Merge remote-tracking branch 'origin/master' into topic/bernhard/topk
Conflicts:
	src/NetVar.cc
	src/NetVar.h
	src/SerialTypes.h
	src/bro.bif
2013-07-31 11:52:39 -07:00
Matthias Vallentin
d50b8a147d Add new BiF for low-level Bloom filter initialization.
For symmetry reasons, the new Bif bloomfilter_basic_init2 also allows users to
manually specify the memory bounds and number of hash functions to use.
2013-07-31 18:21:37 +02:00
Matthias Vallentin
8ca76dd4ee Introduce global_hash_seed script variable.
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script and use it instead. If that script
variable does not exist, then we fall back to the initial seed computed a
Bro startup (which is affected ultimately by $BRO_SEED).

See Hasher::MakeSeed for details.
2013-07-31 17:59:08 +02:00
Bernhard Amann
18c10f3cb5 get hll ready for merging 2013-07-30 16:47:26 -07:00
Bernhard Amann
edb04e6d8b fix segfault that could be caused by merging an empty bloom-filter
with a bloom-filter already containing values.

I assume that it is ok to merge an empty bloom-filter with any bloom-filter -
if not we have to change the patch to return an error in this case.
2013-07-30 16:10:06 -07:00
Bernhard Amann
9e0fd963e0 Merge remote-tracking branch 'origin/topic/robin/bloom-filter-merge' into topic/bernhard/hyperloglog
Conflicts:
	scripts/base/frameworks/sumstats/plugins/__load__.bro
	src/CMakeLists.txt
	src/NetVar.cc
	src/NetVar.h
	src/OpaqueVal.h
	src/SerialTypes.h
	src/bro.bif
2013-07-23 21:31:05 -07:00
Robin Sommer
474107fe40 Broifying the code.
Also extending API documentation a bit more and fixing a memory leak.
2013-07-23 20:10:32 -07:00
Matthias Vallentin
a39f980cd4 Implement and test Bloom filter merging. 2013-07-22 18:11:12 +02:00
Matthias Vallentin
7a0240694e Fix and test counting Bloom filter. 2013-07-22 14:09:32 +02:00
Bernhard Amann
03b584c34a Merge remote-tracking branch 'origin/master' into topic/bernhard/topk 2013-07-09 14:56:05 -07:00
Matthias Vallentin
532fbfb4d2 Factor implementation and change interface.
When constructing a Bloom filter, one now has to pass a HashPolicy instance to
it. This separates more clearly the concerns of hashing and Bloom filter
management.

This commit also changes the interface to initialize Bloom filters: there exist
now two initialization functions, one for each type:

  (1) bloomfilter_basic_init(fp: double,
                             capacity: count,
                             name: string &default=""): opaque of bloomfilter

  (2) bloomfilter_counting_init(k: count,
                                cells: count,
                                max: count,
                                name: string &default=""): opaque of bloomfilter

The BiFs for adding elements and performing lookups remain the same. This
essentially gives us "BiF polymorphism" at script land, where the
initialization BiF constructs the most derived type while subsequent BiFs
adhere to the same interface.

The reason why we split up the constructor in this case is that we have not yet
derived the math that computes the optimal number of hash functions for
counting Bloom filters---users have to explicitly parameterize them for now.
2013-06-17 16:14:11 -07:00
Matthias Vallentin
d25984ba45 Update baseline for unit tests. 2013-06-10 12:55:03 -07:00
Matthias Vallentin
86becdd6e4 Add tests. 2013-06-06 15:08:24 -07:00
Robin Sommer
eb637f9f3e Merge remote-tracking branch 'origin/master' into topic/robin/plugins
Thanks to git this merge was less troublesome that I was afraid it
would be. Not all tests pass yet though (and file hashes have changed
unfortunately).

Conflicts:
	cmake
	doc/scripts/DocSourcesList.cmake
	scripts/base/init-bare.bro
	scripts/base/protocols/ftp/main.bro
	scripts/base/protocols/irc/dcc-send.bro
	scripts/test-all-policy.bro
	src/AnalyzerTags.h
	src/CMakeLists.txt
	src/analyzer/Analyzer.cc
	src/analyzer/protocol/file/File.cc
	src/analyzer/protocol/file/File.h
	src/analyzer/protocol/http/HTTP.cc
	src/analyzer/protocol/http/HTTP.h
	src/analyzer/protocol/mime/MIME.cc
	src/event.bif
	src/main.cc
	src/util-config.h.in
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/istate.events-ssl/receiver.http.log
	testing/btest/Baseline/istate.events-ssl/sender.http.log
	testing/btest/Baseline/istate.events/receiver.http.log
	testing/btest/Baseline/istate.events/sender.http.log
2013-05-16 17:58:48 -07:00
Bernhard Amann
56ab9285a4 Merge remote-tracking branch 'origin/master' into topic/bernhard/topk 2013-05-13 21:03:23 -07:00
Jon Siwek
e2a1d4a233 Allow default function/hook/event parameters. Addresses #972.
And changed the endianness parameter of bytestring_to_count() BIF to
default to false (big endian), mostly just to prove that the BIF parser
doesn't choke on default parameters.
2013-05-07 14:32:22 -05:00
Bernhard Amann
160da6f1a6 add sum function that can be used to get the number of total
observed elements.

Add methods to merge with and without pruning (before only merge
method was with pruning, which invalidates the number of total
observed elements)
2013-04-28 21:55:06 -07:00
Bernhard Amann
f2967f485b add persistence test not using predetermined random seeds.
This is failing at the moment.
2013-04-24 16:03:40 -07:00
Bernhard Amann
f69db71f57 Merge remote-tracking branch 'origin/master' into topic/bernhard/hyperloglog 2013-04-24 16:01:05 -07:00
Bernhard Amann
dbd53a09a6 Merge remote-tracking branch 'origin/master' into topic/bernhard/topk 2013-04-24 15:02:19 -07:00
Bernhard Amann
2f48008c42 implement merging for top-k.
I am not (entirely) sure that this is mathematically correct, but
I am (more and more) getting the feeling that it... might be.

In any case - this was the last step and now it should work
in cluster settings.
2013-04-24 06:17:51 -07:00
Bernhard Amann
6f863d2259 add serialization for topk 2013-04-23 23:24:02 -07:00
Yun Zheng Hu
3fff71b37a Add bytestring_to_count function to bro.bif 2013-04-23 20:18:38 -07:00
Bernhard Amann
de5769a88f topk for sumstats 2013-04-23 15:19:01 -07:00
Bernhard Amann
ce7ad003f2 well, a test that works..
Note: merging top-k data structures is not yet possible (and is
actually quite awkward/expensive). I will have to think about
how to do that for a bit...
2013-04-22 02:40:42 -07:00
Bernhard Amann
8340af55d1 persistence really works.
It took me way too long to find this - I got the uint8 serialize/deserialize
wrong :/
2013-04-19 09:52:45 -07:00