Remove the siphash->hmac-md5 switch after 36 bytes.

Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.
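
A minimal sketch of the length-based selection described above, before and after this commit. The function names and the string labels are hypothetical, chosen only for illustration; they are not the identifiers used in the code base.

```python
# Threshold taken from the commit message: siphash for strings up to 36 bytes.
SIPHASH_MAX_LEN = 36


def pick_hash_before(length: int) -> str:
    """Old behaviour: switch to hmac-md5 once the input exceeds 36 bytes."""
    return "siphash" if length <= SIPHASH_MAX_LEN else "hmac-md5"


def pick_hash_after(length: int) -> str:
    """New behaviour: siphash regardless of input length."""
    return "siphash"
```

Only inputs longer than 36 bytes are hashed differently after the change; shorter inputs already used siphash.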

This switch-over is a remnant of the previously used hash function,
which apparently was slower with longer input strings.

The switch-over no longer serves any purpose. I ran a few performance
tests on strings of varying sizes:

For a 40-byte string with 10 million iterations:

siphash: 0.31 seconds
hmac-md5: 3.8 seconds

For a 1080-byte string with 10 million iterations:

siphash: 4.2 seconds
hmac-md5: 17 seconds

For an 18360-byte string with 10 million iterations:

siphash: 69 seconds
hmac-md5: 240 seconds

Hence, this commit removes the use of hmac-md5.
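
As a rough illustration of the measurements above, the sketch below shows a generic timing loop and computes how much longer hmac-md5 took than siphash from the reported numbers. It is not the benchmark that was actually run: `time_hash`, `hmac_md5`, `slowdown` and the key are hypothetical, and Python's standard library HMAC-MD5 stands in for the hash under test because it offers no keyed siphash.

```python
import hashlib
import hmac
import time


def time_hash(hash_fn, data: bytes, iterations: int) -> float:
    """Return wall-clock seconds spent calling hash_fn(data) `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        hash_fn(data)
    return time.perf_counter() - start


def hmac_md5(data: bytes, key: bytes = b"hypothetical-key") -> bytes:
    """HMAC-MD5 digest of `data`; the key is an arbitrary placeholder."""
    return hmac.new(key, data, hashlib.md5).digest()


# Timings reported in the commit message:
# input size in bytes -> (siphash seconds, hmac-md5 seconds), 10 million iterations each.
REPORTED = {40: (0.31, 3.8), 1080: (4.2, 17.0), 18360: (69.0, 240.0)}


def slowdown(size: int) -> float:
    """How many times longer hmac-md5 took than siphash for the given size."""
    sip_seconds, md5_seconds = REPORTED[size]
    return md5_seconds / sip_seconds


if __name__ == "__main__":
    for size in REPORTED:
        print(f"{size} bytes: hmac-md5 took {slowdown(size):.1f}x as long as siphash")
    # Small iteration count so the demonstration finishes quickly.
    elapsed = time_hash(hmac_md5, b"x" * 40, 10_000)
    print(f"10000 hmac-md5 calls on 40 bytes: {elapsed:.4f} s")
```

With the reported numbers, hmac-md5 took roughly 12.3, 4.0 and 3.5 times as long as siphash for the 40, 1080 and 18360 byte inputs respectively.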

This change causes reordering of lines in a few logs.

This commit also changes the data structure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
Johanna Amann 2020-04-24 13:12:01 -07:00
parent bb050910bb
commit 5e7915ae7a
13 changed files with 269 additions and 297 deletions

@@ -3,13 +3,13 @@
#empty_field (empty)
#unset_field -
#path intel
#open 2019-06-07-02-20-05
#open 2020-04-23-23-52-54
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p seen.indicator seen.indicator_type seen.where seen.node matched sources fuid file_mime_type file_desc
#types time string addr port addr port string enum enum string set[enum] set[string] string string string
1559874004.952411 - - - - - 192.168.1.1 Intel::ADDR SOMEWHERE zeek Intel::ADDR source1 - - -
1559874004.952411 - - - - - 192.168.2.1 Intel::ADDR SOMEWHERE zeek Intel::SUBNET source1 - - -
1559874004.952411 - - - - - 192.168.142.1 Intel::ADDR SOMEWHERE zeek Intel::SUBNET,Intel::ADDR source1 - - -
#close 2019-06-07-02-20-05
1587685974.717161 - - - - - 192.168.1.1 Intel::ADDR SOMEWHERE zeek Intel::ADDR source1 - - -
1587685974.717161 - - - - - 192.168.2.1 Intel::ADDR SOMEWHERE zeek Intel::SUBNET source1 - - -
1587685974.717161 - - - - - 192.168.142.1 Intel::ADDR SOMEWHERE zeek Intel::SUBNET,Intel::ADDR source1 - - -
#close 2020-04-23-23-52-54
Seen: [indicator=192.168.1.1, indicator_type=Intel::ADDR, host=192.168.1.1, where=SOMEWHERE, node=zeek, conn=<uninitialized>, uid=<uninitialized>, f=<uninitialized>, fuid=<uninitialized>]
Item: [indicator=192.168.1.1, indicator_type=Intel::ADDR, meta=[source=source1, desc=this host is just plain baaad, url=http://some-data-distributor.com/1]]
@@ -18,7 +18,7 @@ Seen: [indicator=192.168.2.1, indicator_type=Intel::ADDR, host=192.168.2.1, wher
Item: [indicator=192.168.2.0/24, indicator_type=Intel::SUBNET, meta=[source=source1, desc=this subnetwork is just plain baaad, url=http://some-data-distributor.com/2]]
Seen: [indicator=192.168.142.1, indicator_type=Intel::ADDR, host=192.168.142.1, where=SOMEWHERE, node=zeek, conn=<uninitialized>, uid=<uninitialized>, f=<uninitialized>, fuid=<uninitialized>]
Item: [indicator=192.168.142.1, indicator_type=Intel::ADDR, meta=[source=source1, desc=this host is just plain baaad, url=http://some-data-distributor.com/3]]
Item: [indicator=192.168.128.0/18, indicator_type=Intel::SUBNET, meta=[source=source1, desc=this subnetwork might be baaad, url=http://some-data-distributor.com/5]]
Item: [indicator=192.168.142.0/26, indicator_type=Intel::SUBNET, meta=[source=source1, desc=this subnetwork is inside, url=http://some-data-distributor.com/4]]
Item: [indicator=192.168.142.0/24, indicator_type=Intel::SUBNET, meta=[source=source1, desc=this subnetwork is baaad, url=http://some-data-distributor.com/4]]
Item: [indicator=192.168.128.0/18, indicator_type=Intel::SUBNET, meta=[source=source1, desc=this subnetwork might be baaad, url=http://some-data-distributor.com/5]]
Item: [indicator=192.168.142.1, indicator_type=Intel::ADDR, meta=[source=source1, desc=this host is just plain baaad, url=http://some-data-distributor.com/3]]