Commit graph

21 commits

Author SHA1 Message Date
Jon Siwek
ca1e5fe4be Replace deprecated usage of BifFunc:: with zeek::BifFunc::
Names of functions also changed slightly, like bro_fmt -> fmt_bif.

Should generally be unusual/unexpected to see somone calling these
directly from C++ in their plugin, but since technically possible in
previous versions, I also removed the "private" restriction on accessing
the BifReturnVal member.
2020-05-14 17:26:30 -07:00
Johanna Amann
ce8b121e12 Hash unification: address PR feedback 2020-05-13 14:07:59 +00:00
Johanna Amann
04ed125941 Merge remote-tracking branch 'origin/master' into topic/johanna/hash-unification 2020-05-06 23:18:33 +00:00
Johanna Amann
3bce313b12 Switch file UID hashing from md5 to highwayhash.
This commit switches UID hashing from md5 to a highway hash. It also
moves the salt value out of the file plugin - and makes it
installation-specific instead - it is moved to the global namespace.

There now are digest hash functions to make "static"
installation-specific hashes that are stable over workers available to
everyone; hashes can be 64, 128 or 256 bits in size.

Due to the fact that we switch the file hashing algorithm, all file
hashes change.

The underlyigng algorithm that is used for hashing is highwayhash-128,
which is significantly faster than md5.
2020-04-30 10:20:09 -07:00
Johanna Amann
360c06a3f8 Start refactoring hashing.
This commit moves some of the hash datastructures and code from
util.cc into Hash.cc - where it seems more appropriate.

It also starts to make more Keyed hash functions available - still
using siphash as the default 64 bit keyed hash, but also making
128 and 256 bit highway hashes available.

There already are a few other functions that are defined but not
yet implemented - these will be "static" keyed hashes - which use
an installation specific key. These will be used to, e.g., get
rid of md5 hashing for the generation of file UIDs.
2020-04-24 18:27:09 -07:00
Johanna Amann
5e7915ae7a Remove the siphash->hmac-md5 switch after 36 bytes.
Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.

This switch-over is a remnant of the previous hash-function that was
used, which apparently was slower with longer input strings.

This change serves no purpose anymore. I performed a few performance tests
on strings of varying sizes:

For a 40 byte string with 10 million iterations:

siphash: 0.31 seconds
hmac-md5: 3.8 seconds

For a 1080 byte string with 10 million iterations:

siphash: 4.2 seconds
hmac-md5: 17 seconds

For a 18360 byte string with 10 million iterations:

siphash: 69 seconds
hmac-md5: 240 seconds

Hence, this commit removes the use of hmac-md5.

This change causes reordering of lines in a few logs.

This commit also changes the datastructure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
2020-04-24 13:14:29 -07:00
Tim Wojtulewicz
fd5e15b116 The Great Embooleanating
A large number of functions had return values and/or arguments changed
to use ``bool`` types instead of ``int``.
2020-03-31 06:41:54 +00:00
Max Kellermann
0db61f3094 include cleanup
The Zeek code base has very inconsistent #includes.  Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed.  Another side effect was a lot of header
bloat which slows down the build.

First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.

After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations.  In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.

This patch speeds up the build by 19%, because each compilation unit
gets smaller.  Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):

Before this patch:

 3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
 760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps

After this patch:

 2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
 72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
2020-02-04 20:51:02 +01:00
Dominik Charousset
c1f3fe7829 Switch from header guards to pragma once 2019-09-17 14:10:30 +02:00
Tim Wojtulewicz
54752ef9a1 Deprecate the internal int/uint types in favor of the cstdint types they were based on 2019-08-12 13:50:07 -07:00
Johanna Amann
6d612ced3d Mark one-parameter constructors as explicit & use override where possible
This commit marks (hopefully) ever one-parameter constructor as explicit.

It also uses override in (hopefully) all circumstances where a virtual
method is overridden.

There are a very few other minor changes - most of them were necessary
to get everything to compile (like one additional constructor). In one
case I changed an implicit operation to an explicit string conversion -
I think the automatically chosen conversion was much more convoluted.

This took longer than I want to admit but not as long as I feared :)
2018-03-27 07:17:32 -07:00
Johanna Amann
e1218cc7fa Change Hashing from H3 to Siphash.
This commit mostly changes the hash function that is used for Internal
hashing of data < 36 bytes from H3 to Siphash. This change is motivated
by the fact that it turns out that H3 apparently does not deliver a very
good source of data uniqueness; running HLL with H3 as a hashing
function results in quite poor results (up to of 75% off in my tests).
In difference, running HLL with Siphash (or HMAC-MD5) changes this
factor to ~2%.

This also fixes a long-standing bug in Hash.h which truncated our hash
values to 32 bit on most machines.

Furthermore, it once again fixes a problem with the Rank function in
HLL.
2016-07-13 06:44:51 -07:00
Jon Siwek
d7dafe2fe2 Refactoring various usages of new IPAddr class.
Reducing number of places that internal representation was exposed
via GetBytes/CopyIPv6.

Also fixed a bug in remask_addr bif.
2012-02-22 14:45:44 -06:00
Jon Siwek
0f207c243c Port DNS_Mgr to use new IPAddr class, enable lookups on IPv6 addrs.
Host lookups still need to be changed to also do AAAA queries.
2012-02-13 15:57:59 -06:00
Jon Siwek
f945f3c518 Change HashKey threshold for using H3 to 36 bytes.
This is enough to accommodate using H3 instead of HMAC/MD5 for IPv6
Conn::Key's and performs better since a hash happens for every packet.
2012-02-09 12:55:55 -06:00
Jon Siwek
495e987938 Remove $Id$ tags 2011-08-04 15:21:18 -05:00
Robin Sommer
59d6202104 Merge remote branch 'origin/topic/robin/conn-ids'
* origin/topic/robin/conn-ids:
  Moving uid from conn_id to connection, and making output determistic if a hash seed is given.
  Extending conn_id with a globally unique identifiers.
2011-04-22 22:13:44 -07:00
Robin Sommer
03c0d587a4 Removing code for unused hash functions. 2011-04-01 16:09:28 -07:00
Robin Sommer
881071cc99 Extending conn_id with a globally unique identifiers. 2011-03-15 20:42:56 -07:00
Jon Siwek
13569aaab7 Removal of the --enable-int64 config option.
This will now be always on.  As such, uses of the USE_INT64 preprocessor
definition have been cleaned out.
2010-11-17 20:38:33 -06:00
Robin Sommer
61757ac78b Initial import of svn+ssh:://svn.icir.org/bro/trunk/bro as of r7088 2010-09-27 20:42:30 -07:00