This allows tracing of hash key buffer reservations, reads, and writes via a new
debug stream, and supports printing a summary of a HashKey object via
Describe(). The latter comes in handy e.g. in TableVal::Describe() (where
including the hash key is now available but commented out).
This preserves the optimization of storing values directly in the key_u member
union when feasible, and using a variable size buffer otherwise. It also adds
bounds-checking for that buffer, moves size arguments to size_t, decouples
construction from hash computation, emulates the tagging feature found in
SerializationFormat to assist troubleshooting, and switches feasible
reinterpret_casts to static_casts.
Coverity Scan builds currently encounter catastrophic error, claiming
alignas requires use on both declaration and definition, so appears to
actually not understand "static inline" in combo with alignas.
* Add deprecation for MIME_Entity::ContentType(), use GetContentType()
* Add deprecation for MIME_Entity::ContentSubType(), use GetContentSubType()
* Add deprecation for MIME_Message::BuildHeaderVal(), use ToHeaderVal()
* Add deprecation for MIME_Message::BuildHeaderTable(), use ToHeaderTable()
* Add deprecation for mime::new_string_val(), use mime::to_stringval()
* Add deprecation for ARP_Analyzer::ConstructAddrVal(), use ToAddrVal()
* Add deprecation for ARP_Analyzer::EthAddrToStr(), use ToEthAddrStr()
* Change the Func::Call() replacement to be named Func::Invoke()
Names of functions also changed slightly, like bro_fmt -> fmt_bif.
Should generally be unusual/unexpected to see somone calling these
directly from C++ in their plugin, but since technically possible in
previous versions, I also removed the "private" restriction on accessing
the BifReturnVal member.
This commit switches UID hashing from md5 to a highway hash. It also
moves the salt value out of the file plugin - and makes it
installation-specific instead - it is moved to the global namespace.
There now are digest hash functions to make "static"
installation-specific hashes that are stable over workers available to
everyone; hashes can be 64, 128 or 256 bits in size.
Due to the fact that we switch the file hashing algorithm, all file
hashes change.
The underlyigng algorithm that is used for hashing is highwayhash-128,
which is significantly faster than md5.
This commit moves some of the hash datastructures and code from
util.cc into Hash.cc - where it seems more appropriate.
It also starts to make more Keyed hash functions available - still
using siphash as the default 64 bit keyed hash, but also making
128 and 256 bit highway hashes available.
There already are a few other functions that are defined but not
yet implemented - these will be "static" keyed hashes - which use
an installation specific key. These will be used to, e.g., get
rid of md5 hashing for the generation of file UIDs.
Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.
This switch-over is a remnant of the previous hash-function that was
used, which apparently was slower with longer input strings.
This change serves no purpose anymore. I performed a few performance tests
on strings of varying sizes:
For a 40 byte string with 10 million iterations:
siphash: 0.31 seconds
hmac-md5: 3.8 seconds
For a 1080 byte string with 10 million iterations:
siphash: 4.2 seconds
hmac-md5: 17 seconds
For a 18360 byte string with 10 million iterations:
siphash: 69 seconds
hmac-md5: 240 seconds
Hence, this commit removes the use of hmac-md5.
This change causes reordering of lines in a few logs.
This commit also changes the datastructure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
This commit marks (hopefully) ever one-parameter constructor as explicit.
It also uses override in (hopefully) all circumstances where a virtual
method is overridden.
There are a very few other minor changes - most of them were necessary
to get everything to compile (like one additional constructor). In one
case I changed an implicit operation to an explicit string conversion -
I think the automatically chosen conversion was much more convoluted.
This took longer than I want to admit but not as long as I feared :)
This commit mostly changes the hash function that is used for Internal
hashing of data < 36 bytes from H3 to Siphash. This change is motivated
by the fact that it turns out that H3 apparently does not deliver a very
good source of data uniqueness; running HLL with H3 as a hashing
function results in quite poor results (up to of 75% off in my tests).
In difference, running HLL with Siphash (or HMAC-MD5) changes this
factor to ~2%.
This also fixes a long-standing bug in Hash.h which truncated our hash
values to 32 bit on most machines.
Furthermore, it once again fixes a problem with the Rank function in
HLL.
* origin/topic/robin/conn-ids:
Moving uid from conn_id to connection, and making output determistic if a hash seed is given.
Extending conn_id with a globally unique identifiers.