- Minor whitespace adjutment in merge
* origin/topic/vern/any-typetype-when-fix:
bug fixes for using "when" in functions that have a local of type "any"
This commit moves some of the hash datastructures and code from
util.cc into Hash.cc - where it seems more appropriate.
It also starts to make more Keyed hash functions available - still
using siphash as the default 64 bit keyed hash, but also making
128 and 256 bit highway hashes available.
There already are a few other functions that are defined but not
yet implemented - these will be "static" keyed hashes - which use
an installation specific key. These will be used to, e.g., get
rid of md5 hashing for the generation of file UIDs.
This function just calculates the chosen digest and returns the result
in either the passed buffer, or in a static buffer. Basically a superset
to the surprisingly popular internal_md5.
Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.
This switch-over is a remnant of the previous hash-function that was
used, which apparently was slower with longer input strings.
This change serves no purpose anymore. I performed a few performance tests
on strings of varying sizes:
For a 40 byte string with 10 million iterations:
siphash: 0.31 seconds
hmac-md5: 3.8 seconds
For a 1080 byte string with 10 million iterations:
siphash: 4.2 seconds
hmac-md5: 17 seconds
For a 18360 byte string with 10 million iterations:
siphash: 69 seconds
hmac-md5: 240 seconds
Hence, this commit removes the use of hmac-md5.
This change causes reordering of lines in a few logs.
This commit also changes the datastructure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
If a bloomfilter doesn't have a type, that just means no
bloomfilter_add() has been called yet, so seems undesirable to emit an
error for a lookup against something that's known to be empty.
Useful for cases that don't need to use a fuzzing engine, but just run
the fuzz targets over some set of inputs, like for regression/CI tests.
Also added a POP3 fuzzer dictionary, seed corpus, and README with
examples.
This adds the entirety of the highwayhash implementation of Google.
This includes siphash as well as severl highwayhash variants - which
are faster.
This first commit only switches out the siphash implementation. All
hashes that are generated are exactly the same as before. However, this
does make all other hashes available to be used by us.
I did some performance tests vs the previous siphash implementation by
running the 2009-M57-day11-18 trace 100x through both cases. The average
runtime was virtually the same (within 0.014 seconds of each other).
Note that the way that I included the highwayhash implementation in our
cmake setup is... well, let's say hacky. This definitely needs to be
changed a bit before including this in a real build.
General changes:
* Add -D/--deterministic command line option as
convenience/alternative to -G/--load-seeds (i.e. no file needed, it just
uses zero-initialized random seeds). It also changes Broker data
stores over to using deterministic timing rather than real time.
* Add option to make Reporter abort on runtime scripting errors
The "http_header" event now has an "original_name" parameter that allows
access to the original header name (the "name" parameter reamins the
same as before: it's the uppercased header name).
The "mime_header_rec" record type now also includes an "original_name"
field to similarly provide access to original header name in the
following events: "http_all_headers", "mime_one_header", and
"mime_all_headers".