Remove the siphash->hmac-md5 switch after 36 bytes.

Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.

This switch-over is a remnant of the previously used hash function, which
was apparently slower on longer input strings.

With siphash, the switch-over no longer serves any purpose. I ran a few
performance tests on strings of varying sizes. In every case, siphash was
faster than hmac-md5:

Timings for 10 million iterations:

    String length   siphash   hmac-md5
    40 bytes         0.31 s      3.8 s
    1080 bytes        4.2 s       17 s
    18360 bytes        69 s      240 s
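The commit does not say how these timings were taken. The following is only a hedged sketch of one plausible setup. It uses a hypothetical helper, time_hash, that measures the wall-clock time of N calls of a hash function on a fixed input. Passing siphash and hmac-md5 implementations to it would produce measurements of the same kind as those above, but not necessarily the same values.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical benchmark helper (not part of Zeek): calls `hash(input)`
// `iterations` times and returns the elapsed wall-clock time in seconds.
template <typename HashFn>
double time_hash(HashFn hash, const std::string& input, std::size_t iterations) {
    // Reloading the input pointer through a volatile variable on every
    // iteration stops the compiler from hoisting the hash call out of the loop.
    const std::string* volatile in = &input;
    // Accumulating into a volatile sink keeps the results observable,
    // so the calls cannot be removed as dead code.
    volatile std::uint64_t sink = 0;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        sink = sink ^ static_cast<std::uint64_t>(hash(*in));
    const auto stop = std::chrono::steady_clock::now();

    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, `time_hash(std::hash<std::string>{}, std::string(40, 'x'), 10000000)` times 10 million hashes of a 40-byte string. `std::hash` appears here only as a placeholder for the two functions being compared.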

Hence, this commit removes the use of hmac-md5. Siphash is now used for
strings of any length.
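To illustrate what a single code path for all lengths looks like, here is a portable SipHash-2-4 sketch written from the published reference algorithm. This is an assumption-laden sketch, not Zeek's implementation. The diff's use of highwayhash::HH_U64 suggests that the real code calls the SipHash implementation shipped with the highwayhash library. The names sketch::siphash24, sip_round and load_le64 are hypothetical. The 128-bit key is taken as two 64-bit words, mirroring the new h[2] seed layout.

```cpp
#include <cstddef>
#include <cstdint>

// Reference-style SipHash-2-4 sketch; NOT the code used by Zeek.
namespace sketch {

inline std::uint64_t rotl(std::uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

// Little-endian 64-bit load that works regardless of host byte order.
inline std::uint64_t load_le64(const unsigned char* p) {
    std::uint64_t r = 0;
    for (int i = 7; i >= 0; --i) r = (r << 8) | p[i];
    return r;
}

// One SipRound of the ARX permutation.
inline void sip_round(std::uint64_t& v0, std::uint64_t& v1, std::uint64_t& v2, std::uint64_t& v3) {
    v0 += v1; v1 = rotl(v1, 13); v1 ^= v0; v0 = rotl(v0, 32);
    v2 += v3; v3 = rotl(v3, 16); v3 ^= v2;
    v0 += v3; v3 = rotl(v3, 21); v3 ^= v0;
    v2 += v1; v1 = rotl(v1, 17); v1 ^= v2; v2 = rotl(v2, 32);
}

// Same code path for every input length: there is no 36-byte cut-over.
inline std::uint64_t siphash24(const std::uint64_t key[2], const void* data, std::size_t len) {
    const auto* in = static_cast<const unsigned char*>(data);
    std::uint64_t v0 = 0x736f6d6570736575ULL ^ key[0];
    std::uint64_t v1 = 0x646f72616e646f6dULL ^ key[1];
    std::uint64_t v2 = 0x6c7967656e657261ULL ^ key[0];
    std::uint64_t v3 = 0x7465646279746573ULL ^ key[1];

    // Compression: two rounds per full 8-byte block.
    const std::size_t full = len - (len % 8);
    for (std::size_t i = 0; i < full; i += 8) {
        const std::uint64_t m = load_le64(in + i);
        v3 ^= m;
        sip_round(v0, v1, v2, v3);
        sip_round(v0, v1, v2, v3);
        v0 ^= m;
    }

    // Last block: the remaining 0..7 bytes, with len mod 256 in the top byte.
    std::uint64_t b = static_cast<std::uint64_t>(len) << 56;
    for (std::size_t i = 0; i < len % 8; ++i)
        b |= static_cast<std::uint64_t>(in[full + i]) << (8 * i);
    v3 ^= b;
    sip_round(v0, v1, v2, v3);
    sip_round(v0, v1, v2, v3);
    v0 ^= b;

    // Finalization: four rounds.
    v2 ^= 0xff;
    for (int r = 0; r < 4; ++r) sip_round(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}

}  // namespace sketch
```

A 5-byte input and an 18360-byte input both go through the same 8-byte block loop. The cost therefore grows linearly with input length, and long strings need no separate path.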

Because the hash values of strings longer than 36 bytes change, this
change reorders lines in a few logs.

This commit also changes the seed data structure in probabilistic/Hasher
to get rid of a type-punning warning.
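The seed change in the diff can be sketched as follows. HH_U64 is stubbed as std::uint64_t so that the block compiles without the highwayhash headers. The functions consume_key and use_seed are hypothetical stand-ins for whatever function actually receives the key. The commit does not show how the old h1/h2 seed was passed along, so the comment about a pointer cast describes a plausible source of the warning, not the actual old code.

```cpp
#include <cstdint>

// Stand-in for highwayhash::HH_U64 (an unsigned 64-bit integer).
using HH_U64 = std::uint64_t;

// Hypothetical consumer that expects the key as an array of two words.
std::uint64_t consume_key(const HH_U64 key[2]) { return key[0] ^ key[1]; }

struct seed_t {
    alignas(16) HH_U64 h[2];  // was: uint64_t h1; uint64_t h2;

    friend seed_t operator+(seed_t lhs, const std::uint64_t rhs) {
        lhs.h[0] += rhs;  // was: lhs.h1 += rhs;
        return lhs;
    }
    friend bool operator==(const seed_t& x, const seed_t& y) {
        return x.h[0] == y.h[0] && x.h[1] == y.h[1];
    }
};

// The array member decays to a pointer, so no cast is needed. A two-field
// struct would presumably need something like
// reinterpret_cast<const HH_U64*>(&seed), i.e. type punning.
std::uint64_t use_seed(const seed_t& s) { return consume_key(s.h); }
```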
Johanna Amann 2020-04-24 13:12:01 -07:00
parent bb050910bb
commit 5e7915ae7a
13 changed files with 269 additions and 297 deletions

@@ -24,11 +24,10 @@ public:
 	typedef hash_t digest;
 	typedef std::vector<digest> digest_vector;
 	struct seed_t {
-		uint64_t h1;
-		uint64_t h2;
+		alignas(16) highwayhash::HH_U64 h[2];
 		friend seed_t operator+(seed_t lhs, const uint64_t rhs) {
-			lhs.h1 += rhs;
+			lhs.h[0] += rhs;
 			return lhs;
 		}
 	};
@@ -179,8 +178,8 @@ public:
 	friend bool operator==(const UHF& x, const UHF& y)
 	{
-		return (x.seed.h1 == y.seed.h1) &&
-		       (x.seed.h2 == y.seed.h2);
+		return (x.seed.h[0] == y.seed.h[0]) &&
+		       (x.seed.h[1] == y.seed.h[1]);
 	}
 	friend bool operator!=(const UHF& x, const UHF& y)