Change Hashing from H3 to Siphash.

This commit mostly changes the hash function used for internal hashing
of data shorter than 36 bytes from H3 to Siphash. The change is
motivated by the finding that H3 does not produce sufficiently
well-distributed hash values: running HLL with H3 as the hashing
function gives quite poor estimates (off by up to 75% in my tests). In
contrast, running HLL with Siphash (or HMAC-MD5) reduces this error to
about 2%.
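For reference, here is a minimal sketch of SipHash-2-4, the standard variant from the SipHash paper by Aumasson and Bernstein (2 compression rounds per block, 4 finalization rounds, 128-bit key). It is not the code added by this commit. The function names (siphash24, sip_round, load_le64) are hypothetical, and the exact SipHash parameters the commit uses are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rotate a 64-bit word left by b bits (0 < b < 64).
static inline uint64_t rotl64(uint64_t x, int b) {
    return (x << b) | (x >> (64 - b));
}

// Read 8 bytes as a little-endian 64-bit integer, independent of host byte order.
static inline uint64_t load_le64(const uint8_t* p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

// One SipRound: the add-rotate-xor mixing step over the four state words.
static inline void sip_round(uint64_t& v0, uint64_t& v1, uint64_t& v2, uint64_t& v3) {
    v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);
    v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;
    v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;
    v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);
}

// Hypothetical helper: SipHash-2-4 of `len` bytes at `data` under a 16-byte key.
uint64_t siphash24(const uint8_t key[16], const uint8_t* data, size_t len) {
    assert(data != nullptr || len == 0);
    const uint64_t k0 = load_le64(key);
    const uint64_t k1 = load_le64(key + 8);
    uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
    uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
    uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
    uint64_t v3 = k1 ^ 0x7465646279746573ULL;

    // Compression: two SipRounds per full 8-byte block.
    const size_t full = len - (len % 8);
    for (size_t i = 0; i < full; i += 8) {
        const uint64_t m = load_le64(data + i);
        v3 ^= m;
        sip_round(v0, v1, v2, v3);
        sip_round(v0, v1, v2, v3);
        v0 ^= m;
    }

    // Last block: the 0..7 leftover bytes, with (len mod 256) in the top byte.
    uint64_t b = static_cast<uint64_t>(len & 0xff) << 56;
    for (size_t i = 0; i < len % 8; ++i)
        b |= static_cast<uint64_t>(data[full + i]) << (8 * i);
    v3 ^= b;
    sip_round(v0, v1, v2, v3);
    sip_round(v0, v1, v2, v3);
    v0 ^= b;

    // Finalization: four SipRounds, then fold the state into 64 bits.
    v2 ^= 0xff;
    for (int r = 0; r < 4; ++r)
        sip_round(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}
```

With key bytes 00..0f and the 15-byte message 00..0e, this returns 0xa129ca6149be45e5, the test vector given in the SipHash paper. Unlike H3, SipHash mixes every input byte through several add-rotate-xor rounds, which is what gives the well-spread output bits that HLL relies on.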

This also fixes a long-standing bug in Hash.h that truncated our hash
values to 32 bits on most machines.
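The commit does not spell out the Hash.h bug. The sketch below assumes it was of this general kind: a 64-bit hash stored in an integer type that is only 32 bits wide on common platforms. The type and function names are hypothetical and are not the actual Hash.h definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only.
typedef unsigned int narrow_hash_t;  // 32 bits on common LP64 and LLP64 platforms
typedef uint64_t wide_hash_t;        // always 64 bits

// Converting a 64-bit digest to the narrow type silently drops the upper 32 bits.
narrow_hash_t store_narrow(uint64_t digest) {
    return static_cast<narrow_hash_t>(digest);
}

// Keeping the full width preserves every bit of the digest.
wide_hash_t store_wide(uint64_t digest) {
    return digest;
}

// True when two distinct digests become equal only because of the truncation.
bool collides_after_truncation(uint64_t a, uint64_t b) {
    return a != b && store_narrow(a) == store_narrow(b);
}
```

For example, 0x0000000100000000 and 0 are distinct 64-bit values but both narrow to 0, so any structure keyed on the narrowed value treats them as the same hash.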

Furthermore, it once again fixes a problem with the Rank function in
HLL.
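The commit does not describe the Rank fix itself. As background, here is a minimal sketch of one common way HyperLogLog splits a 64-bit hash: the top p bits select a register, and the rank is the 1-based position of the leftmost 1-bit in the remaining 64 - p bits. The names hll_index and hll_rank are hypothetical, and Bro's HLL implementation may use a different bit layout.

```cpp
#include <cassert>
#include <cstdint>

// Register index: the top p bits of the hash.
uint64_t hll_index(uint64_t hash, int p) {
    assert(p > 0 && p < 64);
    return hash >> (64 - p);
}

// Rank: 1-based position of the leftmost 1-bit among the low (64 - p) bits.
// If all of those bits are zero, the rank is (64 - p) + 1.
int hll_rank(uint64_t hash, int p) {
    assert(p > 0 && p < 64);
    uint64_t w = hash << p;  // discard the p index bits
    const int max_rank = 64 - p + 1;
    int rank = 1;
    while (rank < max_rank && (w & (1ULL << 63)) == 0) {
        ++rank;
        w <<= 1;
    }
    return rank;
}
```

For example, with p = 4 the hash 0x0400000000000000 maps to register 0 with rank 2, and a hash whose low 60 bits are all zero gets the maximum rank of 61. A loop is used instead of a compiler builtin such as __builtin_clzll because the builtin is undefined for a zero argument, which is exactly the all-zero case.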
Johanna Amann 2016-07-13 06:35:32 -07:00
parent c15f48661d
commit e1218cc7fa
10 changed files with 257 additions and 25 deletions


@@ -181,10 +181,11 @@ extern std::string strreplace(const std::string& s, const std::string& o, const
// Remove all leading and trailing white space from string.
extern std::string strstrip(std::string s);
-extern int hmac_key_set;
-extern unsigned char shared_hmac_md5_key[16];
+extern bool hmac_key_set;
+extern uint8 shared_hmac_md5_key[16];
+extern bool siphash_key_set;
+extern uint8 shared_siphash_key[16];
extern void hmac_md5(size_t size, const unsigned char* bytes,
unsigned char digest[16]);