
8 commits

Author SHA1 Message Date
Johanna Amann
e1218cc7fa Change Hashing from H3 to Siphash.
This commit mostly changes the hash function used for internal
hashing of data < 36 bytes from H3 to Siphash. The change is motivated
by the finding that H3 does not produce well-distributed hash values:
running HLL with H3 as the hash function gives quite poor results
(estimates up to 75% off in my tests). In contrast, running HLL with
Siphash (or HMAC-MD5) reduces this error to ~2%.

This also fixes a long-standing bug in Hash.h which truncated our hash
values to 32 bits on most machines.

Furthermore, it once again fixes a problem with the Rank function in
HLL.
2016-07-13 06:44:51 -07:00
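The dependence of HLL accuracy on hash quality is easier to see with a minimal HyperLogLog sketch. This is an illustrative Python sketch, not the Bro implementation: Python's standard library exposes no keyed Siphash function, so it swaps in HMAC-MD5 (which the commit message reports behaves similarly) and uses the first 8 bytes of the digest as a 64-bit hash. KEY, P, and all function names are hypothetical; the estimator constants are the standard HyperLogLog ones.

```python
import hashlib
import hmac
import math

KEY = b"example-key"  # hypothetical fixed key; a real deployment would use a secret seed
P = 12                # hypothetical precision: 2**12 = 4096 registers

def hash64(data: bytes, key: bytes = KEY) -> int:
    """64-bit hash from the first 8 bytes of HMAC-MD5 (stand-in for Siphash)."""
    digest = hmac.new(key, data, hashlib.md5).digest()
    return int.from_bytes(digest[:8], "big")

def rho(w: int, bits: int) -> int:
    """1-based position of the leftmost 1-bit in a `bits`-wide word; bits + 1 if w == 0."""
    return bits - w.bit_length() + 1

def estimate_cardinality(items, p: int = P) -> float:
    m = 1 << p
    registers = [0] * m
    rest = 64 - p
    for item in items:
        x = hash64(item)
        idx = x >> rest                  # top p bits select the register
        w = x & ((1 << rest) - 1)        # remaining bits feed rho
        registers[idx] = max(registers[idx], rho(w, rest))
    alpha = 0.7213 / (1 + 1.079 / m)     # bias-correction constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:         # small-range correction (linear counting)
        return m * math.log(m / zeros)
    return raw
```

Swapping a weaker function into hash64 is one way to experiment with the kind of accuracy loss the commit describes.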
Johanna Amann
3aabe83ec6 Hyperloglog: change calculation of Rho
This commit changes the calculation of the rho value to be in line with
the implementation in the original research paper, counting the number
of zero bits that precede the first set bit of the hash value.

This also fixes an infinite loop when the hash value is 0.

I also cleaned up the code a bit, replacing the raw pointers with an
STL vector.

Addresses BIT-1612
2016-06-13 15:18:44 -07:00
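The rho change and the zero-hash fix can be illustrated with a short sketch. In the HyperLogLog paper, rho(w) is the 1-based position of the leftmost 1-bit, i.e. the number of leading zeros plus one. A loop that scans for that bit never terminates when w is 0 unless it is bounded by the word width; the bound below makes it return bits + 1 in that case. The function name and the `bits` parameter are hypothetical, and whether Bro uses exactly this convention is not stated in the commit.

```python
def rho(w: int, bits: int) -> int:
    """1-based position of the leftmost 1-bit of a `bits`-wide word w."""
    if w < 0 or w >> bits:
        raise ValueError("w must fit in `bits` bits")
    pos = 1
    # Scan from the most significant bit downward. The `pos <= bits` bound
    # is what stops the loop when w == 0 (there is no 1-bit to find).
    while pos <= bits and ((w >> (bits - pos)) & 1) == 0:
        pos += 1
    return pos  # bits + 1 when w == 0
```

The same value can be computed without a loop as bits - w.bit_length() + 1, which also yields bits + 1 for w == 0.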
Robin Sommer
c6de23ebe1 Merge remote-tracking branch 'origin/topic/bernhard/ticket1072'
* origin/topic/bernhard/ticket1072:
  and const 2 more functions
  update hll documentation, make a few functions private and create a new copy constructor.
  fix case where hll_error_margin could be undefined (thanks John)

BIT-1072 #merged
2013-09-18 15:00:06 -07:00
Bernhard Amann
ecc20b932a and const 2 more functions 2013-09-16 11:00:54 -07:00
Bernhard Amann
c0f780c728 update hll documentation, make a few functions private and create a new copy constructor.
2013-09-16 10:40:25 -07:00
Robin Sommer
6f9d28cc18 Merge branch 'topic/robin/hyperloglog-merge'
* topic/robin/hyperloglog-merge: (35 commits)
  Making the confidence configurable.
  Renaming HyperLogLog->CardinalityCounter.
  Fixing bug introduced during merging.
  add clustered leak test for hll. No issues.
  make gcc happy
  (hopefully) fix refcounting problem in hll/bloom-filter opaque vals. Thanks Robin.
  re-use same hash class for all add operations
  get hll ready for merging
  and forgot a file...
  adapt to new structure
  fix opaqueval-related memleak.
  make it compile on case-sensitive file systems and fix warnings
  make error rate configurable
  add persistence test not using predetermined random seeds.
  update cluster test to also use hll
  persistence really works.
  well, with this commit synchronizing the data structure should work... if we had consistent hashing.
  and also serialize the other things we need
  ok, this bug was hard to find.
  serialization compiles.
  ...
2013-08-31 10:42:42 -07:00
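Several of the merged commits deal with synchronizing and serializing the counter across a cluster. HyperLogLog counters merge by taking the register-wise maximum, which only yields a meaningful union if every node hashes items identically; that requirement is presumably what the "consistent hashing" remark in the merged commit list refers to. The sketch below is a hypothetical Python illustration (P, KEY, and all function names are assumptions, and HMAC-MD5 stands in for the real hash), not Bro's merge or serialization code.

```python
import hashlib
import hmac

P = 10                       # hypothetical precision; must match on every node
KEY = b"shared-cluster-key"  # hypothetical key; must also match on every node

def hash64(data: bytes) -> int:
    """64-bit hash from HMAC-MD5 (stand-in for the real hash function)."""
    return int.from_bytes(hmac.new(KEY, data, hashlib.md5).digest()[:8], "big")

def add(registers: list[int], item: bytes) -> None:
    rest = 64 - P
    x = hash64(item)
    idx = x >> rest                          # top P bits pick the register
    w = x & ((1 << rest) - 1)
    rho = rest - w.bit_length() + 1          # leftmost 1-bit position
    registers[idx] = max(registers[idx], rho)

def merge(a: list[int], b: list[int]) -> list[int]:
    """Register-wise maximum: the sketch of the union of both input sets."""
    if len(a) != len(b):
        raise ValueError("cannot merge counters with different register counts")
    return [max(x, y) for x, y in zip(a, b)]

def sketch(items) -> list[int]:
    regs = [0] * (1 << P)
    for item in items:
        add(regs, item)
    return regs
```

Because max is commutative, associative, and idempotent, merging sketches of overlapping sets gives exactly the sketch of their union, so items seen on several nodes are not double-counted.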
Robin Sommer
295987c8d0 Making the confidence configurable. 2013-08-31 10:34:50 -07:00
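Making both the error margin and the confidence configurable means the counter has to derive its size from the two. One plausible approach, sketched here under the usual assumption that the estimate is approximately normal with relative standard error 1.04/sqrt(m), is to pick the smallest power-of-two register count m = 2**p whose error bound at the requested two-sided confidence fits within the margin. The function name precision_for and the lower bound of p = 4 are hypothetical choices; Bro's actual formula may differ.

```python
import math
from statistics import NormalDist

def precision_for(error: float, confidence: float) -> int:
    """Smallest p such that 2**p registers meet `error` at two-sided `confidence`.

    Assumes the estimate is roughly normal with relative std. error 1.04 / sqrt(m).
    """
    if not (0 < error < 1 and 0 < confidence < 1):
        raise ValueError("error and confidence must lie in (0, 1)")
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    m_needed = (1.04 * z / error) ** 2              # registers needed for the bound
    return max(4, math.ceil(math.log2(m_needed)))   # hypothetical floor of 16 registers
```

For a 1% margin at 95% confidence this asks for p = 16, i.e. 65,536 registers.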
Robin Sommer
fb3ceae6d5 Renaming HyperLogLog->CardinalityCounter.
For consistency with the class' name.
2013-08-31 10:22:27 -07:00
Renamed from src/probabilistic/HyperLogLog.h