I’ve been noodling a lot with my single header C/C++ hashmap recently - hashmap.h. I knew from reading around that my power-of-two size requirement made getting a bucket location in range easy, but that it could result in suboptimal bucket selection. Various places on the interwebs said that using a prime-sized table was better, as taking the modulus by a prime gives a much better avalanche effect - but this comes at the cost of an actual integer modulus operation for every hash, which is quite a bit slower.

Then friend of the blog Nathan Reed made me aware of Fibonacci Hashing, which is utterly wonderful.

Before, my hasher looked like this:

unsigned get_bucket(const void* const data, const unsigned length) {
  unsigned hash = crc32_hasher(data, length);

  // capacity is always a power of two, so masking keeps us in range
  return hash & (capacity - 1);
}

And now, with the Fibonacci hash:

unsigned get_bucket(const void* const data, const unsigned length) {
  unsigned hash = crc32_hasher(data, length);

  // multiply by the Fibonacci constant, then keep the top
  // log2(capacity) bits as the bucket location
  return (hash * 2654435769u) >> (32 - integer_log2(capacity));
}

2654435769 is chosen by doing pow(2, 32) / golden_ratio (the golden ratio being the limit of the ratio of consecutive Fibonacci numbers, hence the name) and rounding to the nearest odd integer (so that the bottom bit is set).
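
If you want to double check the constant, a quick sketch like this (entirely my own naming, not from hashmap.h) derives it:

#include <math.h>
#include <stdio.h>

int main(void) {
  // the golden ratio, (1 + sqrt(5)) / 2, is roughly 1.6180339887
  const double golden_ratio = (1.0 + sqrt(5.0)) / 2.0;

  // pow(2, 32) / golden_ratio is roughly 2654435769.497
  const double scaled = pow(2.0, 32.0) / golden_ratio;

  // round to the nearest integer, then force the bottom bit on
  const unsigned long long constant = ((unsigned long long)(scaled + 0.5)) | 1ull;

  printf("%llu\n", constant); // prints 2654435769
  return 0;
}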

One of the claims of Fibonacci hashing is that consecutive inputs will produce a good spread of outputs, meaning you don’t hit problems when multiple things with similar hashes get inserted next to each other. So how does it perform?

[Graph: the avalanche properties of the Fibonacci hash]

The four lines on the chart are as follows:

  • avalanche - the percentage of consecutive input pairs where at least half of the active bits flipped between their outputs. The closer this is to 100%, the better a hashing approach is regarded to be.
  • average difference - the average delta between the outputs of two consecutive inputs, as a percentage of the range. So an average difference of 50% with a bucket size of 8 means that for any two consecutive inputs we’d expect their buckets to differ by 4.
  • min difference - for two consecutive inputs in the integer range that pass through the Fibonacci hash, the minimum delta between their outputs. So a bucket size of 16 with a minimum difference of 25% would mean that every pair of consecutive inputs differs by at least 4.
  • max difference - the opposite of the min difference! (There’s a sketch of how these deltas could be measured just after this list.)
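
For the curious, here’s a minimal sketch of how those difference metrics could be gathered for a single bucket size - this isn’t my actual measurement code, and names like fib_bucket are made up for the example:

#include <stdio.h>

static unsigned fib_bucket(const unsigned hash, const unsigned log2_capacity) {
  return (hash * 2654435769u) >> (32 - log2_capacity);
}

int main(void) {
  const unsigned log2_capacity = 3; // bucket size 8
  const unsigned capacity = 1u << log2_capacity;
  const unsigned samples = 1u << 16;

  unsigned min = capacity, max = 0;
  unsigned long long total = 0;

  for (unsigned i = 0; i < samples; i++) {
    const unsigned a = fib_bucket(i, log2_capacity);
    const unsigned b = fib_bucket(i + 1, log2_capacity);
    const unsigned delta = (a > b) ? (a - b) : (b - a);

    if (delta < min) min = delta;
    if (delta > max) max = delta;
    total += delta;
  }

  printf("min %.0f%% avg %.0f%% max %.0f%%\n",
         100.0 * min / capacity,
         (100.0 * total / samples) / capacity,
         100.0 * max / capacity);
  return 0;
}

At a bucket size of 8, this kind of measurement lines up with the chart - a min difference of around 38% and an average of around 47%.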

Notice:

  • From a bucket size of 8 and up (highly likely with most hashmaps in my experience!) the min, average, and max differences are all stable. I’m especially happy with the min difference - it means we can rely on the Fibonacci hash putting at least a 38% delta between consecutive inputs.
  • We average about a 47% difference between any two consecutive inputs.
  • The avalanche percentage is quite lumpy - this is because any log2(bucket_size) that is odd means ‘at least half the bits’ is a fractional number of bits, so we have to round under or over when checking. I’ve leaned towards under rounding any ties (see the sketch after this list).
  • Overall we’re seeing about a 50% probability of avalanching between consecutive inputs. But remember - this Fibonacci hash isn’t our actual hash! It’s what we apply after we’ve already hashed the key, so it compounds with the hash I actually use on the input keys - meaning this just helps us avalanche better than before. Nice!
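
And here’s a rough sketch of the avalanche check being described - again the naming is mine, and the tie-breaking is my interpretation of the under rounding above:

#include <stdio.h>

static unsigned fib_bucket(const unsigned hash, const unsigned log2_capacity) {
  return (hash * 2654435769u) >> (32 - log2_capacity);
}

int main(void) {
  const unsigned log2_capacity = 3; // 3 active bits, bucket size 8
  const unsigned samples = 1u << 16;

  // 'half of 3 bits' is 1.5 - integer division rounds that tie down,
  // which is the under rounding mentioned above
  const unsigned threshold = log2_capacity / 2;

  unsigned avalanched = 0;
  for (unsigned i = 0; i < samples; i++) {
    const unsigned flipped =
        fib_bucket(i, log2_capacity) ^ fib_bucket(i + 1, log2_capacity);

    // __builtin_popcount is a GCC/Clang builtin that counts set bits
    if ((unsigned)__builtin_popcount(flipped) >= threshold) avalanched++;
  }

  printf("avalanche %.0f%%\n", 100.0 * avalanched / samples);
  return 0;
}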

I’m pretty happy overall with this - hashing performance shows no effective regression in my benchmarks, and the bucket selection is much better as a result. I’ve already landed this code on GitHub, so anyone who uses my hashmap.h library will benefit from it.