Geoff Simmons authored
Universal hashing has a sounder theoretical basis; in particular, it does away with the dubious minimum hash table size, set by trial and error, below which a perfect hash may not be possible. For nearly all test data, universal hashing performs at least as well as the previous approach, and often better, especially for sets with longer strings: since the subject string is cast as an array of uint32_t, the hash is computed in fewer operations, four bytes at a time.

The only exception I've noticed is /usr/share/dict/words, which now shows more collisions than under the previous approach. But this is likely to become an issue only for sets much larger than is plausible for VCL use cases (on the order of 100,000 elements), and only if all of a set's elements are tested for matches about equally often, whereas real-world usage patterns tend to match a subset much more frequently.
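To make the "four bytes at a time" point concrete, here is a minimal sketch of one textbook multiply-shift universal hash over a string read as uint32_t words. The names (hash_init(), MAXWORDS), the parameter drawing via random(), the 256-byte key limit, and the length mixing are illustrative assumptions, not the code in this commit; the vmod may use a different member of a different universal family.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Max key length in 32-bit words (an assumption of this sketch;
     * longer keys are truncated here). */
    #define MAXWORDS 64

    /* Random multipliers defining one member of the hash family. */
    static uint64_t a[MAXWORDS + 1];

    /* Draw new parameters; a real implementation would use a better
     * source of randomness than random(). */
    static void
    hash_init(void)
    {
            size_t i;

            for (i = 0; i <= MAXWORDS; i++)
                    a[i] = ((uint64_t)random() << 33)
                        | ((uint64_t)random() << 2) | 1;
    }

    /* Hash len bytes of s into "bits" bits (1 <= bits <= 32), reading
     * the string one uint32_t word at a time. memcpy() sidesteps
     * alignment concerns; code that guarantees aligned, padded subject
     * strings can cast to uint32_t directly, as the commit describes. */
    static uint32_t
    hash(const char *s, size_t len, unsigned bits)
    {
            /* Mix in the length so that keys differing only in
             * trailing content length hash independently. */
            uint64_t h = a[MAXWORDS] * (uint64_t)len;
            uint32_t w;
            size_t i, nwords = len / 4, rem = len % 4;

            for (i = 0; i < nwords && i < MAXWORDS; i++) {
                    memcpy(&w, s + 4 * i, sizeof w);
                    h += a[i] * w;      /* arithmetic is mod 2^64 */
            }
            if (rem > 0 && nwords < MAXWORDS) {
                    w = 0;
                    memcpy(&w, s + 4 * nwords, rem);
                    h += a[nwords] * w;
            }
            return ((uint32_t)(h >> (64 - bits)));  /* top bits index */
    }

Usage would be to call hash_init() once before building the table; if the chosen parameters happen to produce too many collisions for a given set, a new family member can simply be redrawn, which is exactly the option a fixed hash function does not offer.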