• Geoff Simmons's avatar
    Implement perfect hashing based on universal hashing. · 1f7815f1
    Geoff Simmons authored
    Universal hashing has a sounder theoretical basis; in particular, it
    doesn't have the dubious minimum hash table size below which a
    perfect hash may not be possible, and which was set by trial and error.
    
    For nearly all test data, universal hashing performs at least as
    well or better. Especially better for sets with longer strings,
    since the subject string is cast as an array of uint32_t, so the
    hash is computed in fewer operations.
    
    The only exception I've noticed is /usr/share/dict/words, which now
    appears to have more collisions than under the previous approach.
    But it appears likely that this only becomes an issue for sets that
    are much larger than are probable for VCL use cases (in the 100,000
    range), and if all of the sets' elements are tested for matches
    about equally often (whereas real-world usage patterns tend to
    match a subset much more frequently).
    1f7815f1