- 16 Oct, 2020 40 commits
-
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
Fix rounding for h2strings_avg while we're here.
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
The hash inner loop iterates no more often than necessary. We also set min and max lengths for strings for each secondary hash, so that misses may be found more quickly.
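The length check can be sketched as follows. This is a minimal illustration, not the VMOD's code: the `struct bucket` layout, its field names and `bucket_lookup()` are hypothetical, and a linear scan stands in for the secondary hash so that the example stays short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-bucket metadata; names are illustrative only. */
struct bucket {
    size_t minlen;          /* length of the shortest string in the bucket */
    size_t maxlen;          /* length of the longest string in the bucket */
    const char **strings;   /* strings stored in this bucket */
    size_t nstrings;
};

/* Return the index of the match within the bucket, or -1 on a miss. */
static int bucket_lookup(const struct bucket *b, const char *subject)
{
    size_t len = strlen(subject);

    /* Early exit: no stored string can have this length. */
    if (len < b->minlen || len > b->maxlen)
        return -1;

    /* Stand-in for the secondary hash: scan the bucket. */
    for (size_t i = 0; i < b->nstrings; i++)
        if (strcmp(b->strings[i], subject) == 0)
            return (int)i;
    return -1;
}
```

With the bounds in place, a subject that is too short or too long is rejected after a single length comparison, before any hashing or string comparison is done.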
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
- Add PH stats
- Stats are visible in varnishstat curses mode at verbosity level debug.
- Rename stats for better readability.
- Stats specific to PH and QP have names prefixed with hash_ and trie_, respectively.
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
Benchmarks suggest that this idea can improve performance by about 10 ns per match operation, which makes a difference in throughput. But we keep things simple for now.
-
Geoff Simmons authored
-
Geoff Simmons authored
The math was wrong, and changing the hash function to correctly compute mod Mersenne prime just made it slower, and didn't seem to lower collision rates. Hash table sizes are just the next higher power of 2. As I interpret Thorup (2020), this is still strongly universal hashing.
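As a rough sketch of how a power-of-2 table size fits together with a hash that needs no modulus, the following uses the multiply-shift scheme for 32-bit keys. The function names are hypothetical, and it is an assumption that the VMOD uses this exact form. It is also an assumption that "next higher" means the smallest power of 2 not less than the set size.

```c
#include <stdint.h>

/* Smallest power of 2 that is >= n, valid for 1 <= n <= 2^31. */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Multiply-shift hash of a 32-bit key to a value in [0, 2^bits),
 * for 1 <= bits <= 32. a and b are random 64-bit values chosen once
 * per table. The arithmetic wraps mod 2^64, and the top bits of the
 * result are kept, so no division or modulus is computed.
 */
static uint32_t hash_ms(uint32_t x, uint64_t a, uint64_t b, unsigned bits)
{
    return (uint32_t)((a * x + b) >> (64 - bits));
}
```

For a set of 1000 strings, `next_pow2(1000)` gives a table of 1024 slots, and `hash_ms(x, a, b, 10)` yields an index into it.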
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
For large sets, workspace could be too small.
-
Geoff Simmons authored
Universal hashing has a sounder theoretical basis; in particular, it doesn't have the dubious minimum hash table size below which a perfect hash may not be possible, and which was set by trial and error.

For nearly all test data, universal hashing performs at least as well as the previous approach. It is especially better for sets with longer strings, since the subject string is cast as an array of uint32_t, so the hash is computed in fewer operations.

The only exception I've noticed is /usr/share/dict/words, which now appears to have more collisions than under the previous approach. But it appears likely that this only becomes an issue for sets that are much larger than are probable for VCL use cases (in the 100,000 range), and only if all of the set's elements are tested for matches about equally often (whereas real-world usage patterns tend to match a subset much more frequently).
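To illustrate why treating the string as 32-bit words reduces the operation count, here is a minimal sketch of a vector multiply-shift hash. The name `hash_words()`, its signature and the key layout are assumptions made for the example. It also copies each word with `memcpy()` and zero-pads the tail, rather than casting the pointer, to sidestep alignment concerns; the VMOD may handle this differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hash the len bytes at s to a value in [0, 2^bits), consuming the
 * input four bytes at a time. a[] holds na random 64-bit keys, one per
 * 32-bit word of input, and b is a further random 64-bit value.
 * Returns 0 on success, -1 if the string needs more keys than a[]
 * provides or bits is out of range.
 */
static int hash_words(const char *s, size_t len, const uint64_t *a,
                      size_t na, uint64_t b, unsigned bits, uint32_t *out)
{
    uint64_t sum = b;
    size_t nwords = (len + 3) / 4;

    if (nwords > na || bits < 1 || bits > 32)
        return -1;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t w = 0;
        size_t n = len - 4 * i;

        if (n > 4)
            n = 4;
        /* The last word is zero-padded if len is not a multiple of 4. */
        memcpy(&w, s + 4 * i, n);
        sum += a[i] * w;    /* wraps mod 2^64 */
    }
    *out = (uint32_t)(sum >> (64 - bits));
    return 0;
}
```

A 6-byte string such as "foobar" is hashed with two multiplications here, where a byte-wise hash would loop six times.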
-
Geoff Simmons authored
The VMOD does this during .compile(), and QP_Insert() is no longer called during .add(). The .compile() call is now required in all cases, and it must be called before .create_stats(). This is because QP_Insert() was not correctly rotating the trie when a set has overlapping prefixes, and a shorter prefix was added before the longer one. With sorted order, shorter prefixes are always added first, so rotation is unnecessary.
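The ordering property relied on here can be shown with a small sketch. It assumes plain byte-wise ordering via `strcmp()`, and `sort_members()` is a hypothetical helper, not a function from the VMOD.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/*
 * Sort the set members byte-wise. A proper prefix always compares
 * lower than any string that extends it, because the terminating
 * null byte is lower than every other byte value.
 */
static void sort_members(const char **members, size_t n)
{
    qsort(members, n, sizeof(*members), cmp_str);
}
```

Sorting { "foobar", "bar", "foo" } yields "bar", "foo", "foobar", so "foo" would be inserted before "foobar".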
-
Geoff Simmons authored
This reverts commit afded0f5.
-
Geoff Simmons authored
-
Geoff Simmons authored
The new algorithm improves efficiency with iteration in place of recursion, and in a number of other ways:
- Avoid searches into dead-end branches. The traversal of all branches was done because of the overlapping prefix case -- "foo" and "foobar" both in the set. Now we just search the tree for a match, but before descending into the next branch, check if there are other branches at which the current prefix matches a terminating node.
- Only do string comparisons when we hit a terminating node.
- Mark terminating nodes with a flag in the tree, so that we don't go looking for the null byte in the strings table during the search.

While we're here, rename the flag for the nibble search as hinib -- non-zero if and only if we inspect the most significant nibble at that node. Also remove some dead code from QP_Insert().
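The general shape of an iterative prefix search with flagged terminating nodes can be sketched on a much simpler structure. The example below uses a plain byte-wise trie with one child slot per byte value, not the QP trie, so it needs no string comparisons and no nibble handling. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NODES 64

struct node {
    int child[256]; /* index of the child node, 0 if none (0 is the root) */
    int term;       /* flag: non-zero if a set element ends at this node */
    int idx;        /* index of that element in the set */
};

struct trie {
    struct node nodes[MAX_NODES];
    int n;          /* number of nodes in use */
};

static void trie_init(struct trie *t)
{
    memset(t, 0, sizeof(*t));
    t->n = 1;       /* node 0 is the root */
}

/* Insert s as element idx. Returns 0, or -1 if the trie is full. */
static int trie_insert(struct trie *t, const char *s, int idx)
{
    int cur = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        if (t->nodes[cur].child[c] == 0) {
            if (t->n == MAX_NODES)
                return -1;
            t->nodes[cur].child[c] = t->n++;
        }
        cur = t->nodes[cur].child[c];
    }
    t->nodes[cur].term = 1;
    t->nodes[cur].idx = idx;
    return 0;
}

/*
 * Store in out[] the indices of all elements that are prefixes of
 * subject, shortest first, and return how many were stored (at most
 * max). A single loop follows one path from the root. Terminating
 * nodes are recognized by their flag, and the search stops as soon as
 * no branch matches the next byte.
 */
static size_t trie_prefixes(const struct trie *t, const char *subject,
                            int *out, size_t max)
{
    size_t n = 0;
    int cur = 0;

    for (;;) {
        unsigned char c;

        if (t->nodes[cur].term && n < max)
            out[n++] = t->nodes[cur].idx;
        c = (unsigned char)*subject++;
        if (c == '\0')
            break;
        cur = t->nodes[cur].child[c];
        if (cur == 0)
            break;  /* dead end: nothing longer can match */
    }
    return n;
}
```

With "foo", "foobar" and "bar" inserted as elements 0, 1 and 2, a search for "foobarbaz" reports elements 0 and 1, and a search for "fo" reports none.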
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
May be advantageous for loop/branch prediction.
-
Geoff Simmons authored
Theoretically, this reduces the probability of collisions. Benchmarks don't show much of a difference.
-
Geoff Simmons authored
This adds the .compile() method to set objects, required for the use of .match(). Docs for the .compile() method are currently incomplete.
-