- 16 Oct, 2020 40 commits
-
Geoff Simmons authored
We can now use the benchmarks to dump data structures. This can be done for both QP and PH, and is not limited by workspace or the varnishtest log buffer.
-
Geoff Simmons authored
VCL failure is invoked if:
- no entries were added to a set
- a set was not compiled
- .compile() is called in a VCL sub other than vcl_init
- a numeric index is out of range (larger than nmembers)
- the conditions for UNIQUE or EXACT fail
- associated data to be retrieved (string, backend etc.) was not added

If .match() or .hasprefix() is called with a NULL subject, it is logged with the Notice tag, but is not an error (the return value is false), since an attempt to match against an unset header may or may not be intentional.

The .matched() method may now take a select argument, and works like other methods with the f(INT n, ENUM select) signature, except that it returns false when the select condition fails rather than invoking VCL failure. This makes it possible to check whether UNIQUE or EXACT may be used, and to avoid VCL failure if desired.
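A hedged VCL sketch of the behavior described above. The import name `selector`, the object name `hosts`, and the exact call signatures are assumptions for illustration; only the method names (.compile(), .match(), .matched()) and their failure semantics come from this changelog:

```vcl
vcl 4.1;

import selector;   # hypothetical import name for this vmod

sub vcl_init {
    new hosts = selector.set();
    hosts.add("www.example.com");
    hosts.add("example.com");
    # .compile() may only be called in vcl_init; calling it in any
    # other VCL sub invokes VCL failure.
    hosts.compile();
}

sub vcl_recv {
    # Matching an unset header is not an error: it is logged with
    # the Notice tag and .match() returns false.
    if (hosts.match(req.http.Host)) {
        # With a select argument, .matched() returns false when the
        # select condition (e.g. UNIQUE) fails, rather than invoking
        # VCL failure, so it can guard uses of UNIQUE or EXACT.
        if (hosts.matched(1, UNIQUE)) {
            return (pass);
        }
    }
}
```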
-
Geoff Simmons authored
Fix rounding for h2strings_avg while we're here.
-
Geoff Simmons authored
The hash inner loop iterates no more often than necessary. We also store the minimum and maximum string lengths for each secondary hash, so that misses can be detected more quickly.
-
Geoff Simmons authored
- Add PH stats.
- Stats are visible in varnishstat curses mode at verbosity level debug.
- Rename stats for better readability.
- Stats specific to PH and QP have names prefixed with hash_ and trie_, respectively.
-
Geoff Simmons authored
Benchmarks suggest that this idea can improve match operations by about 10 ns each, which makes a difference in throughput. But we keep things simple for now.
-
Geoff Simmons authored
The math was wrong, and changing the hash function to compute the modulus by a Mersenne prime correctly only made it slower, without appearing to lower collision rates. Hash table sizes are now just the next higher power of 2. As I interpret Thorup (2020), this is still strongly universal hashing.
-
Geoff Simmons authored
For large sets, workspace could be too small.
-
Geoff Simmons authored
Universal hashing has a sounder theoretical basis; in particular, it has no dubious minimum hash table size below which a perfect hash may not be possible, a limit that had been set by trial and error.

For nearly all test data, universal hashing performs at least as well, and often better. It is especially better for sets with longer strings, since the subject string is cast as an array of uint32_t, so the hash is computed in fewer operations.

The only exception I've noticed is /usr/share/dict/words, which now appears to have more collisions than under the previous approach. But this likely only becomes an issue for sets much larger than are probable for VCL use cases (in the 100,000 range), and only if all of the set's elements are tested for matches about equally often (whereas real-world usage patterns tend to match a subset much more frequently).