Commits · 85fa8abfd0b79490d019be557b9dc466a52bdba8 · uplex-varnish / libvmod-selector

17 Sep, 2020 3 commits
- Update TODO · 85fa8abf
  Geoff Simmons authored Sep 17, 2020
  
  85fa8abf
- Retire the undocumented .debug() method. · 4d3db9f7
  Geoff Simmons authored Sep 17, 2020
```
We can now use the benchmarks to dump data structures. This can be
done for both QP and PH, and is not limited by workspace or the
varnishtest log buffer.
```
  4d3db9f7
- Add the allow_overlaps flag. · cd0a0d45
  Geoff Simmons authored Sep 17, 2020
  
  cd0a0d45
16 Sep, 2020 1 commit
- Extend .matched(), and invoke VCL failure for more error conditions. · 78b0f570
  Geoff Simmons authored Sep 16, 2020

  VCL failure is invoked if:
  - no entries were added to a set
  - a set was not compiled
  - .compile() is called in a VCL sub other than vcl_init
  - a numeric index is out of range (larger than nmembers)
  - the conditions for UNIQUE or EXACT fail
  - associated data to be retrieved (string, backend, etc.) was not added

  If .match() or .hasprefix() is called with a NULL subject, this is
  logged using tag Notice, but is not an error (the return value is
  false). This is because it may or may not be intentional to attempt
  a match against an unset header.

  The .matched() method may now take a select argument, and works
  similarly to other methods with the f(INT n, ENUM select) signature,
  except that it returns false when the select condition fails, and
  does not invoke VCL failure. This makes it possible to check if
  UNIQUE or EXACT may be used, and avoid VCL failure if desired.

  78b0f570

13 Sep, 2020 1 commit
- Silence an "unused result" warning for the benchmark code. · 4491e943
  Geoff Simmons authored Sep 13, 2020
  
  4491e943
08 Sep, 2020 3 commits
- Remove the ZERO_OBJ workaround. · 509a3585
  Geoff Simmons authored Sep 08, 2020
  
  509a3585
- PH lookup fails fast when the subject string length is out of range. · a30da8fc
  Geoff Simmons authored Sep 08, 2020
  
  a30da8fc
- Add some data used for benchmarks. · 3fd26215
  Geoff Simmons authored Sep 08, 2020
  
  3fd26215
07 Sep, 2020 9 commits
- A few extra notes in the comments. · ad2ae73e
  Geoff Simmons authored Sep 07, 2020
  
  ad2ae73e
- Remove the patricia interface. · 4c8eda58
  Geoff Simmons authored Sep 07, 2020
  
  4c8eda58
- Simplify expression of the PH hash function. · 2241d15d
  Geoff Simmons authored Sep 07, 2020
  
  2241d15d
- Add VSC stats for the key vector lengths of PH secondary hashes. · 52e204f6
  Geoff Simmons authored Sep 07, 2020
```
Fix rounding for h2strings_avg while we're here.
```
  52e204f6
- Correct rounding for the h2buckets_avg stat. · e1379e0b
  Geoff Simmons authored Sep 07, 2020
  
  e1379e0b
- Fix copy pasta in a VSC documentation line. · d45959ec
  Geoff Simmons authored Sep 07, 2020
  
  d45959ec
- Add PH stats for secondary hash key vector lengths. · f66374ea
  Geoff Simmons authored Sep 07, 2020
  
  f66374ea
- For PH, all-odd keys in the vector are not necessary. · f79f2e20
  Geoff Simmons authored Sep 07, 2020
  
  f79f2e20
- For PH, set a key vector length for each secondary hash. · df554890
  Geoff Simmons authored Sep 07, 2020
```
The hash inner loop iterates no more often than necessary.

We also set min and max lengths for strings for each secondary hash,
so that misses may be found more quickly.
```
  df554890
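As a rough illustration of the idea in this commit, the sketch below keeps a minimum and maximum string length per secondary hash, so that a subject whose length is out of range can be rejected before any hashing is done. The struct and function names are hypothetical and not taken from the VMOD's source; deriving the key count from the longest string assumes the string is hashed in 32-bit words.

```c
#include <stddef.h>

/* Hypothetical per-secondary-hash metadata; field names are illustrative. */
struct bucket_info {
	size_t minlen;	/* shortest string stored under this hash */
	size_t maxlen;	/* longest string stored under this hash */
};

/*
 * Return 1 if a subject of length len could be stored here, 0 if it is
 * certainly a miss. No hashing or string comparison is needed.
 */
static int
len_in_range(const struct bucket_info *b, size_t len)
{
	return (len >= b->minlen && len <= b->maxlen);
}

/*
 * Number of 32-bit words in the longest string, and hence the number of
 * multiplicative keys the hash loop ever needs for this table.
 */
static size_t
keys_needed(const struct bucket_info *b)
{
	return ((b->maxlen + 3) / 4);
}
```

A lookup would call len_in_range() first and return a miss immediately when it yields 0.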
04 Sep, 2020 2 commits
- Add to the list of common use cases in the docs. · e3662df1
  Geoff Simmons authored Sep 04, 2020
  
  e3662df1
- Build the benchmarks only if a configure option is enabled. · 5bbc2691
  Geoff Simmons authored Sep 04, 2020
  
  5bbc2691
01 Sep, 2020 2 commits
- Adjust to the changed WS_* interface. · 5c97bc84
  Geoff Simmons authored Sep 01, 2020
  
  5c97bc84
- Change in README due to change in the vmodtool. · cbbfc552
  Geoff Simmons authored Sep 01, 2020
  
  cbbfc552
13 Jun, 2020 6 commits
- Extend and refactor VMOD stats. · 82017620
  Geoff Simmons authored Jun 13, 2020
```
- Add PH stats
- Stats are visible in varnishstat curses mode at verbosity level
  debug.
- Rename stats for better readability.
- Stats specific to PH and QP have names prefixed with hash_ and
  trie_, respectively.
```
  82017620
- QP fmin stat is 0 when there is only one node (no fanout). · 355a3aa4
  Geoff Simmons authored Jun 13, 2020
  
  355a3aa4
- Fix QP favg stat. · c51554ec
  Geoff Simmons authored Jun 13, 2020
  
  c51554ec
- Very minor algorithm fix. · 80b5d569
  Geoff Simmons authored Jun 13, 2020
  
  80b5d569
- PH min stats are 0 if there were no collision buckets. · 098d67f8
  Geoff Simmons authored Jun 13, 2020
  
  098d67f8
- Get rid of an unnecessary assertion. · d7367746
  Geoff Simmons authored Jun 13, 2020
  
  d7367746
05 Jun, 2020 1 commit
- Add more stats. · af1c0a55
  Geoff Simmons authored Jun 05, 2020
  
  af1c0a55
04 Jun, 2020 2 commits
- Add some more stats. · 89595fc0
  Geoff Simmons authored Jun 04, 2020
  
  89595fc0
- Remove some dead code. · 0a828126
  Geoff Simmons authored Jun 04, 2020
  
  0a828126
03 Jun, 2020 2 commits
- Add a comment about possible PH parameters for time/space tradeoffs. · 146307d0
  Geoff Simmons authored Jun 03, 2020
```
Benchmarks of this idea can lead to improvements, by about 10 ns
per match operation, which makes a difference in throughput. But
we keep things simple for now.
```
  146307d0
- The benchmark for PH prints expected collisions and collision rate. · 9c4a4cde
  Geoff Simmons authored Jun 03, 2020
  
  9c4a4cde
02 Jun, 2020 4 commits
- Ditch the use of Mersenne primes. · ff16f72c
  Geoff Simmons authored Jun 02, 2020
```
The math was wrong, and changing the hash function to correctly
compute mod Mersenne prime just made it slower, and didn't seem
to lower collision rates.

Hash table sizes are just the next higher power of 2. As I interpret
Thorup (2020), this is still strongly universal hashing.
```
  ff16f72c
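The sizing rule described in this commit can be sketched as follows: round the number of members up to a power of two, then map a 64-bit hash value to a table index by keeping its high bits, so that no reduction modulo a prime is needed. The function names are hypothetical, and keeping an exact power of two unchanged is an assumption, since the commit message does not say how that case is handled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest number of bits such that 2^bits >= n (hypothetical helper).
 * Assumption: a count that is already a power of two is kept as is.
 */
static unsigned
table_bits(uint32_t n)
{
	unsigned bits = 0;

	assert(n > 0);
	while (bits < 32 && ((uint64_t)1 << bits) < n)
		bits++;
	return (bits);
}

/*
 * Reduce a 64-bit hash to an index in [0, 2^bits) by keeping the high
 * bits, as multiply-shift schemes do.
 */
static uint32_t
hash_to_index(uint64_t h, unsigned bits)
{
	assert(bits <= 32);
	if (bits == 0)
		return (0);	/* a shift by 64 would be undefined */
	return ((uint32_t)(h >> (64 - bits)));
}
```

With a power-of-two table size, the index computation is a single shift, which is in line with the commit's remark that computing mod a Mersenne prime only made the hash slower.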
- Add PH_Stats(). · 160b48ea
  Geoff Simmons authored Jun 02, 2020
  
  160b48ea
- Add a test for the RNG used for perfect hashing. · 9e2b1795
  Geoff Simmons authored Jun 02, 2020
  
  9e2b1795
- Fix make distcheck. · 947d98b0
  Geoff Simmons authored Jun 02, 2020
  
  947d98b0
01 Jun, 2020 2 commits
- malloc the temp array used for sorting in .compile(). · 599fb3ac
  Geoff Simmons authored Jun 01, 2020
```
For large sets, workspace could be too small.
```
  599fb3ac
- Implement perfect hashing based on universal hashing. · 278d6968
  Geoff Simmons authored Jun 01, 2020

  Universal hashing has a sounder theoretical basis; in particular, it
  doesn't have the dubious minimum hash table size below which a
  perfect hash may not be possible, and which was set by trial and error.

  For nearly all test data, universal hashing performs as well or
  better. It is especially better for sets with longer strings,
  since the subject string is cast as an array of uint32_t, so the
  hash is computed in fewer operations.

  The only exception I've noticed is /usr/share/dict/words, which now
  appears to have more collisions than under the previous approach.
  But it appears likely that this only becomes an issue for sets that
  are much larger than are probable for VCL use cases (in the 100,000
  range), and if all of a set's elements are tested for matches
  about equally often (whereas real-world usage patterns tend to
  match a subset much more frequently).

  278d6968
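To make the remark about casting the subject as an array of uint32_t concrete, here is a minimal sketch of one textbook scheme, the vector multiply-shift hash described by Thorup. It is not the VMOD's actual hash function: the function name, the zero padding of a short tail, and the caller-supplied vector of random 64-bit keys are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hash len bytes of s with the 64-bit keys in key[], which are assumed
 * to have been drawn at random. key[0] is the additive term, so the key
 * vector needs 1 + ceil(len / 4) entries. Arithmetic wraps mod 2^64,
 * and the top 32 bits of the sum are returned.
 */
static uint32_t
vec_hash(const uint64_t *key, size_t nkeys, const char *s, size_t len)
{
	uint64_t sum = key[0];
	size_t i, nwords = (len + 3) / 4;

	assert(nkeys >= nwords + 1);
	for (i = 0; i < nwords; i++) {
		uint32_t w = 0;
		size_t n = len - 4 * i;

		if (n > 4)
			n = 4;
		/* memcpy avoids unaligned reads; a short tail is zero-padded */
		memcpy(&w, s + 4 * i, n);
		sum += key[i + 1] * w;
	}
	return ((uint32_t)(sum >> 32));
}
```

The loop runs once per four bytes of the subject rather than once per byte, which is the sense in which longer strings are hashed in fewer operations.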

31 May, 2020 2 commits
- QP_Insert() requires that strings are added in sorted order. · 80a38908
  Geoff Simmons authored May 31, 2020

  The VMOD does this during .compile(), and QP_Insert() is no longer
  called during .add(). The .compile() call is now required in all
  cases, and it must be called before .create_stats().

  This is because QP_Insert() was not correctly rotating the trie
  when a set has overlapping prefixes, and a shorter prefix was
  added before the longer one. With sorted order, shorter prefixes
  are always added first, so rotation is unnecessary.

  80a38908
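The claim that sorted order always puts a shorter prefix ahead of the strings it prefixes can be checked with a small sketch. It assumes plain byte-wise ordering with strcmp(), which may differ from what the VMOD does (for example for case-insensitive sets); the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings, byte-wise order */
static int
cmp_str(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return (strcmp(*sa, *sb));
}

/*
 * Sort strs[0..n-1] in place. Afterwards, any string that is a proper
 * prefix of another comes before it, because strcmp() orders the
 * terminating NUL below every other byte value.
 */
static void
sort_members(const char **strs, size_t n)
{
	qsort(strs, n, sizeof(*strs), cmp_str);
}
```

For example, sorting "/foo/bar", "/foo", "/" and "/bar" this way yields "/", "/bar", "/foo", "/foo/bar", so "/" and "/foo" are inserted before the longer strings that extend them.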

- Revert "Prune the QP prefix sub-branch search more efficiently." · f17e166f
  Geoff Simmons authored May 31, 2020
```
This reverts commit afded0f5.
```
  f17e166f