- 16 Oct, 2020 20 commits
Geoff Simmons authored
For large sets, workspace could be too small.
-
Geoff Simmons authored
Universal hashing has a sounder theoretical basis. In particular, it does not have the dubious minimum hash table size below which a perfect hash may not be possible, a limit that was set by trial and error. For nearly all test data, universal hashing performs at least as well as the previous approach. It is especially better for sets with longer strings: the subject string is cast as an array of uint32_t, so the hash is computed in fewer operations. The only exception I've noticed is /usr/share/dict/words, which now appears to have more collisions than under the previous approach. But this is likely to become an issue only for sets much larger than are probable in VCL use cases (in the 100,000 range), and only if all of the set's elements are tested for matches about equally often (whereas real-world usage patterns tend to match a subset much more frequently).
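As an illustration of hashing a string one 32-bit word at a time, here is a minimal sketch of a multiply-shift universal hash over a vector of 32-bit words, in the style described by Thorup for vectors and strings. It is not the VMOD's implementation: the names `struct uhash`, `uhash_init()` and `uhash()`, the fixed limit `UH_MAXWORDS`, the use of splitmix64 to draw the random coefficients, and zero-padding of the last partial word are all assumptions. The table size is assumed to be a power of two, so the bucket index is simply the top `bits` bits of a 64-bit sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UH_MAXWORDS 64	/* hypothetical limit: strings up to 256 bytes */

/* Hypothetical parameters: random 64-bit coefficients, chosen once per set. */
struct uhash {
	uint64_t a[UH_MAXWORDS + 1];	/* a[UH_MAXWORDS] is the additive term */
	unsigned bits;			/* table size is 1 << bits, 1 <= bits <= 32 */
};

/* splitmix64: a small PRNG, used here only to fill the coefficients. */
static uint64_t
splitmix64(uint64_t *state)
{
	uint64_t z = (*state += 0x9e3779b97f4a7c15ULL);

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return (z ^ (z >> 31));
}

static void
uhash_init(struct uhash *h, unsigned bits, uint64_t seed)
{
	assert(bits >= 1 && bits <= 32);
	h->bits = bits;
	for (int i = 0; i <= UH_MAXWORDS; i++)
		h->a[i] = splitmix64(&seed);
}

static uint32_t
uhash(const struct uhash *h, const char *s)
{
	size_t len = strlen(s);
	size_t nwords = (len + 3) / 4;
	uint64_t sum = h->a[UH_MAXWORDS];

	assert(nwords <= UH_MAXWORDS);
	for (size_t i = 0; i < nwords; i++) {
		uint32_t w = 0;
		size_t n = len - 4 * i < 4 ? len - 4 * i : 4;

		memcpy(&w, s + 4 * i, n);	/* last word is zero-padded */
		sum += h->a[i] * w;		/* unsigned: wraps mod 2^64 */
	}
	/* Keep the top `bits` bits of the sum as the bucket index. */
	return ((uint32_t)(sum >> (64 - h->bits)));
}
```

Reading the words through memcpy() rather than casting the char pointer avoids alignment and strict-aliasing problems. The trade-off is that hash values depend on the platform's byte order.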
-
Geoff Simmons authored
The VMOD does this during .compile(), and QP_Insert() is no longer called during .add(). The .compile() call is now required in all cases, and it must be called before .create_stats(). This is because QP_Insert() was not correctly rotating the trie when a set has overlapping prefixes, and a shorter prefix was added before the longer one. With sorted order, shorter prefixes are always added first, so rotation is unnecessary.
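Sorting with strcmp() places every string before all strings it is a proper prefix of, because the terminating null byte compares lower than any other byte. Here is a minimal sketch of sorting a set before insertion. The helper names `cmp_str()` and `sort_set()` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings (hypothetical helper). */
static int
cmp_str(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	/* strcmp() orders "foo" before "foobar" because '\0' sorts lowest. */
	return (strcmp(*sa, *sb));
}

/* Sort the set so that each prefix is inserted before its extensions. */
static void
sort_set(const char **strs, size_t n)
{
	qsort(strs, n, sizeof(*strs), cmp_str);
}
```

After sort_set(), a set given as {"foobar", "baz", "foo"} is ordered "baz", "foo", "foobar", so "foo" is inserted before "foobar".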
-
Geoff Simmons authored
This reverts commit afded0f5.
-
Geoff Simmons authored
-
Geoff Simmons authored
The new algorithm improves efficiency by using iteration in place of recursion, and in a number of other ways:
- Avoid searches into dead-end branches. All branches were traversed because of the overlapping prefix case ("foo" and "foobar" both in the set). Now we just search the tree for a match, but before descending into the next branch, check whether there are other branches at which the current prefix matches a terminating node.
- Only do string comparisons when we hit a terminating node.
- Mark terminating nodes with a flag in the tree, so that we don't go looking for the null byte in the strings table during the search.
While we're here, rename the flag for the nibble search to hinib: non-zero if and only if we inspect the most significant nibble at that node. Also remove some dead code from QP_Insert().
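The following sketch shows three of the ideas above in isolation: an iterative (non-recursive) descent, a per-node terminating flag instead of a search for the null byte, and a hinib flag that selects the high or low nibble. To keep it short, it uses a plain, uncompressed radix-16 trie with two levels per byte. This is not the VMOD's qp trie: it needs no string comparisons at all and simply remembers the last terminating node it passed. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node: uncompressed radix-16 trie, two levels per byte. */
struct ntrie {
	struct ntrie *child[16];
	int term;	/* non-zero if a set element ends at this node */
};

/* hinib non-zero: inspect the most significant nibble of the byte. */
static unsigned
nibble(unsigned char c, int hinib)
{
	return (hinib ? c >> 4 : c & 0x0f);
}

static struct ntrie *
ntrie_new(void)
{
	return (calloc(1, sizeof(struct ntrie)));
}

static int
ntrie_insert(struct ntrie *root, const char *s)
{
	struct ntrie *n = root;

	for (const unsigned char *p = (const unsigned char *)s; *p; p++)
		for (int hinib = 1; hinib >= 0; hinib--) {
			unsigned i = nibble(*p, hinib);

			if (n->child[i] == NULL &&
			    (n->child[i] = ntrie_new()) == NULL)
				return (-1);
			n = n->child[i];
		}
	n->term = 1;	/* flag the end of the element in the tree */
	return (0);
}

/* Iterative search: length of the longest element that is a prefix
 * of subject, or -1 if there is none. */
static long
ntrie_longest_prefix(const struct ntrie *root, const char *subject)
{
	const struct ntrie *n = root;
	long best = n->term ? 0 : -1;

	for (size_t i = 0; subject[i] != '\0'; i++) {
		unsigned char c = (unsigned char)subject[i];

		if ((n = n->child[nibble(c, 1)]) == NULL)
			break;
		if ((n = n->child[nibble(c, 0)]) == NULL)
			break;
		if (n->term)
			best = (long)(i + 1);
	}
	return (best);
}

static void
ntrie_free(struct ntrie *n)
{
	if (n == NULL)
		return;
	for (int i = 0; i < 16; i++)
		ntrie_free(n->child[i]);
	free(n);
}
```

With "foo" and "foobar" in the set, ntrie_longest_prefix() returns 6 for "foobarbaz", 3 for "foobaz", and -1 for "fo".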
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
May be advantageous for loop/branch prediction.
-
Geoff Simmons authored
Theoretically, this reduces the probability of collisions. Benchmarks don't show much of a difference.
-
Geoff Simmons authored
This adds the .compile() method to set objects, required for the use of .match(). Docs for the .compile() method are currently incomplete.
-
Geoff Simmons authored
strlen() is also cheap when the C library has a SIMD implementation of it, so we can afford this optimization, which rejects some strings immediately.
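Here is a minimal sketch of rejecting subjects by length. It assumes the shortest and longest element lengths are recorded once when the set is built; `struct len_bounds` and both functions are hypothetical names. For exact matching, a subject outside that range cannot be in the set. For prefix matching, only the lower bound would apply, since a subject may be longer than every element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: shortest and longest element lengths of a set. */
struct len_bounds {
	size_t min;
	size_t max;
};

static void
len_bounds_init(struct len_bounds *b, const char *const *strs, size_t n)
{
	b->min = SIZE_MAX;	/* an empty set then rejects everything */
	b->max = 0;
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(strs[i]);

		if (len < b->min)
			b->min = len;
		if (len > b->max)
			b->max = len;
	}
}

/* Exact matching only: a subject outside [min, max] cannot match. */
static int
len_may_match(const struct len_bounds *b, const char *subject)
{
	size_t len = strlen(subject);

	return (len >= b->min && len <= b->max);
}
```

A subject that passes this check still has to go through the full search. The check only cheaply filters out strings that cannot match.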
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
Cannot be used for prefix matches.
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
For "quadbit patricia tries" (qp tries), inspired by the work of Tony Finch (https://dotat.at/prog/qp/README.html). These are radix 16 tries that examine one nibble at a time, which makes the tries smaller and reduces pointer chasing.
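In Tony Finch's qp trie design, a branch node stores a 16-bit bitmap with one bit per possible nibble value. Only the children that actually exist are kept, in a packed array, and the index into that array is the number of set bits below the nibble's bit. Here is a minimal sketch of that index computation. It assumes GCC or Clang for `__builtin_popcount()`, and `qp_child_index()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bit k of bitmap is set iff the node has a child for nibble k.
 * Returns the child's slot in the packed array, or -1 if absent. */
static int
qp_child_index(uint16_t bitmap, unsigned nib)
{
	assert(nib < 16);
	uint16_t bit = (uint16_t)(1u << nib);

	if ((bitmap & bit) == 0)
		return (-1);
	/* Count the children with a smaller nibble value. */
	return (__builtin_popcount(bitmap & (bit - 1)));
}
```

For a node whose bitmap has bits 1, 5 and 9 set, nibbles 1, 5 and 9 map to array slots 0, 1 and 2, and nibble 3 is absent. A node therefore needs space only for the children that exist, not for all 16.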
-
- 01 Sep, 2020 3 commits
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
- 03 Mar, 2020 5 commits
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
Only call strcmp() once, when a node is reached that must be either a hit or a miss.
-
- 28 Feb, 2020 1 commit
Geoff Simmons authored
-
- 27 Feb, 2020 1 commit
Geoff Simmons authored
The search may have matched a string that is actually a prefix of the subject string, if a longer string with the same prefix is also in the set. This "happens" to give correct results for match(), but which() would return the wrong value. The fix uses strcmp() instead of memcmp(); strcmp() is also vectorized in C libraries that use vector instructions.
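The pitfall can be shown in isolation. Comparing only strlen(candidate) bytes with memcmp() accepts any subject that merely starts with the candidate, while strcmp() also requires the terminating null bytes to line up. The function names are hypothetical; this is a sketch of the failure mode, not the VMOD's code.

```c
#include <assert.h>
#include <string.h>

/* Flawed as an equality test: true whenever candidate is a prefix. */
static int
memcmp_match(const char *candidate, const char *subject)
{
	size_t n = strlen(candidate);

	/* Length guard so memcmp() never reads past the end of subject. */
	return (strlen(subject) >= n && memcmp(candidate, subject, n) == 0);
}

/* Correct equality test: the strings must end at the same place. */
static int
strcmp_match(const char *candidate, const char *subject)
{
	return (strcmp(candidate, subject) == 0);
}
```

Suppose "foo" and "foobar" are both in the set and the subject is "foobar". A search that ends at the "foo" element passes memcmp_match() but fails strcmp_match(). That is how which() could report the wrong element while match() still came out true.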
-
- 26 Feb, 2020 4 commits
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
-
Geoff Simmons authored
Vector extensions are common in hardware now, as are C libraries that use vector instructions to implement functions like memcmp(), so we hand off comparisons to the library to take advantage of them. For the same reason, we can afford to call strlen() on the subject string to locate the terminating null, rather than scan for it in our own code. Also, the match function descends through the trie to find a potential match, and only then does the comparison, as is common for trie/critbit/patricia implementations.
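Here is a minimal sketch of the descend-then-compare lookup on a crit-bit tree. strlen() locates the subject's null once, each internal node tests a single bit, and one length check plus memcmp() at the leaf decides hit or miss. The node layout and the name `cb_lookup()` are assumptions for illustration, not the VMOD's data structure, and insertion is omitted. Checking the lengths before memcmp() keeps a stored prefix of the subject from counting as a match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical crit-bit node; str == NULL marks an internal node. */
struct cb_node {
	const struct cb_node *child[2];	/* internal: 0 = bit clear, 1 = set */
	size_t byte;			/* internal: index of the byte to test */
	uint8_t mask;			/* internal: the critical bit */
	const char *str;		/* leaf: the set element */
	size_t len;			/* leaf: strlen(str) */
};

static const char *
cb_lookup(const struct cb_node *n, const char *subject)
{
	size_t len = strlen(subject);	/* find the null once, up front */

	/* Descend without comparing strings: one bit test per node. */
	while (n->str == NULL) {
		uint8_t c = n->byte < len ? (uint8_t)subject[n->byte] : 0;

		n = n->child[(c & n->mask) != 0];
	}
	/* A single comparison at the leaf decides hit or miss. */
	if (n->len == len && memcmp(n->str, subject, len) == 0)
		return (n->str);
	return (NULL);
}
```

For a two-element tree holding "bar" and "baz", which first differ at byte 2 in bit 0x08, the root tests that bit. A subject such as "bat" descends to the "bar" leaf and is rejected there by the one comparison.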
-
- 10 Dec, 2019 1 commit
Nils Goroll authored
Since varnish-cache commit ecef48518f3b3f4bbf28256e090bdbb5cd2b163c, backends can be NULL (as defined with backend <name> none).
-
- 09 Dec, 2019 1 commit
Geoff Simmons authored
Fixes #1
-
- 31 Oct, 2019 1 commit
Geoff Simmons authored
configure checks whether you have lcov and genhtml; these can be specified with --with-lcov and/or --with-genhtml. If they are available, then make coverage does the following:
- make clean, then make check with CC=gcc and CFLAGS set so that inputs for gcov/lcov are generated.
- lcov creates the src/coverage subdir and generates a tracefile there.
- genhtml generates HTML reports in src/coverage.
-
- 02 Oct, 2019 2 commits
Geoff Simmons authored
-
Geoff Simmons authored
-
- 22 Aug, 2019 1 commit
Geoff Simmons authored
-