Commit da41e697 authored by Geoff Simmons

Start documenting the benchmarks.

parent 3c676ffb

# Benchmarking the search implementations

This directory contains utilities for benchmarking the PH (perfect
hash) and QP (quadbit patricia trie) implementations, used by the VMOD
for the `.match()` and `.hasprefix()` methods, respectively. They are
meant to aid testing the search algorithms and implementations, and do
not measure any overhead added by the VMOD or Varnish.

The directory also contains files with test data and inputs, some of
which are meant to simulate common use cases for the VMOD.

## `bench_ph` -- benchmark perfect hashing

`bench_ph` reads a set of strings from a file or stdin, runs exact
matches against the strings, and reports statistics about the match
operations:
```
bench_ph [-hs] [-c csvfile] [-d dumpfile] [-i inputfile] [-n iterations] [file]
```
`bench_ph` reads the string set from `file`, or from stdin if no file
is specified. Each line of input forms a string in the set, excluding
the terminating newline.

The string set MAY NOT include the same string more than once, but
`bench_ph` does not check for duplicates, and will likely not
terminate if duplicates are present. (The VMOD runs `QP_Insert()`,
which rejects duplicate strings, before building the perfect hash.)
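
For example, a duplicate-free set can be prepared with standard shell
tools before it is handed to `bench_ph`; the file names here are only
placeholders:

```
# Remove duplicates from a candidate string set before benchmarking,
# so that bench_ph never sees a repeated string. sort -u also sorts
# the set, which is irrelevant for exact matching.
$ sort -u candidate-set.txt > set.txt
$ ./bench_ph set.txt
```
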
If `-i inputfile` is specified, then test inputs -- strings to be
matched against the set -- are read from `inputfile`, one string per
line (newlines excluded). If there is no `inputfile` then the strings
from the set are also used as test inputs, in which case every lookup
is a successful match. Lookup misses can only be tested by using an
input file.
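
As a sketch, a test input file that exercises both hits and misses
might be prepared by mixing the set with strings drawn from elsewhere
(file names are placeholders, and `/usr/share/dict/words` is only one
possible source of non-member strings):

```
# Mix the strings of the set with (mostly) unrelated words, shuffle
# the result, and use it as test input so that lookups include misses.
$ { cat set.txt; shuf -n 1000 /usr/share/dict/words; } | shuf > inputs.txt
$ ./bench_ph -i inputs.txt set.txt
```
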
`-n iterations` specifies the number of times each test input string
is matched against the set, default 1000. If `-n 0` is specified, then
`bench_ph` builds the set and reports statistics about it, and may
generate a dump file if requested, but does not run any matches.
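
For example, to inspect the generated structure without timing any
matches (`set.dump` is a placeholder name):

```
# Build the perfect hash for set.txt, report its statistics, and write
# the PH_Dump() output to set.dump, without running any matches.
$ ./bench_ph -n 0 -d set.dump set.txt
```
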
If `-s` is specified, the test inputs are shuffled before each
iteration. This may reveal effects of locality of reference and
branch prediction on the performance of matches.
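
For example, the effect can be estimated by running the same benchmark
twice, once without and once with `-s`, and comparing the reported
mean match times:

```
# Identical runs except for shuffling; compare the mean times reported
# by each.
$ ./bench_ph -i inputs.txt -n 100 set.txt
$ ./bench_ph -s -i inputs.txt -n 100 set.txt
```
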
Note that the benchmarks implement a somewhat unnatural usage pattern --
every test input is matched against the set exactly the same number of
times. Real-world usage commonly matches some strings more frequently
than others, which may be beneficial for locality and branch
prediction. Performance differences with and without `-s` show that
there can be an impact -- generally, shuffling tends to lengthen mean
match times for large string sets and/or for sets with long strings.
But it may be necessary to craft input data to simulate real usage
patterns, for example by including some strings more frequently in the
input.
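
One rough way to do this, sketched here with placeholder file names,
is to repeat a subset of the strings several times in the input file,
so that they are matched more often than the rest:

```
# The first 100 strings of set.txt occur eleven times each in the
# input, all other strings once, roughly simulating a workload in
# which a few strings dominate.
$ { cat set.txt; for i in $(seq 10); do head -100 set.txt; done; } | shuf > skewed.txt
$ ./bench_ph -i skewed.txt -n 100 set.txt
```
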
If `-d dumpfile` is specified, then the text dump of the perfect hash
structure produced by `PH_Dump()` is written to `dumpfile`.

If `-c csvfile` is specified, then `csvfile` is written with data
about each match operation in the benchmark. For consistency, the
format is the same as that used for `bench_qp` (see below). The CSV
file begins with a header line naming the columns:

- `type`: always `match`
- `matches`: 1 for a match, 0 for a miss
- `exact`: 1 for a match, 0 for a miss (for `bench_ph`, always the
  same as `matches`, since every match is exact)
- `t`: time for the match operation in nanoseconds
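
The CSV can then be post-processed with ordinary tools; for instance,
assuming the column order listed above, an `awk` one-liner can report
the mean match time separately for matches and misses:

```
# Mean time in nanoseconds, grouped by the matches column (1 = match,
# 0 = miss); the header line is skipped.
$ awk -F, 'NR>1 { s[$2]+=$4; n[$2]++ } END { for (m in n) print m, s[m]/n[m] }' set.csv
```
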
If `-h` is specified, `bench_ph` prints a usage message and exits.

A benchmark proceeds as follows, with timings obtained by calling
`clock_gettime(2)` just before and just after each operation to be
measured, using the monotonic clock:

* The string set is read from `file` or stdin, and test inputs are
  read if there is an `inputfile`.
* The perfect hash is generated by calling `PH_Init()` (with seeds
  from `/dev/urandom`) and `PH_Generate()`. The total time for
  `PH_Generate()` is reported, as well as the mean time per string in
  the set.
* If a `dumpfile` was specified, call `PH_Dump()` and write its
  contents to the file.
* Statistics obtained from `PH_Stats()` are printed, as well as the
  time to run `PH_Stats()`.
* Exit if `-n` is set to 0.
* Run the benchmark with the specified number of iterations (default
  1000).
* Report results:
  * The number of match operations executed.
  * The number of matches and misses.
  * The cumulative time for all match operations, and the mean time
    per operation.
  * Throughput, as the number of operations per second.
* Report stats from `getrusage(2)` for the complete run of `bench_ph`:
  * user and system time
  * numbers of voluntary and involuntary context switches as `vcsw`
    and `ivcsw`

Since the match operation is CPU- and memory-bandwidth-intensive, mean
match times may increase if `ivcsw` is high.
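
If `ivcsw` turns out to be high, it may help to pin the benchmark to a
single CPU; on Linux, for example, `taskset(1)` can be used (the CPU
number is an arbitrary choice):

```
# Pin bench_ph to CPU 2 to reduce involuntary context switches that
# could perturb the timings.
$ taskset -c 2 ./bench_ph -s -i inputs.txt -n 100 set.txt
```
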
Examples:
```
# Benchmark the set in set.txt, using inputs from inputs.txt, with 100
# iterations, shuffling inputs on each iteration, and recording
# results in set.csv.
$ ./bench_ph -s -i inputs.txt -n 100 -c set.csv set.txt

# Form a set from 5000 strings chosen randomly from the words list,
# using the set as its own test inputs.
$ shuf -n 5000 /usr/share/dict/words | ./bench_ph
```