Add documentation.

Commit ed328f11 authored Jul 04, 2017 by Geoff Simmons

Showing with 2677 additions and 23 deletions

README.rst README.rst +1339 -12

vmod_pcre2.vcc src/vmod_pcre2.vcc +1338 -11

--- a/README.rst
+++ b/README.rst
@@ -26,13 +26,602 @@ import pcre2 [from "path"] ;

 ::

-  new OBJECT = ...
+  # object interface
+  new OBJECT = pcre2.regex(STRING pattern [, compile options])
+  BOOL <OBJ>.match(STRING subject [, match options])
+  STRING <OBJ>.backref(INT ref)
+  STRING <OBJ>.namedref(STRING name)
+  STRING <OBJ>.sub(STRING subject, STRING replacement [, match options]
+                   [, substitution options])
+  BOOL <OBJ>.info_bool(ENUM)
+  INT <OBJ>.info_int(ENUM)
+  STRING <OBJ>.info_str(ENUM)
+
+  # function interface
+  BOOL pcre2.match(STRING pattern, STRING subject [, compile options]
+                   [, match options])
+  STRING pcre2.backref(INT ref)
+  STRING pcre2.namedref(STRING name)
+  STRING pcre2.sub(STRING pattern, STRING subject, STRING replacement
+                   [, compile options] [, match options]
+                   [, substitution options])
+
+  # library configuration
+  BOOL pcre2.config_bool(ENUM)
+  INT pcre2.config_int(ENUM)
+  STRING pcre2.config_str(ENUM)

 DESCRIPTION
 ===========

 This Varnish Module (VMOD) provides access to the PCRE2 regular
-expresion library.
+expression library. PCRE2 is the Perl-compatible regular expression
+library with a revised API, the successor to the PCRE library that
+implements native regexen in Varnish VCL. See `pcre2(3)`_ and the
+manuals that it references for details about the PCRE2 library.
+
+PCRE2, by itself, does not change regular expressions from the
+perspective of the end user -- the syntax and semantics of patterns
+and pattern matching remained largely the same at the time PCRE2 was
+introduced. The new library is a refactoring of the internal API,
+which is transparent to the user, and the VMOD endeavors to make use
+of the new internal features advantageously for VCL.
+
+Some of the differences between the VMOD and native VCL regexen are:
+
+* The VMOD provides methods and functions to retrieve back references
+  after a match that are easier to use than the idiom with the
+  ``regsub`` function that is necessary in native VCL. It also
+  provides the means to retrieve references to named capturing groups.
+
+* The functional interface makes it possible to use patterns that are
+  not known until runtime.
+
+* PCRE2 introduces a new native substitution function, similar to the
+  ``regsub`` and ``regsuball`` functions in VCL, except that the
+  substitution syntax is different and provides more features.
+
+* Parameters that limit the depth of recursion and backtracking in
+  match operations, which are set globally in Varnish, can be set for
+  individual matches in the VMOD.
+
+* The VMOD can support matching against UTF-8 strings, if it is
+  running against a PCRE2 library that was built to support Unicode.
+
+* The VMOD exposes considerably more functionality of the underlying
+  library. VCL provides a general-purpose regular expression facility
+  -- PCRE could be easily replaced as its regex engine. The VMOD is
+  meant to be specific to PCRE2, and makes a full range of its
+  features available in VCL.
+
+* The VMOD provides methods and functions that allow you to inspect
+  properties of patterns and of the library. These are not likely to
+  be useful on the fast path of production deployments, and are not
+  optimized for that. But they may be useful during development to
+  debug and optimize regex matching.
+
+Since the introduction of PCRE2, the original PCRE library is being
+maintained for bugfixes, but development of new features and
+optimizations is only being done for PCRE2. So the VMOD will make it
+possible to take advantage of improvements in the library as they are
+released.
+
+Here are some simple usage examples::
+
+  # regex objects are created in vcl_init, and the regular expressions
+  # are compiled when VCL is loaded.
+  sub vcl_init {
+      # A regex to match the "foo" cookie, and capture its value.
+      new foo = pcre2.regex("\bfoo=([^;,\s]+\b)");
+
+      # A regex to match a URL beginning with the prefix "/bar/", and
+      # capture its suffix.
+      new bar = pcre2.regex("^/bar/(.+)");
+  }
+
+  sub vcl_recv {
+      # If the cookie header contains "foo", then assign its value
+      # to another header.
+      if (foo.match(req.http.Cookie)) {
+          set req.http.X-Foo-Value = foo.backref(1);
+      }
+
+      # If the URL begins with "/bar/", then replace the prefix with
+      # "/baz/quux/".
+      if (bar.match(req.url)) {
+          set req.url = "/baz/quux/" + bar.backref(1);
+      }
+  }
+
+Object and functional interfaces
+--------------------------------
+
+The VMOD provides regular expression operations by way of the
+``regex`` object interface and a functional interface. For ``regex``
+objects, the pattern is compiled at VCL initialization time, and the
+compiled pattern is re-used for each invocation of its
+methods. Compilation failures (due to errors in the pattern) cause
+failure at initialization time, and the VCL fails to load. The
+``.backref()`` and ``.namedref()`` methods refer back to the last
+invocation of the ``.match()`` method for the same object. The
+``.sub()`` method also re-uses an object's compiled pattern.
+
+The functional interface provides the same set of operations, but the
+pattern is compiled at runtime on each invocation of the ``match()``
+and ``sub()`` functions (and then discarded). Compilation failures are
+reported as errors in the Varnish log. The ``backref()`` and
+``namedref()`` functions refer back to the last invocation of the
+``match()`` function, for any pattern.
+
+Compiling a pattern at runtime on each invocation is considerably more
+costly than re-using a compiled pattern. So for patterns that are
+fixed and known at VCL initialization, the object interface should be
+used. The functional interface should only be used for patterns whose
+contents are not known until runtime.
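+
+As a minimal sketch of the functional interface, the following
+matches the URL against a pattern taken from a request header. The
+header names ``X-Url-Pattern`` and ``X-Matched`` are hypothetical,
+chosen only for illustration, and in practice the pattern should come
+from a trusted source::
+
+  sub vcl_recv {
+      # The pattern is not known until runtime, so it is compiled on
+      # this invocation of match() and then discarded.
+      if (pcre2.match(req.http.X-Url-Pattern, req.url)) {
+          # backref() refers back to the last match() function call.
+          # Backref 0 is assumed here to be the entire matched
+          # string, as described for the .backref() method.
+          set req.http.X-Matched = pcre2.backref(0);
+      }
+  }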
+
+Compile, match and substitution options
+---------------------------------------
+
+The VMOD has unusually long lists of parameters for its methods and
+functions -- over 40 for the ``sub()`` function, for example. But
+nearly all of these have default values, and it is only necessary to
+specify options in VCL that differ from the defaults.
+
+The optional parameters affect the interpretations of patterns and the
+operation of matches and substitutions, and come in three groups:
+
+* *Compile* options, used wherever a pattern is compiled: in the
+  ``regex`` object constructor, and the ``match()`` and ``sub()``
+  functions.
+
+* *Match* options, used wherever a match is performed: in the
+  ``match`` and ``sub`` methods and functions.
+
+* *Substitution* options, used in the ``sub`` method and function.
+
+The options have call scope, meaning that they are evaluated only once
+for each invocation of a function or method at its particular location
+in the VCL source, on the first invocation after the VCL instance is
+loaded. The options are then cached and re-used for all subsequent
+invocations, and cannot be changed (until a new VCL instance is
+loaded).
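+
+For example, options are passed as named parameters, and only those
+that differ from the defaults need to be given. This sketch re-uses
+the ``bar`` object from the usage examples above; the limit value is
+arbitrary::
+
+  sub vcl_recv {
+      # anchored and match_limit are evaluated on the first
+      # invocation at this location in the VCL source, and are then
+      # re-used for every later invocation.
+      if (bar.match(req.url, anchored=true, match_limit=10000)) {
+          set req.url = "/baz/quux/" + bar.backref(1);
+      }
+  }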
+
+Compile options
+~~~~~~~~~~~~~~~
+
+Compile options define properties of patterns. See `pcre2pattern(3)`_
+for details of PCRE2 pattern syntax, and `pcre2syntax(3)`_ for a quick
+reference.
+
+The default value of all of the BOOL options is **false**.
+
+See also `JIT compilation and matching`_ below.
+
+``allow_empty_class``
+  If true, then a pattern may include ``[]`` to denote an empty
+  character class. This, in part, supports compatibility with regexen
+  in ECMAscript (also known as Javascript). By default, a closing
+  square bracket after an opening one is interpreted as a character in
+  the class (and ``]`` must appear later in the pattern).
+
+``alt_bsux``
+  (Referring to "backslash-u" and "backslash-x".) If true, then three
+  escape sequences are interpreted differently (for compatibility with
+  ECMAscript):
+
+  * ``\U`` matches an upper case ``U`` character. By default, ``\U``
+    causes a compile error.
+
+  * ``\u`` matches a lower case ``u``, unless it is followed by four
+    hexadecimal digits, in which case the hex number identifies the
+    code point to be matched. By default, ``\u`` causes a compile
+    error.
+
+  * ``\x`` matches a lower case ``x``, unless it is followed by two
+    hex digits, in which case it identifies the code point to match.
+    By default, ``\x`` must always be followed by zero to two hex
+    digits to identify a one-byte character (for example, ``\xz``
+    matches binary zero followed by ``z``).
+
+``alt_circumflex``
+  If true, and if ``multiline`` is also true, then the ``^``
+  meta-character matches after a newline appearing as the last
+  character in a string. By default, ``^`` does not match after
+  a terminating newline.
+
+``alt_verbnames``
+  If true, then backslash processing may be applied to verb names in
+  verb sequences such as ``(*MARK:NAME)``, so that the name can, for
+  example, include a closing parenthesis as ``\)`` or between ``\Q``
+  and ``\E``. By default, no processing is applied to verb names, and
+  they end at the first closing parenthesis (regardless of any
+  backslash).
+
+``anchored``
+  If true, then the pattern is anchored, meaning that it is
+  constrained to match at the starting point of a string. This may
+  also be achieved with constructs in the pattern.
+
+``bsr``
+  (For "backslash-R".) If this ENUM value is set, then it determines
+  which sequences are matched by ``\R``. If set to ``UNICODE``, then
+  ``\R`` matches any UTF-8 newline sequence. If set to ``ANYCRLF``,
+  then it matches CR (carriage return, or ``\r``), LF (linefeed, or
+  ``\n``), or CR followed by LF. By default, ``\R`` matches the
+  sequence chosen when the PCRE2 library was built, which can be
+  determined from ``config_str(BSR)`` (the library's own build-time
+  default is Unicode). See `pcre2pattern(3)`_ for details about ``\R``.
+
+``caseless``
+  If true, then matches for this pattern are case-insensitive. This
+  may also be achieved with ``(?i)`` in the pattern.
+
+``dollar_endonly``
+  If true, then the ``$`` metacharacter matches only at the end of a
+  string. By default, ``$`` also matches before newlines within the
+  string (but not before newlines that come immediately after a
+  newline). ``dollar_endonly`` is ignored when ``multiline`` is true.
+
+``dotall``
+  If true, then the ``.`` metacharacter matches any character,
+  including newlines. But it only ever matches one character, even if
+  newlines are coded as CRLF. By default, dots do not match
+  newlines. The effect of ``dotall`` can also be achieved with
+  ``(?s)`` in the pattern.
+
+``dupnames``
+  If true, then the names used for named capturing groups are not
+  required to be unique. By default, names for capturing groups may
+  only be used once.
+
+``extended``
+  If true, then pattern syntax is permitted to contain constructs that
+  serve as self-documentation:
+
+  * Most whitespace is ignored, except when escaped or inside a
+    character class (and a few other exceptions detailed in
+    `pcre2api(3)`_).
+
+  * All characters between an unescaped ``#`` and the next newline are
+    ignored, and can be used as comments.
+
+    For example, this is a self-documenting declaration of a pattern
+    that matches IPv6 addresses::
+
+      new ipv6 = pcre2.regex(extended=true, caseless=true, pattern=
+      {"^(?!:)                 # colon disallowed at start
+        (?:                    # start of item
+          (?: [0-9a-f]{1,4} |  # 1-4 hex digits or
+          (?(1)0 | () ) )      # fail if null previously matched
+          :                    # followed by colon
+        ){1,7}                 # end item; 1-7 of them required
+        [0-9a-f]{1,4} $        # final hex number at end of string
+        (?(1)|.)               # there was an empty component
+      "});
+
+  The effect of ``extended`` can also be achieved with the ``(?x)``
+  option in a pattern.
+
+``firstline``
+  If true, an unanchored pattern must match before or at the first
+  newline in the subject string (though the matched text may continue
+  over a newline). If the ``offset_limit`` option is also set for a
+  match, then the match must occur within the offset limit and in the
+  first line.
+
+``locale``
+  If ``locale`` is set to a string matching a locale that is available
+  on the system on which Varnish is running, then that locale is used
+  for the pattern to determine which characters are letters, digits,
+  upper and lower case, and so forth. Hence this option affects the
+  interpretation of constructs such as ``\w`` and ``\d``, the
+  ``caseless`` option, and so on. This only applies to single-byte
+  characters.
+
+  If ``locale`` is set to a string that is not recognized as a locale,
+  then compilation fails.
+
+  By default, PCRE2 uses tables established when the library is built
+  to recognize character properties; normally, these only recognize
+  ASCII characters.
+
+  Quoting `pcre2api(3)`_:
+
+    The use of locales with Unicode is discouraged.  If you are
+    handling characters with code points greater than 128, you should
+    either use Unicode support, or use locales, but not try to mix the
+    two.
+
+``match_unset_backref``
+  If true, then a back reference to an unset capturing group matches
+  an empty string; thus ``(\1)(a)`` successfully matches ``a``. This
+  makes the pattern similar to an ECMAscript pattern. By default, an
+  unset backref causes the matcher to backtrack, and possibly fail.
+
+``max_pattern_len``
+  If this INT value is greater than 0, then it sets a maximum length
+  for the pattern string to be compiled. If the pattern is longer, then
+  compilation fails.
+
+``multiline``
+  If true, then the ``^`` and ``$`` meta-characters match immediately
+  after and before internal newlines in the subject string, respectively,
+  in addition to matching at the start and end of the string. By default,
+  the start and end anchors only match at the beginning and end of the
+  string, regardless of internal newlines. The effect of ``multiline``
+  can also be achieved with ``(?m)`` in the pattern.
+
+``never_backslash_c``
+  If true, then ``\C`` may not be used in a pattern, and causes
+  compile failure. ``\C`` always matches exactly one byte, even in UTF
+  mode, and may lead to unpredictable effects if it matches in the
+  middle of a multibyte UTF-8 character. ``\C`` may have been
+  prohibited by a build-time option in the library, which can be
+  discovered by calling ``config_bool(NEVER_BACKSLASH_C)``.
+
+``never_ucp``
+  If true, then Unicode properties are not used to interpret ``\B``,
+  ``\b``, ``\D``, ``\d``, ``\S``, ``\s``, ``\W``, ``\w``, and some of
+  the POSIX character classes in the pattern. It is then impossible to
+  activate this facility by including ``(*UCP)`` at the start of the
+  pattern. If ``never_ucp`` and ``ucp`` are both set to true, then
+  the compile fails.
+
+``newline``
+  If this ENUM value is set, it determines which characters are to be
+  matched as newlines in the pattern. It can be set to:
+
+  * ``CR`` (carriage return)
+  * ``LF`` (linefeed)
+  * ``CRLF`` (CR followed by LF)
+  * ``ANYCRLF`` (CR, LF or CRLF)
+  * ``ANY`` (any Unicode line-ending sequence)
+
+  By default, the newline sequence chosen for the PCRE2 library when
+  it was built is used, which can be determined from
+  ``config_str(NEWLINE)``.
+
+``no_auto_capture``
+  If true, then numbered capturing groups are disabled in the pattern.
+  Any opening parenthesis not followed by ``?`` is then interpreted as
+  if it were followed by ``?:`` (that is, it forms a non-capturing
+  group).  Named capturing groups can still be used, and these also
+  acquire a capturing group number, so ``namedref`` and ``backref``
+  can still be used (but only for the named groups).
+
+``no_auto_possess``
+  If true, then the "auto-possessification" optimization is disabled
+  for the pattern, which for example interprets ``a+b`` as ``a++b``,
+  using the "possessive quantifier", to prevent backtracks into ``a+``
+  that can never be successful. If the option is true, then the full
+  unoptimized search is run.
+
+``no_start_optimize``
+  If true, then some optimizations for the start of the match are
+  disabled. This has the effect that certain constructs in the
+  pattern, such as ``(*COMMIT)`` or ``(*MARK)``, are evaluated at
+  every possible starting position in the string, while they may have
+  been skipped when the optimizations are applied. Thus this option
+  may change the result of ``match`` calls in patterns that include
+  such constructs. See `pcre2api(3)`_ for details.
+
+``no_utf_check``
+  If this option and ``utf`` are both true, then validity checks to
+  determine if the pattern is a valid UTF string are disabled. This
+  may save CPU usage and time for the ``match()`` and ``sub()``
+  functions, which compile patterns on every invocation, and check UTF
+  strings for validity by default. But you should only do so if you
+  are sure that the inputs are valid, because running matches in UTF
+  mode against invalid strings is undefined, and may cause Varnish to
+  crash or loop.  By default, invalid UTF strings in the pattern cause
+  the compile to fail in UTF mode. See `pcre2unicode(3)`_ for details.
+
+``parens_nest_limit``
+  If this INT value is greater than 0, it sets the maximum depth of
+  parenthesis nesting in a pattern. It applies to all kinds of
+  parentheses, not just capturing groups. The limit prevents patterns
+  from using too much of the stack when compiled, and may be useful
+  for the functional interface, for which patterns are compiled at
+  runtime. By default, the nesting limit set for the PCRE2 library at
+  build time is imposed, which is returned by
+  ``config_int(PARENSLIMIT)``.
+
+``ucp``
+  If this option and ``utf`` are both true, then Unicode properties
+  are used to interpret ``\B``, ``\b``, ``\D``, ``\d``, ``\S``,
+  ``\s``, ``\W``, ``\w``, and some of the POSIX character classes in
+  the pattern. The same effect can be achieved by including ``(*UCP)``
+  at the start of the pattern. By default, only ASCII characters are
+  considered for these constructs, which is faster than considering
+  Unicode properties. If Unicode was disabled at build time for the
+  PCRE2 library, which can be discovered by calling
+  ``config_bool(UNICODE)``, then the compile fails when this option is
+  true. Compiles also fail if this option and ``never_ucp`` are both
+  true. See `pcre2unicode(3)`_ for details about Unicode character
+  properties.
+
+``ungreedy``
+  If true, then the "greediness" of quantifiers in the pattern is
+  inverted, so that they are not greedy by default, but become
+  greedy when followed by ``?``. The same effect can be achieved
+  by including ``(?U)`` in the pattern.
+
+``use_offset_limit``
+  This option must be set to true for a pattern if you intend to use
+  the ``offset_limit`` parameter in match and substitution operations
+  to limit how far a string is searched for an unanchored match. If an
+  ``offset_limit`` is set for an invocation of the ``match`` or
+  ``sub`` methods or functions, but this option was not set to true
+  for the pattern, then the match fails.
+
+``utf``
+  If true, then both the pattern and the strings against which it is
+  matched are processed as UTF-8 strings. If Unicode support was
+  disabled when the PCRE2 library was built, which can be determined
+  from ``config_bool(UNICODE)``, then the compile fails when ``utf``
+  is true. See `pcre2unicode(3)`_ for details about Unicode support in
+  PCRE2.
+
+Match options
+~~~~~~~~~~~~~
+
+Match options affect the operation of matching in the ``match`` and
+``sub`` methods and functions. By default, all of the BOOL options
+are **false**. The INT options are 0 by default (meaning that they
+are ignored, and the global defaults hold). The INT options MAY NOT
+be less than 0; if they are, then the match fails.
+
+``anchored``
+  If true, then the match is constrained to match at the start of the
+  string, regardless of whether the pattern is anchored. By default, a
+  match is searched for anywhere in the string if the pattern is not
+  anchored.
+
+``len``
+  If this INT value is greater than 0, it sets the length of the
+  subject string to be matched. By default, the full string is matched.
+
+``match_limit``
+  If this INT value is greater than 0, it sets a limit to the effort
+  used by the PCRE2 matching function to find a match. This can
+  prevent matches from excessive backtracking, if there is a very
+  large search space but a match is never found. It is equivalent to
+  the varnishd parameter ``pcre_match_limit``, except that it applies
+  only to the match operation in which it was set, not globally. The
+  varnishd parameters for PCRE have no effect on this VMOD. By
+  default, the match limit is imposed that was set for the PCRE2
+  library at build time, which can be discovered from
+  ``config_int(MATCHLIMIT)``.
+
+``not_bol``
+  If true, the first character of the subject string is not
+  considered to be the beginning of a line, so the ``^`` metacharacter
+  does not match before it. If the compile option ``multiline`` was
+  not set to true for the pattern, then ``^`` never matches. This
+  option only affects the circumflex metacharacter.
+
+``not_eol``
+  If true, the end of the subject string is not considered to be the
+  end of a line, so the ``$`` metacharacter does not match after it.
+  If ``multiline`` was not set to true for the pattern, then ``$``
+  never matches. This option only affects the dollar metacharacter.
+
+``not_empty``
+  If true, then the empty string is not a valid match. If the matcher
+  finds an empty match, then it considers other alternatives, and if
+  no other valid matches are found, then the match fails.
+
+``not_empty_atstart``
+  If true, then the empty string is not a valid match at the start of
+  the subject string. An empty string match later in the subject is
+  permitted.
+
+``no_jit``
+  If true, then the just-in-time matcher is not used, even when the
+  pattern was compiled for JIT. In that case, PCRE2's "traditional"
+  interpretive matcher is used (as is always the case if JIT is not
+  available, or if the pattern was not JIT-compiled). If ``no_jit``
+  is true for an invocation of the ``match()`` or ``sub()`` functions,
+  which compile a pattern on every call, then the pattern is also not
+  JIT-compiled. See `JIT compilation and matching`_ below.
+
+``no_utf_check``
+  If true, then the subject is not checked for validity as a UTF-8
+  string when matched against a pattern for which ``utf`` was set to
+  true. This may speed up matching, but should only be done if you
+  are sure that the inputs are valid UTF-8. By default, UTF validity
+  is checked for matches against patterns that were compiled with
+  ``utf``.
+
+``offset_limit``
+  If this INT value is greater than 0, it limits how far an unanchored
+  search can advance in the subject string. For example, if the
+  pattern ``abc`` is matched against the string ``"123abc"`` and the
+  offset limit is less than 3, the match fails. To use this parameter,
+  the compile option ``use_offset_limit`` must have been set to true
+  for the pattern at compile time; otherwise the match fails. By
+  default, unanchored matches are searched for until the end of the
+  string.
+
+``recursion_limit``
+  If this INT value is greater than 0, then it limits the depth of
+  recursion for matches using the interpretive matcher. It is
+  equivalent to the varnishd parameter ``pcre_match_limit_recursion``,
+  but only applies to the individual match. This limits the depth of
+  recursion and use of the stack for matches that may cause excessive
+  recursion and stack overflow (which usually causes Varnish to
+  crash). The limit is not relevant to the JIT matcher, and is ignored
+  for JIT matching. By default, the recursion limit set for the PCRE2
+  library at build time applies, which can be determined from
+  ``config_int(RECURSIONLIMIT)``.
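+
+The interaction of the compile option ``use_offset_limit`` with the
+match option ``offset_limit`` can be sketched as follows. The object
+name ``abc`` and the header names are hypothetical, and the limit is
+chosen to mirror the ``"123abc"`` example above::
+
+  sub vcl_init {
+      # use_offset_limit must be set when the pattern is compiled.
+      new abc = pcre2.regex("abc", use_offset_limit=true);
+  }
+
+  sub vcl_recv {
+      # With a limit less than 3, the subject "123abc" does not
+      # match, although it contains "abc".
+      if (abc.match(req.http.X-Token, offset_limit=2)) {
+          set req.http.X-Abc-Early = "true";
+      }
+  }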
+
+Substitution options
+~~~~~~~~~~~~~~~~~~~~
+
+The ``sub`` method and function use all of the match options (since
+they run a match), and the following additional options. (The ``sub``
+function also uses the compile options, since it compiles a pattern.)
+
+``suball``
+  If true, then the substitution iterates over the subject string and
+  replaces every matching substring, making the substitution similar
+  to the native VCL ``regsuball`` function. By default, only the first
+  matching substring is replaced, making the substitution similar to
+  VCL's ``regsub`` function.
+
+``sub_extended``
+  If true, then an extended syntax is enabled for the replacement
+  string. Details of the replacement syntax are documented for the
+  ``.sub()`` method below.
+
+``unknown_unset``
+  If true, then references to capturing groups in the replacement
+  string that do not appear in the pattern are treated as unset
+  groups.  By default, unknown references cause the substitution to
+  fail. Use this option with care, because it causes misspelled group
+  names or numbers to be silently ignored.
+
+``unset_empty``
+  If true, then unset capturing groups (including unknown groups when
+  ``unknown_unset`` is also true) are replaced as empty strings. By
+  default, an attempt to insert an unset group causes the substitution
+  to fail.
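+
+As a sketch of the effect of ``suball``, assuming a hypothetical
+object named ``slashes``::
+
+  sub vcl_init {
+      # Matches runs of two or more slashes.
+      new slashes = pcre2.regex("/{2,}");
+  }
+
+  sub vcl_recv {
+      if (slashes.match(req.url)) {
+          # Replace every run of slashes in the URL with a single
+          # slash. Without suball=true, only the first run would be
+          # replaced.
+          set req.url = slashes.sub(req.url, "/", suball=true);
+      }
+  }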
+
+JIT compilation and matching
+----------------------------
+
+PCRE2 supports just-in-time compilation for patterns, and a matcher to
+go with it. JIT is a heavyweight optimization that may greatly speed
+up matching, but requires extra processing at pattern compilation
+time.  The VMOD supports JIT if it was enabled for the PCRE2 library
+when it was built, which can be determined from ``config_bool(JIT)``.
+
+If JIT is available, then it is always applied to the compilation of
+patterns in the ``regex`` object constructor. By default it is also
+applied when patterns are compiled at runtime in the ``match()`` and
+``sub()`` functions, unless the ``no_jit`` option is true.
+For patterns compiled at runtime, it may be worth it to turn off JIT,
+if the overhead for JIT-compiles outweighs the advantage of JIT
+matching.
+
+If JIT is not available, then PCRE2 always uses the interpretive
+matcher.
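+
+For instance, JIT might be turned off for a pattern that is compiled
+at runtime and used only once. This is a sketch; the header names
+``X-Pattern`` and ``X-Pattern-Matched`` are hypothetical::
+
+  sub vcl_recv {
+      # Skip the JIT compile for this one-off pattern, and match with
+      # the interpretive matcher.
+      if (pcre2.match(req.http.X-Pattern, req.url, no_jit=true)) {
+          set req.http.X-Pattern-Matched = "true";
+      }
+  }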
+
+Unicode
+-------
+
+The VMOD only links to the 8-bit version of PCRE2, and hence can
+support UTF-8 if Unicode was enabled when the library was built. The
+VMOD does not support UTF-16 or UTF-32. Thus the term "code unit", as
+used for Unicode and in the PCRE2 documentation, always refers to one
+byte.
+
+In UTF mode, characters in patterns and the strings to be matched are
+interpreted as UTF-8 code points, and hence may correspond to one to
+four bytes. When UTF is not enabled, characters in patterns and
+strings are represented by exactly one byte.
+
+See `pcre2unicode(3)`_ for the details of PCRE2 Unicode support.
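+
+A sketch of a pattern compiled in UTF mode follows. The object name
+``word`` is hypothetical. As described for the ``utf`` option, the
+compile fails, and hence the VCL fails to load, if the PCRE2 library
+was built without Unicode support::
+
+  sub vcl_init {
+      # With utf and ucp both true, \w is interpreted using Unicode
+      # properties rather than ASCII only.
+      new word = pcre2.regex("^\w+$", utf=true, ucp=true);
+  }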

 CONTENTS
 ========
@@ -56,8 +645,27 @@ regex

 	new OBJ = regex(STRING pattern, BOOL allow_empty_class=0, BOOL anchored=0, ENUM {ANYCRLF,UNICODE} bsr=0, BOOL alt_bsux=0, BOOL alt_circumflex=0, BOOL alt_verbnames=0, BOOL caseless=0, BOOL dollar_endonly=0, BOOL dotall=0, BOOL dupnames=0, BOOL extended=0, BOOL firstline=0, STRING locale=0, BOOL match_unset_backref=0, INT max_pattern_len=0, BOOL multiline=0, BOOL never_backslash_c=0, BOOL never_ucp=0, BOOL never_utf=0, ENUM {CR,LF,CRLF,ANYCRLF,ANY} newline=0, BOOL no_auto_capture=0, BOOL no_auto_possess=0, BOOL no_dotstar_anchor=0, BOOL no_start_optimize=0, BOOL no_utf_check=0, INT parens_nest_limit=0, BOOL ucp=0, BOOL ungreedy=0, BOOL use_offset_limit=0, BOOL utf=0)

-# XXX options for dfa_match, jit fast path, start_offset
-# XXX option to make saving the match ctx with PRIV_CALL optional
+Create a ``regex`` object from ``pattern`` according to the given
+compile options (or option defaults). If the pattern is invalid, then
+the VCL will fail to load, and the VCC compiler will emit an error
+message.
+
+Examples::
+
+  sub vcl_init {
+
+      # Match this pattern against the Host header (hence
+      # case-insensitively), and capture part of the domain name.
+      new domain = pcre2.regex("^www\.([^.]+)\.com$", caseless=true);
+
+      # Match a max-age tag and capture the number.
+      new maxage = pcre2.regex("max-age\s*=\s*(\d+)");
+
+      # Group possible subdomains without capturing
+      new submatcher = pcre2.regex("^www\.(domain1|domain2)\.com$",
+                                   no_auto_capture=true, caseless=true);
+  }
+
 .. _func_regex.match:

 regex.match
@@ -67,6 +675,26 @@ regex.match

 	BOOL regex.match(PRIV_CALL, PRIV_TASK, STRING subject, INT len=0, BOOL anchored=0, INT match_limit=0, INT offset_limit=0, BOOL notbol=0, BOOL noteol=0, BOOL notempty=0, BOOL notempty_atstart=0, BOOL no_jit=0, BOOL no_utf_check=0, INT recursion_limit=0)

+Return ``true`` if the compiled regex matches the ``subject`` string,
+as constrained by the given match options or option defaults.
+
+The match may fail if any of the options are illegal for one of the
+reasons given above, or if a limit such as the match or recursion
+limit is reached. In that case, an error message is written to the
+Varnish log using the ``VCL_Error`` tag, and the method returns
+``false``.
+
+If ``subject`` is undefined, for example if it is set from an unset
+header variable, then it is assumed to be the empty string. This
+follows VCL's handling of regex matching when the string to be matched
+is unset.
+
+Example::
+
+  if (domain.match(req.http.Host)) {
+     call do_on_match;
+  }
+
 .. _func_regex.backref:

 regex.backref
@@ -76,6 +704,64 @@ regex.backref

 	STRING regex.backref(INT ref, STRING fallback="**BACKREF METHOD FAILED**")

+Returns the `nth` captured subexpression from the most recent
+successful call of the ``.match()`` method for this object in the same
+client or backend context, or a fallback string in case the capture
+fails. Backref 0 indicates the entire matched string. Thus this
+function behaves like the ``\n`` in the native VCL functions
+``regsub`` and ``regsuball``, and the ``$1``, ``$2`` ... variables in
+Perl. Unlike the regsubs, which limit the backref number to 0 through
+9, ``backref`` permits any number that identifies a capturing group in
+the pattern.
+
+Since Varnish client and backend operations run in different threads,
+``.backref()`` can only refer back to a ``.match()`` call in the same
+thread. Thus a ``.backref()`` call in any of the ``vcl_backend_*``
+subroutines -- the backend context -- refers back to a previous
+``.match()`` in any of those same subroutines; and a call in any of
+the other VCL subroutines -- the client context -- refers back to a
+``.match()`` in the same client context.
+
+After unsuccessful matches, the ``fallback`` string is returned for
+any call to ``.backref()``. The default value of ``fallback`` is
+``"**BACKREF METHOD FAILED**"``. ``.backref()`` always fails after a
+failed match, even if ``.match()`` had been called successfully before
+the failure.
+
+``.backref()`` may also return ``fallback`` after a successful match,
+if no captured group in the matching string corresponds to the backref
+number. For example, when the pattern ``(a|(b))c`` matches the string
+``ac``, there is no backref 2, since nothing matches ``b`` in the
+string.
+
+The VCL infix operators ``~`` and ``!~`` do not affect this method,
+nor do the functions ``regsub`` or ``regsuball``. Nor is it affected
+by the matches performed by any other method or function in this VMOD
+(the ``match()`` function or the ``sub`` method or function).
+
+``.backref()`` fails, returning ``fallback`` and writing an error
+message to the Varnish log with the ``VCL_Error`` tag, under the
+following conditions (even if a previous match was successful and a
+substring could have been captured):
+
+* Any of the match options are illegal (for example, if one of the
+  numeric limits was set to less than 0).
+
+* The ``fallback`` string is undefined.
+
+* ``ref`` (the backref number) is out of range -- if it is less than 0
+  or larger than the highest number for a capturing group in the
+  pattern.
+
+* ``.match()`` was never called for this object in the task scope
+  prior to calling ``.backref()``.
+
+Example::
+
+  if (domain.match(req.http.Host)) {
+     set req.http.X-Domain = domain.backref(1);
+  }
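+
+A further sketch of the unset-group case described above, assuming a
+hypothetical object named ``ab`` and an illustrative header name. An
+explicit ``fallback`` makes it easier to tell an unset group apart
+from captured text::
+
+  sub vcl_init {
+      # Hypothetical object; the pattern is the one discussed above.
+      new ab = pcre2.regex("(a|(b))c");
+  }
+
+  sub vcl_recv {
+      if (ab.match(req.url)) {
+          # Group 2 is unset when the matched part is "ac", so the
+          # fallback "none" is returned in that case.
+          set req.http.X-Group-2 = ab.backref(2, "none");
+      }
+  }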
+
 .. _func_regex.namedref:

 regex.namedref
@@ -85,6 +771,49 @@ regex.namedref

 	STRING regex.namedref(STRING name, STRING fallback="**NAMEDREF METHOD FAILED**")

+Returns the captured subexpression designated by ``name`` from the
+most recent successful call to ``.match()`` in the current context
+(client or backend), or ``fallback`` in case of failure. See
+`pcre2pattern(3)`_ for details about the use of named subpatterns in
+PCRE2 regexen.
+
+Note that a named capturing group can also be referenced as a numbered
+group -- the named groups are numbered exactly as if the names were
+not present. So an expression returned by ``.namedref()`` will also be
+returned by ``.backref()`` with the appropriate number.
+
+``fallback`` is returned when ``.namedref()`` is called after an
+unsuccessful match. The default fallback is ``"**NAMEDREF METHOD
+FAILED**"``.
+
+Like ``.backref()``, ``.namedref()`` is not affected by native VCL
+regex operations, nor by any other matches performed by methods or
+functions of the VMOD, except for a prior ``.match()`` for the same
+object.
+
+``.namedref()`` fails, returning ``fallback`` and logging a
+``VCL_Error`` message, if:
+
+* The ``fallback`` string is undefined.
+
+* ``name`` is undefined.
+
+* There is no such named group.
+
+* ``.match()`` was not called for this object.
+
+Example::
+
+  sub vcl_init {
+      new domain = pcre2.regex("^www\.(?<domain>[^.]+)\.com$");
+  }
+
+  sub vcl_recv {
+      if (domain.match(req.http.Host)) {
+          set req.http.X-Domain = domain.namedref("domain");
+      }
+  }
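+
+Since named groups are also numbered, the same capture can be read
+either way. A minimal sketch, reusing the ``domain`` object from the
+example above (the header names are only illustrative)::
+
+  sub vcl_recv {
+      if (domain.match(req.http.Host)) {
+          # The named group "domain" is also capturing group 1, so
+          # both headers are expected to receive the same string.
+          set req.http.X-Domain-By-Name = domain.namedref("domain");
+          set req.http.X-Domain-By-Number = domain.backref(1);
+      }
+  }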
+
 .. _func_regex.sub:

 regex.sub
@@ -94,6 +823,113 @@ regex.sub

 	STRING regex.sub(PRIV_CALL, PRIV_TASK, STRING subject, STRING replacement, INT len=0, BOOL anchored=0, INT match_limit=0, INT offset_limit=0, BOOL notbol=0, BOOL noteol=0, BOOL notempty=0, BOOL notempty_atstart=0, BOOL no_jit=0, BOOL no_utf_check=0, INT recursion_limit=0, BOOL suball=0, BOOL sub_extended=0, BOOL unknown_unset=0, BOOL unset_empty=0)

+If the pattern represented by this object matches ``subject``, then
+return a string formed by replacing the matched part with
+``replacement``.  If the pattern does not match, then return the
+``subject`` string unchanged. The match and substitution options affect
+these operations as described above.
+
+This method is similar to the native VCL ``regsub`` function, or
+``regsuball`` when the ``suball`` option is true, but the syntax of
+the replacement string is different. In the replacement string, these
+sequences can be used to insert strings:
+
+``$$``
+  Inserts a dollar character.
+
+``$<n>`` or ``${<n>}``
+  Inserts the contents of group ``<n>`` captured during the match,
+  where ``<n>`` can be a number or a name. The number can be 0 to
+  include the entire matched string. Braces are only required if the
+  following character would be interpreted as part of the number or
+  name.
+
+``$*MARK`` or ``${*MARK}``
+  Insert the name of the last ``(*MARK)`` encountered in the match.
+
+For example, to rewrite URLs with prefixes of the form ``"/~<user>"``
+so that their prefix is ``"/u/<user>"`` (and leave other URLs
+unchanged)::
+
+  sub vcl_init {
+      new user = pcre2.regex("/~([^/]+)(.*)", anchored=true);
+  }
+  
+  sub vcl_recv {
+      set req.url = user.sub(req.url, "/u/${1}${2}");
+  }
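+
+The ``suball`` option replaces every match rather than only the
+first, like ``regsuball``. A sketch, assuming a hypothetical object
+``slashes`` that collapses runs of slashes (whether this is safe for
+query strings depends on the application)::
+
+  sub vcl_init {
+      # Hypothetical object: two or more consecutive slashes.
+      new slashes = pcre2.regex("/{2,}");
+  }
+
+  sub vcl_recv {
+      # Replace each run of slashes in the URL with a single slash.
+      set req.url = slashes.sub(req.url, "/", suball=true);
+  }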
+
+When the ``sub_extended`` option is false, only the dollar character
+is special in the replacement string. When ``sub_extended`` is true,
+the replacement syntax also has these capabilities:
+
+* Backslashes in the replacement string are interpreted as escapes,
+  and special backslash sequences are interpreted as for PCRE2
+  patterns.  For example, ``\n`` denotes newline, and ``\x{ddd}``,
+  where each ``d`` is a digit, specifies a character code. A backslash
+  followed by a non-alphanumeric character quotes the character, and
+  ``\Q`` and ``\E`` can be used to quote a longer sequence.
+
+* Four additional escape sequences can be used to force the case of
+  inserted letters:
+
+  * ``\U`` forces upper case for all of the following text until
+    ``\E``, or to the end of the string if there is no ``\E``.
+
+  * ``\L`` through ``\E`` or end of string forces lower case.
+
+  * ``\u`` and ``\l`` force the next character, if it is a letter, to
+    upper and lower case, respectively.
+
+  Case forcing applies to all inserted characters, including those from
+  captured groups and in sequences quoted by ``\Q`` through ``\E``.
+
+  Sequences ending in ``\E`` do not nest. So for example,
+  ``"\Uaa\LBB\Ecc\E"`` results in ``"AAbbcc"``, and the final ``\E`` has
+  no effect.
+
+* The "dollar" replacement expressions have an additional capability
+  inspired by Bash to handle unset capturing groups:
+
+  ``${<n>:-<string>}``
+    As with ``${<n>}``, ``<n>`` can be a number or name. If group
+    ``<n>`` is set, then its contents are inserted, otherwise
+    ``<string>`` is expanded and inserted. ``<string>`` may, in turn,
+    include elements of the replacement syntax that are interpreted
+    accordingly.
+
+  ``${<n>:+<string1>:<string2>}``
+    If group ``<n>`` is set, insert the result of expanding
+    ``<string1>``, otherwise insert the result of expanding
+    ``<string2>``.
+
+  Colons and closing braces in the replacement strings can be escaped
+  with backslashes.
+
+For example, to rewrite Host headers of the form
+``www.<sub1>.<sub2>.<tld>`` to ``<sub2>.<tld>``, and of the form
+``www.<sub>.<tld>`` to ``<sub>.<tld>``, while also normalizing the header
+to lower-case, and leaving other Host headers unchanged::
+
+  sub vcl_init {
+      new hostsub = pcre2.regex(extended=true, pattern={"
+                    ^www\.		# www. prefix
+                    ([^.]+)		# group 1, "<sub1>"
+                    (?:			# non-capturing parentheses
+                      \.([^.]+)		# dot, then group 2, "<sub2>"
+                    )?			# 0 or 1 of group 2
+                    \.([^.]+)$		# dot, then group 3, "<tld>"
+                    "});
+  }
+
+  sub vcl_recv {
+      set req.http.Host = hostsub.sub(req.http.Host, sub_extended=true,
+                                      replacement="\L${2:+$2:$1}.$3");
+  }
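+
+The ``${<n>:-<string>}`` form can supply a default for an unset
+group. A sketch under assumed names (the object ``lang``, the header
+``X-Lang`` and the list of language prefixes are all hypothetical)::
+
+  sub vcl_init {
+      # Group 1 is an optional language prefix such as "/de/".
+      new lang = pcre2.regex("^/(?:(en|de|fr)/)?.*$");
+  }
+
+  sub vcl_recv {
+      # The pattern spans the whole URL, so the result is either the
+      # captured prefix or, if group 1 is unset, the default "en".
+      set req.http.X-Lang = lang.sub(req.url, "${1:-en}",
+                                     sub_extended=true);
+  }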
+
+``.sub()`` fails, returning NULL while logging a ``VCL_Error`` message,
+if ``replacement`` is undefined.
+
 .. _func_regex.info_bool:

 regex.info_bool
@@ -103,6 +939,108 @@ regex.info_bool

 	BOOL regex.info_bool(ENUM {ALLOW_EMPTY_CLASS,ANCHORED,ALT_BSUX,ALT_CIRCUMFLEX,ALT_VERBNAMES,CASELESS,DOLLAR_ENDONLY,DOTALL,DUPNAMES,EXTENDED,FIRSTLINE,MATCH_UNSET_BACKREF,MULTILINE,NEVER_BACKSLASH_C,NEVER_UCP,NEVER_UTF,NO_AUTO_CAPTURE,NO_AUTO_POSSESS,NO_DOTSTAR_ANCHOR,NO_START_OPTIMIZE,NO_UTF_CHECK,UCP,UNGREEDY,USE_OFFSET_LIMIT,UTF,HAS_FIRSTCODEUNIT,MATCH_ATSTART,HAS_LASTCODEUNIT,HAS_BACKSLASHC,HAS_CRORLF,JCHANGED,MATCH_EMPTY}, BOOL compiled=1)

+Return true or false about a property of the regex that the object
+represents.  This method and the other ``.info_*`` methods may be
+helpful for debugging and optimizing regular expression matching, for
+example by determining whether PCRE2 could enable certain
+optimizations for the pattern.
+
+The ENUM determines which property is to be inspected. If the ENUM is any
+one of::
+
+  ALLOW_EMPTY_CLASS, ANCHORED, ALT_BSUX, ALT_CIRCUMFLEX,
+  ALT_VERBNAMES, CASELESS, DOLLAR_ENDONLY, DOTALL, DUPNAMES, EXTENDED,
+  FIRSTLINE, MATCH_UNSET_BACKREF, MULTILINE, NEVER_BACKSLASH_C,
+  NEVER_UCP, NEVER_UTF, NO_AUTO_CAPTURE, NO_AUTO_POSSESS,
+  NO_DOTSTAR_ANCHOR, NO_START_OPTIMIZE, NO_UTF_CHECK, UCP, UNGREEDY,
+  USE_OFFSET_LIMIT, UTF
+
+then the return value of ``info_bool()`` indicates whether the
+corresponding compile option is true for the pattern. If ``compiled``
+is true, then the return indicates whether the option was set to true
+after the pattern was compiled, even if it was specified differently
+(or left to the default) in the object constructor.  If ``compiled``
+is false, then the method returns the value of the option as it was
+provided in the constructor.
+
+For example, if the compile option ``anchored`` was set to false in
+the constructor (or left to the default), PCRE2 may nevertheless
+determine that the pattern is anchored if certain conditions are
+satisfied (which are described in detail in `pcre2api(3)`_). In that
+case, ``info_bool()`` will return true if ``compiled`` is true, and
+false if ``compiled`` is false.
+
+``compiled`` is true by default, and is ignored for the other ENUM
+values.
+
+The other ENUMs are interpreted as follows:
+
+``HAS_FIRSTCODEUNIT``
+  If the pattern is unanchored, PCRE2 may determine that there is a
+  unique code unit (a byte) that must appear at the start of the
+  matching part of a string. For example, the part of a string that
+  matches ``(cat|cow|coyote)`` must begin with a
+  ``c``. ``info_bool(HAS_FIRSTCODEUNIT)`` returns true if there is
+  such a code unit, and false if the pattern is anchored or if no
+  unique first code unit could be determined. If there is such a first
+  code unit, it is returned by ``info_str(FIRSTCODEUNIT)``. Note that
+  in non-UTF mode, the first code unit is the same as the first
+  character, but for UTF-8 patterns, it may be the first byte in a
+  multibyte character.
+
+``MATCH_ATSTART``
+  If the pattern is unanchored and no unique first code unit in the
+  matching part of the string is known, PCRE2 may determine that the
+  pattern is constrained to match at the start of the subject string,
+  or following a newline in the subject. In that case,
+  ``info_bool(MATCH_ATSTART)`` returns true; it returns false if the
+  pattern is anchored, if a unique first code unit could be found, or
+  if the pattern could not be determined to match at the start.
+
+``HAS_LASTCODEUNIT``
+  Under certain circumstances, PCRE2 may determine a rightmost literal
+  code unit that must exist in a matching string, other than at the
+  start. This is not necessarily the last byte in the matching part of
+  a string, but rather the last literal code unit known to be
+  required. For example, the ``b`` is recorded for this purpose for
+  the pattern ``ab\d+``, although the ``b`` must be followed by
+  digits. If there is such a last code unit,
+  ``info_bool(HAS_LASTCODEUNIT)`` returns true, and that value can be
+  retrieved from ``info_str(LASTCODEUNIT)``. For anchored patterns,
+  PCRE2 records a possible last literal code unit only if a part of
+  the pattern that comes before it has variable length. For example,
+  ``z`` is recorded for ``^a\d+z\d+`` (because one or more digits must
+  come before it), but none is recorded for ``^a\dz\d`` (because
+  matching strings have a fixed length). As with the first code unit,
+  the last code unit may be a byte in a multibyte UTF-8 character, if
+  UTF is enabled for the pattern.
+
+``HAS_BACKSLASHC``
+  Return true if and only if ``\C`` appears in the pattern.
+
+``HAS_CRORLF``
+  Return true if and only if the pattern contains explicit matches for
+  CR or LF characters. These can be literal carriage returns or
+  linefeeds in the pattern, or the escape sequences ``\r`` or ``\n``.
+
+``JCHANGED``
+  Return true if and only if the pattern contains ``(?J)`` or ``(?-J)``
+  to set or unset the ``dupnames`` option within the pattern.
+
+``MATCH_EMPTY``
+  Return true if and only if PCRE2 determines that the pattern might
+  match the empty string. For certain complex patterns (with recursive
+  subroutines), it may not be possible to determine; in that case,
+  PCRE2 cautiously returns true.
+
+Example::
+
+  # To determine if the FIRSTCODEUNIT optimization could be applied.
+  if (myregex.info_bool(HAS_FIRSTCODEUNIT)) {
+      std.log("First matching char in the pattern = "
+              + myregex.info_str(FIRSTCODEUNIT));
+  }
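+
+The ``compiled`` parameter can be used to compare what was requested
+in the constructor with what PCRE2 determined. A sketch, again with
+the hypothetical object ``myregex``::
+
+  # True after compilation, but not set in the constructor: PCRE2
+  # found the pattern to be anchored on its own.
+  if (myregex.info_bool(ANCHORED)
+      && !myregex.info_bool(ANCHORED, compiled=false)) {
+      std.log("PCRE2 determined that the pattern is anchored");
+  }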
+
 .. _func_regex.info_int:

 regex.info_int
@@ -112,6 +1050,65 @@ regex.info_int

 	INT regex.info_int(ENUM {BACKREFMAX,CAPTURECOUNT,JITSIZE,MATCHLIMIT,MAXLOOKBEHIND,MINLENGTH,RECURSIONLIMIT,SIZE})

+Return an integer that describes a property of the pattern that the
+object represents, as determined by the ENUM.
+
+``BACKREFMAX``
+  Return the highest back reference within the pattern. Remember that
+  named groups also acquire group numbers, and thus count towards the
+  highest backref. A conditional subpattern such as ``(?(3)a|b)``,
+  which checks if a capturing group is set, also counts as a
+  backref. If there are no backrefs, return 0.
+
+``CAPTURECOUNT``
+  Return the highest capturing group number in the pattern. If the
+  ``(?|`` construct (which allows duplicate group numbers, see
+  `pcre2pattern(3)`_) is not used in the pattern, then the value
+  returned is also the total number of capturing groups.
+
+``JITSIZE``
+  Return the size of JIT-compiled code for the pattern. Returns 0 if
+  the pattern was not JIT-compiled.
+
+``MATCHLIMIT``
+  If the pattern contains the construct ``(*LIMIT_MATCH=nnnn)`` to set
+  the match limit (see the match option ``match_limit`` above), then
+  return the limit that it sets. Returns -1 if no such value has been
+  set.
+
+``MAXLOOKBEHIND``
+  Return the number of characters in the longest lookbehind assertion
+  in the pattern. Returns 0 if there are no lookbehinds.
+
+``MINLENGTH``
+  If PCRE2 has determined that there is a lower bound for the length
+  of a string that may match the pattern, then return that
+  value. Returns 0 if no lower bound is known. This is not necessarily
+  the same as the shortest string that may possibly match; but any
+  string that does match must be at least that long.
+
+``RECURSIONLIMIT``
+  If the pattern contains the construct ``(*LIMIT_RECURSION=nnnn)``
+  (see the match option ``recursion_limit`` above), then return the
+  value that was set. Returns -1 if no such value has been set.
+
+``SIZE``
+  Return the size of the compiled pattern, as used for the
+  interpretive matcher, in bytes. This is independent of the value
+  returned by ``info_int(JITSIZE)``.
+
+Example::
+
+  # To determine if a lower bound on the length of matching strings
+  # could be found.
+  if (myregex.info_int(MINLENGTH) != 0) {
+      std.log("Lower bound on matching string length = "
+              + myregex.info_int(MINLENGTH));
+  }
+  else {
+      std.log("No lower bound for matching string lengths found");
+  }
+
 .. _func_regex.info_str:

 regex.info_str
@@ -121,6 +1118,64 @@ regex.info_str

 	STRING regex.info_str(ENUM {BSR,FIRSTCODEUNIT,FIRSTCODEUNITS,LASTCODEUNIT,NEWLINE}, STRING sep=" ")

+Return a string that describes a property of the pattern represented
+by the object, as determined by the ENUM. The ``sep`` parameter is
+only relevant when the ENUM ``FIRSTCODEUNITS`` is used, as described
+below.
+
+``BSR``
+  Return ``"UNICODE"``, meaning that ``\R`` in the pattern matches any
+  Unicode line ending sequence, or ``"ANYCRLF"``, meaning that it
+  matches only CR, LF or CRLF.
+
+``FIRSTCODEUNIT``
+  If PCRE2 determines that there is a unique first code unit that must
+  begin the matching part of a string (as described above for
+  ``info_bool(HAS_FIRSTCODEUNIT)``), then return that code unit in a
+  string.  Returns the empty string if no such code unit was
+  determined; this is also the case if the pattern is anchored. Recall
+  that a code unit corresponds to a character in non-UTF mode, but may
+  be a byte in a multibyte character when UTF-8 is enabled. The code
+  unit is not escaped in the return string.
+
+``FIRSTCODEUNITS``
+  (Note the difference between ``FIRSTCODEUNIT``, singular, and
+  ``FIRSTCODEUNITS``, plural.) For an unanchored pattern, if PCRE2
+  cannot determine a unique code unit that must appear at the start of
+  the matching part of a string, it may be able to determine a set of
+  such code units. For example, if the pattern starts with ``[abc]``,
+  then the matching part must begin with ``a``, ``b`` or ``c``. In
+  that case, ``info_str(FIRSTCODEUNITS)`` returns those code units in
+  a string, separated by the string given as ``sep``. The default
+  value of ``sep`` is ``" "`` (the string containing one space). If
+  the pattern is anchored, or if a unique first code unit could be
+  found, or if no set of first code units could be found, then return
+  the empty string.
+
+``LASTCODEUNIT``
+  If PCRE2 has recorded a rightmost literal code unit that must exist
+  in a matching string, as described for ``info_bool(HAS_LASTCODEUNIT)``
+  above, then return that code unit in a string. Returns the empty
+  string if no such code unit was recorded.
+
+``NEWLINE``
+  Return a string describing the default sequence recognized as a
+  "newline" for the pattern:
+
+  * ``"CR"`` (carriage return)
+  * ``"LF"`` (linefeed)
+  * ``"CRLF"`` (CR followed by LF)
+  * ``"ANYCRLF"`` (CR, LF or CRLF)
+  * ``"UNICODE"`` (any Unicode line-ending sequence)
+
+Example::
+
+  # Determine if a set of first matching characters could be found.
+  std.log("First matching chars: " + myregex.info_str(FIRSTCODEUNITS));
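+
+A variant of the same call with an explicit separator, as a sketch
+(``myregex`` is hypothetical, and the comma is an arbitrary choice)::
+
+  # Separate the first code units with commas instead of spaces.
+  std.log("First matching chars: "
+          + myregex.info_str(FIRSTCODEUNITS, sep=","));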
+
+Regex functional interface
+--------------------------
+
 .. _func_match:

 match
@@ -130,6 +1185,31 @@ match

 	BOOL match(PRIV_CALL, PRIV_TASK, STRING pattern, STRING subject, BOOL allow_empty_class=0, BOOL anchored=0, ENUM {ANYCRLF,UNICODE} bsr=0, BOOL alt_bsux=0, BOOL alt_circumflex=0, BOOL alt_verbnames=0, BOOL caseless=0, BOOL dollar_endonly=0, BOOL dotall=0, BOOL dupnames=0, BOOL extended=0, BOOL firstline=0, STRING locale=0, BOOL match_unset_backref=0, INT max_pattern_len=0, BOOL multiline=0, BOOL never_backslash_c=0, BOOL never_ucp=0, BOOL never_utf=0, ENUM {CR,LF,CRLF,ANYCRLF,ANY} newline=0, BOOL no_auto_capture=0, BOOL no_auto_possess=0, BOOL no_dotstar_anchor=0, BOOL no_start_optimize=0, BOOL no_utf_check=0, INT parens_nest_limit=0, BOOL ucp=0, BOOL ungreedy=0, BOOL use_offset_limit=0, BOOL utf=0, INT len=0, INT match_limit=0, INT offset_limit=0, BOOL notbol=0, BOOL noteol=0, BOOL notempty=0, BOOL notempty_atstart=0, BOOL no_jit=0, INT recursion_limit=0)

+Compile the ``pattern`` and return true if it matches
+``subject``. Compilation and matching are subject to the given
+options, or default options. The compiled pattern is discarded after
+use, and ``pattern`` is compiled on every invocation.
+
+The call fails, logging a ``VCL_Error`` message and returning false,
+if:
+
+* ``pattern`` is undefined.
+
+* The compile fails (for example due to a syntax error).
+
+* Any compile or match option is illegal as described above.
+
+As with the ``.match()`` method, if ``subject`` is undefined, then it
+is assumed to be the empty string.
+  
+Example::
+
+  # Match a request header against a pattern provided in a response
+  # header.
+  if (pcre2.match(resp.http.X-Pattern, req.http.X-Subject)) {
+      call do_on_match;
+  }
+
 .. _func_backref:

 backref
@@ -139,6 +1219,33 @@ backref

 	STRING backref(PRIV_TASK, INT ref, STRING fallback="**BACKREF FUNCTION FAILED**")

+Return the `nth` captured subexpression from the most recent
+successful call of the ``match()`` function in the current client or
+backend context, or a fallback string if the capture fails. The
+default ``fallback`` is ``"**BACKREF FUNCTION FAILED**"``.
+
+As with the ``regex.backref()`` method, ``fallback`` is returned
+after any failed invocation of the ``match()`` function, or if there
+is no captured group corresponding to the backref number. The function
+is not affected by native VCL regex operations, or any other method or
+function of the VMOD except for the ``match()`` function.
+
+The function fails, returning ``fallback`` and logging a ``VCL_Error``
+message, under the same conditions as the corresponding method:
+
+* ``fallback`` is undefined.
+* ``ref`` is out of range.
+* The ``match()`` function was never called in this context.
+* The pattern failed to compile for the previous ``match()`` call.
+
+Example::
+
+  # Match against a pattern provided in a response header, and capture
+  # subexpression 1.
+  if (pcre2.match(resp.http.X-Pattern, req.http.X-Subject)) {
+     set resp.http.X-Group-1 = pcre2.backref(1);
+  }
+
 .. _func_namedref:

 namedref
@@ -148,6 +1255,29 @@ namedref

 	STRING namedref(PRIV_TASK, STRING name, STRING fallback="**NAMEDREF FUNCTION FAILED**")

+Return the captured subexpression designated by ``name`` from the most
+recent successful call of the ``match()`` function in the current
+context, or ``fallback`` in case of failure. The default fallback is
+``"**NAMEDREF FUNCTION FAILED**"``.
+
+The function returns ``fallback`` when the previous invocation of the
+``match()`` function failed, and is only affected by use of the
+``match()`` function. The function fails, returning ``fallback`` and
+logging a ``VCL_Error`` message, under the same conditions as the
+corresponding method:
+
+* ``fallback`` is undefined.
+* ``name`` is undefined or the empty string.
+* There is no such named group.
+* ``match()`` was not called in this context.
+* The pattern failed to compile for the previous ``match()`` call.
+
+Example::
+
+  if (pcre2.match(resp.http.X-Pattern-With-Names, req.http.X-Subject)) {
+     set resp.http.X-Group-Foo = pcre2.namedref("foo");
+  }
+
 .. _func_sub:

 sub
@@ -157,6 +1287,34 @@ sub

 	STRING sub(PRIV_CALL, PRIV_TASK, STRING pattern, STRING subject, STRING replacement, BOOL allow_empty_class=0, BOOL anchored=0, ENUM {ANYCRLF,UNICODE} bsr=0, BOOL alt_bsux=0, BOOL alt_circumflex=0, BOOL alt_verbnames=0, BOOL caseless=0, BOOL dollar_endonly=0, BOOL dotall=0, BOOL dupnames=0, BOOL extended=0, BOOL firstline=0, STRING locale=0, BOOL match_unset_backref=0, INT max_pattern_len=0, BOOL multiline=0, BOOL never_backslash_c=0, BOOL never_ucp=0, BOOL never_utf=0, ENUM {CR,LF,CRLF,ANYCRLF,ANY} newline=0, BOOL no_auto_capture=0, BOOL no_auto_possess=0, BOOL no_dotstar_anchor=0, BOOL no_start_optimize=0, BOOL no_utf_check=0, INT parens_nest_limit=0, BOOL ucp=0, BOOL ungreedy=0, BOOL use_offset_limit=0, BOOL utf=0, INT len=0, INT match_limit=0, INT offset_limit=0, BOOL notbol=0, BOOL noteol=0, BOOL notempty=0, BOOL notempty_atstart=0, BOOL no_jit=0, INT recursion_limit=0, BOOL suball=0, BOOL sub_extended=0, BOOL unknown_unset=0, BOOL unset_empty=0)

+Compile ``pattern``, and if it matches ``subject``, then return a string
+formed by replacing the matched part with ``replacement``. If the
+pattern does not match, return ``subject`` unchanged. The compile, match
+and substitution options affect all of these operations, as described
+above.
+
+The syntax of the ``replacement`` string, as modified if the
+``sub_extended`` option is true, is the same as documented above for
+the ``.sub()`` method.
+
+``sub()`` fails, returning NULL and logging a ``VCL_Error`` message,
+if:
+
+* Either of ``pattern`` or ``replacement`` is undefined.
+
+* ``pattern`` cannot be compiled.
+
+Example::
+
+  # If the beresp header X-Sub-Letters contains "b+", and Host contains
+  # "www.yabba.dabba.doo.com", then set X-Yada to
+  # "www.yada.dabba.doo.com".
+  set beresp.http.X-Yada = pcre2.sub(beresp.http.X-Sub-Letters,
+                                     bereq.http.Host, "d");
+
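+With ``suball`` set, every match is replaced. A sketch using the same
+illustrative headers as the example above (``X-Yada-All`` is a
+hypothetical header name)::
+
+  # With "b+" as the pattern and "www.yabba.dabba.doo.com" as the
+  # Host, each run of "b" is replaced, which should yield
+  # "www.yada.dada.doo.com".
+  set beresp.http.X-Yada-All = pcre2.sub(beresp.http.X-Sub-Letters,
+                                         bereq.http.Host, "d",
+                                         suball=true);
+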
+Library configuration
+---------------------
+
 .. _func_config_bool:

 config_bool
@@ -166,6 +1324,35 @@ config_bool

 	BOOL config_bool(ENUM {JIT,STACKRECURSE,UNICODE})

+Return true or false about a property of the PCRE2 library to which
+the VMOD is linked, identified by the ENUM. The ``config_*`` functions
+make it possible to discover features of the library that were chosen
+when it was built.
+
+``JIT``
+  Return true if the library supports just-in-time compilation and
+  matching.
+
+``STACKRECURSE``
+  Return true if internal recursion for the PCRE2 matcher uses the
+  system stack to maintain its state, which is the usual way the
+  library is built. If false is returned, PCRE2 uses blocks of data on
+  the heap rather than recursive function calls.
+
+``UNICODE``
+  Return true if Unicode support is available. If so, then the compile
+  option ``utf`` can be used to define a pattern and the strings
+  against which it is matched as UTF-8 strings.
+
+Example::
+
+  if (pcre2.config_bool(JIT)) {
+      std.log("JIT supported for PCRE2");
+  }
+  else {
+      std.log("JIT not supported for PCRE2");
+  }
+
 .. _func_config_str:

 config_str
@@ -175,6 +1362,43 @@ config_str

 	STRING config_str(ENUM {BSR,JITTARGET,NEWLINE,UNICODE_VERSION,VERSION})

+Return a string describing a property of the PCRE2 library.
+
+``BSR``
+  Return a string indicating what the ``\R`` escape sequence matches
+  by default: ``UNICODE`` for Unicode line-ending sequences, or
+  ``ANYCRLF`` for only CR, LF and CRLF. This is the default that holds
+  if no value is given for the compile option ``bsr``.
+
+``JITTARGET``
+  Return a string identifying the architecture for which the JIT
+  compiler is configured. If JIT is not enabled, the returned string
+  contains the phrase ``"JIT not supported"``.
+
+``NEWLINE``
+  Return a string identifying the character sequence that is recognized
+  by default as a newline:
+
+  * ``"CR"`` (carriage return)
+  * ``"LF"`` (linefeed)
+  * ``"CRLF"`` (CR followed by LF)
+  * ``"ANY"`` (any Unicode line ending)
+  * ``"ANYCRLF"`` (any of CR, LF or CRLF)
+
+  This is the default if no value is given for the compile option
+  ``newline``.
+
+``UNICODE_VERSION``
+  If Unicode is supported by the library, return the Unicode version
+  string. If not, return ``"Unicode not supported"``.
+
+``VERSION``
+  Return the PCRE2 version string.
+
+Example::
+
+  std.log("Linked to PCRE2 version " + pcre2.config_str(VERSION));
+
 .. _func_config_int:

 config_int
@@ -184,6 +1408,32 @@ config_int

 	INT config_int(ENUM {LINKSIZE,MATCHLIMIT,PARENSLIMIT,RECURSIONLIMIT})

+Return an integer describing a property of the PCRE2 library.
+
+``LINKSIZE``
+  Return the number of bytes used for internal linkage (offsets) in
+  compiled regular expressions. This determines the size of the
+  largest possible pattern; the default link size of 2 allows for
+  patterns of up to 64K bytes.
+
+``MATCHLIMIT``
+  Return the default value for the ``match_limit`` match option,
+  which limits the effort of the matcher when no match is found.
+
+``PARENSLIMIT``
+  Return the default value of the ``parens_nest_limit`` compile
+  option, which limits the depth of parenthesis nesting in patterns,
+  and hence the use of the stack during compilation.
+
+``RECURSIONLIMIT``
+  Return the default value of the ``recursion_limit`` match option,
+  which limits the depth of recursion, and hence stack usage, for the
+  interpretive (non-JIT) matcher.
+
+Example::
+
+  std.log("Default PCRE2 match limit = " + pcre2.config_int(MATCHLIMIT));
+
 .. _func_version:

 version
@@ -197,29 +1447,106 @@ Returns the version string for this VMOD.

 Example::

-        std.log("Using VMOD pcre2 version " + pcre2.version());
+  std.log("Using VMOD pcre2 version " + pcre2.version());

 REQUIREMENTS
 ============

-This VMOD requires Varnish ...
-
-LIMITATIONS
-===========
-
-...
+This VMOD has been tested with Varnish version 5.1.2 and PCRE2 version
+10.23.

 INSTALLATION
 ============

 See `INSTALL.rst <INSTALL.rst>`_ in the source repository.

+LIMITATIONS
+===========
+
+The VMOD allocates Varnish workspace for a variety of purposes:
+
+* The string returned by the ``sub`` method and function.
+
+* Buffers for temporary data structures needed by the PCRE2 library,
+  for example to save information about a match for use by the
+  ``backref`` and ``namedref`` methods and functions.
+
+* A copy of the subject string for the ``match`` method and function,
+  if it is not already in workspace, so that it can be safely accessed
+  by ``backref`` and ``namedref``.
+
+* Return strings for some uses of ``info_str`` and ``config_str``.
+
+* Temporary buffers for error message strings from the PCRE2 library.
+
+If VMOD operations fail with the "out of space" error message in the
+Varnish log (with the ``VCL_Error`` tag), increase the varnishd runtime
+parameters ``workspace_client`` and/or ``workspace_backend``.
+
+The PCRE2 interpretive and JIT matchers are backtracking matchers, and
+the interpretive matcher is recursive, using part of the stack on each
+recursive call (in the default library configuration). For patterns
+with large search spaces, this can lead to slow matches, high CPU
+usage, and stack overflow due to deep recursion, which typically
+causes Varnish to segfault. This has occasionally been the subject of
+issues reported to the Varnish project.
+
+For most common uses of regular expressions in VCL, PCRE2 is very fast
+and has minimal resource consumption. This depends strongly on how the
+regex is written -- a well-crafted pattern helps the matcher limit
+backtracking, fail early on non-matches, and make use of some of the
+optimizations that PCRE2 can apply. Some of the compile and match
+options also help to optimize the match operation. Which of these
+measures is possible depends, of course, on what you want the regex to
+do.
+
+Writing optimized regexen is a very broad subject, beyond the scope of
+this manual. There is some advice in `pcre2perform(3)`_, and in many
+other sources.
+
+If your use case requires patterns and subject strings that can lead
+to very large search spaces, consider using some of the options
+available in the VMOD that limit excessive effort for unsuccessful
+matches. In particular, consider lowering the match options
+``match_limit`` and ``recursion_limit``. You can also use
+``offset_limit`` to set a maximum length to search for a match in the
+subject string (for which you will have to set the compile option
+``use_offset_limit``). These may cause the matcher to halt before it
+has exhausted all possibilities for a match (but it appears to be
+common that, if the matcher has to search for a long time, then there
+was never any match to be found).
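+
+As a sketch of how these limits might be combined (the object name,
+the pattern and all numeric values are illustrative assumptions, not
+recommendations)::
+
+  sub vcl_init {
+      # A pattern with nested quantifiers that can backtrack heavily.
+      # use_offset_limit must be set at compile time for the
+      # offset_limit match option to be usable.
+      new risky = pcre2.regex("(a+)+b", use_offset_limit=true);
+  }
+
+  sub vcl_recv {
+      # Lower the limits for this match only; suitable values depend
+      # on the pattern and the traffic.
+      if (risky.match(req.url, match_limit=10000,
+                      recursion_limit=1000, offset_limit=256)) {
+          call do_on_match;
+      }
+  }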
+
+If you encounter stack overflow, it may help to increase the stack
+size (by changing ``limits.conf`` or calling ``ulimit -s`` before
+starting Varnish). Since Varnish 4.1, you can also increase the
+varnishd parameter ``thread_pool_stack``. Bear in mind that this
+increases the total RAM usage of Varnish.
+
+ACKNOWLEDGEMENTS
+================
+
+A tip of the hat to Philip Hazel, who released the first version of
+PCRE twenty years before this VMOD was developed.
+
+A few sentences in this manual are identical to or very closely track
+phrasings in the PCRE2 documentation, if there was simply no better
+way to say what needs to be said.
+
 SEE ALSO
 ========

 * varnishd(1)
 * vcl(7)
-* source repository: https://code.uplex.de/uplex-varnish/libvmod-pcre2
+* pcre2(3)
+* PCRE web site: http://www.pcre.org/
+* VMOD source repository: https://code.uplex.de/uplex-varnish/libvmod-pcre2
+
+.. _pcre2(3): http://www.pcre.org/current/doc/html/pcre2.html
+.. _pcre2pattern(3): http://www.pcre.org/current/doc/html/pcre2pattern.html
+.. _pcre2syntax(3): http://www.pcre.org/current/doc/html/pcre2syntax.html
+.. _pcre2api(3): http://www.pcre.org/current/doc/html/pcre2api.html
+.. _pcre2unicode(3): http://www.pcre.org/current/doc/html/pcre2unicode.html
+.. _pcre2perform(3): http://www.pcre.org/current/doc/html/pcre2perform.html

 COPYRIGHT
 =========

--- a/src/vmod_pcre2.vcc
+++ b/src/vmod_pcre2.vcc
@@ -9,13 +9,602 @@ $Module pcre2 3 access the pcre2 regular expression library

 ::

-  new OBJECT = ...
+  # object interface
+  new OBJECT = pcre2.regex(STRING pattern [, compile options])
+  BOOL <OBJ>.match(STRING subject [, match options])
+  STRING <OBJ>.backref(INT ref)
+  STRING <OBJ>.namedref(STRING name)
+  STRING <OBJ>.sub(STRING subject, STRING replacement [, match options]
+                   [, substitution options])
+  BOOL <OBJ>.info_bool(ENUM)
+  INT <OBJ>.info_int(ENUM)
+  STRING <OBJ>.info_str(ENUM)
+
+  # function interface
+  BOOL pcre2.match(STRING pattern, STRING subject [, compile options]
+                   [, match options])
+  STRING pcre2.backref(INT ref)
+  STRING pcre2.namedref(STRING name)
+  STRING pcre2.sub(STRING pattern, STRING subject, STRING replacement
+                   [, compile options] [, match options]
+                   [, substitution options])
+
+  # library configuration
+  BOOL pcre2.config_bool(ENUM)
+  INT pcre2.config_int(ENUM)
+  STRING pcre2.config_str(ENUM)

 DESCRIPTION
 ===========

 This Varnish Module (VMOD) provides access to the PCRE2 regular
-expresion library.
+expression library. PCRE2 is the Perl-compatible regular expression
+library with a revised API, the successor to the PCRE library that
+implements native regexen in Varnish VCL. See `pcre2(3)`_ and the
+manuals that it references for details about the PCRE2 library.
+
+PCRE2, by itself, does not change regular expressions from the
+perspective of the end user -- the syntax and semantics of patterns
+and pattern matching remained largely the same at the time PCRE2 was
+introduced. The new library is a refactoring of the internal API,
+which is transparent to the user, and the VMOD endeavors to make use
+of the new internal features advantageously for VCL.
+
+Some of the differences between the VMOD and native VCL regexen are:
+
+* The VMOD provides methods and functions to retrieve back references
+  after a match that are easier to use than the idiom with the
+  ``regsub`` function that is necessary in native VCL. It also
+  provides the means to retrieve references to named capturing groups.
+
+* The functional interface makes it possible to use patterns that are
+  not known until runtime.
+
+* PCRE2 introduces a new native substitution function, similar to the
+  ``regsub`` and ``regsuball`` functions in VCL, except that the
+  substitution syntax is different and provides more features.
+
+* Parameters that limit the depth of recursion and backtracking in
+  match operations, which are set globally in Varnish, can be set for
+  individual matches in the VMOD.
+
+* The VMOD can support matching against UTF-8 strings, if it is
+  running against a PCRE2 library that was built to support Unicode.
+
+* The VMOD exposes considerably more functionality of the underlying
+  library. VCL provides a general-purpose regular expression facility
+  -- PCRE could be easily replaced as its regex engine. The VMOD is
+  meant to be specific to PCRE2, and makes a full range of its
+  features available in VCL.
+
+* The VMOD provides methods and functions that allow you to inspect
+  properties of patterns and of the library. These are not likely to
+  be useful on the fast path of production deployments, and are not
+  optimized for that. But they may be useful during development to
+  debug and optimize regex matching.
+
+Since the introduction of PCRE2, the original PCRE library is being
+maintained for bugfixes, but development of new features and
+optimizations is only being done for PCRE2. So the VMOD will make it
+possible to take advantage of improvements in the library as they are
+released.
+
+Here are some simple usage examples::
+
+  # regex objects are created in vcl_init, and the regular expressions
+  # are compiled when VCL is loaded.
+  sub vcl_init {
+      # A regex to match the "foo" cookie, and capture its value.
+      new foo = pcre2.regex("\bfoo=([^;,\s]+\b)");
+
+      # A regex to match a URL beginning with the prefix "/bar/", and
+      # capture its suffix.
+      new bar = pcre2.regex("^/bar/(.+)");
+  }
+
+  sub vcl_recv {
+      # If the cookie header contains "foo", then assign its value
+      # to another header.
+      if (foo.match(req.http.Cookie)) {
+          set req.http.X-Foo-Value = foo.backref(1);
+      }
+
+      # If the URL begins with "/bar/", then replace the prefix with
+      # "/baz/quux/".
+      if (bar.match(req.url)) {
+          set req.url = "/baz/quux/" + bar.backref(1);
+      }
+  }
+
+Object and functional interfaces
+--------------------------------
+
+The VMOD provides regular expression operations by way of the
+``regex`` object interface and a functional interface. For ``regex``
+objects, the pattern is compiled at VCL initialization time, and the
+compiled pattern is re-used for each invocation of its
+methods. Compilation failures (due to errors in the pattern) cause
+failure at initialization time, and the VCL fails to load. The
+``.backref()`` and ``.namedref()`` methods refer back to the last
+invocation of the ``.match()`` method for the same object. The
+``.sub()`` method also re-uses an object's compiled pattern.
+
+The functional interface provides the same set of operations, but the
+pattern is compiled at runtime on each invocation of the ``match()``
+and ``sub()`` functions (and then discarded). Compilation failures are
+reported as errors in the Varnish log. The ``backref()`` and
+``namedref()`` functions refer back to the last invocation of the
+``match()`` function, for any pattern.
+
+Compiling a pattern at runtime on each invocation is considerably more
+costly than re-using a compiled pattern. So for patterns that are
+fixed and known at VCL initialization, the object interface should be
+used. The functional interface should only be used for patterns whose
+contents are not known until runtime.
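+
+A minimal sketch of the functional interface, assuming a hypothetical
+request header ``X-Section`` whose value becomes part of the pattern
+(so the pattern cannot be known when the VCL is loaded)::
+
+  sub vcl_recv {
+      # The pattern is compiled on every call, and then discarded.
+      if (pcre2.match("^/" + req.http.X-Section + "/(.+)", req.url)) {
+          # Refers to the most recent match() call, for any pattern.
+          set req.http.X-Rest = pcre2.backref(1);
+      }
+  }
+
+If the section name were fixed, a ``regex`` object created in
+``vcl_init`` would be the better choice.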
+
+Compile, match and substitution options
+---------------------------------------
+
+The VMOD has unusually long lists of parameters for its methods and
+functions -- over 40 for the ``sub()`` function, for example. But
+nearly all of these have default values, and it is only necessary to
+specify options in VCL that differ from the defaults.
+
+The optional parameters affect the interpretations of patterns and the
+operation of matches and substitutions, and come in three groups:
+
+* *Compile* options, used wherever a pattern is compiled: in the
+  ``regex`` object constructor, and the ``match()`` and ``sub()``
+  functions.
+
+* *Match* options, used wherever a match is performed: in the
+  ``match`` and ``sub`` methods and functions.
+
+* *Substitution* options, used in the ``sub`` method and function.
+
+The options have call scope, meaning that they are evaluated only once
+for each invocation of a function or method at its particular location
+in the VCL source, on the first invocation after the VCL instance is
+loaded. The options are then cached and re-used for all subsequent
+invocations, and cannot be changed (until a new VCL instance is
+loaded).
+
+Compile options
+~~~~~~~~~~~~~~~
+
+Compile options define properties of patterns. See `pcre2pattern(3)`_
+for details of PCRE2 pattern syntax, and `pcre2syntax(3)`_ for a quick
+reference.
+
+The default value of all of the BOOL options is **false**.
+
+See also `JIT compilation and matching`_ below.
+
+``allow_empty_class``
+  If true, then a pattern may include ``[]`` to denote an empty
+  character class. This, in part, supports compatibility with regexen
+  in ECMAscript (also known as Javascript). By default, a closing
+  square bracket after an opening one is interpreted as a character in
+  the class (and ``]`` must appear later in the pattern).
+
+``alt_bsux``
+  (Referring to "backslash-u" and "backslash-x".) If true, then three
+  escape sequences are interpreted differently (for compatibility with
+  ECMAscript):
+
+  * ``\U`` matches an upper case ``U`` character. By default, ``\U``
+    causes a compile error.
+
+  * ``\u`` matches a lower case ``u``, unless it is followed by four
+    hexadecimal digits, in which case the hex number identifies the
+    code point to be matched. By default, ``\u`` causes a compile
+    error.
+
+  * ``\x`` matches a lower case ``x``, unless it is followed by two
+    hex digits, in which case it identifies the code point to match.
+    By default, ``\x`` must always be followed by zero to two hex
+    digits to identify a one-byte character (for example, ``\xz``
+    matches binary zero followed by ``z``).
+
+``alt_circumflex``
+  If true, and if ``multiline`` is also true, then the ``^``
+  meta-character matches after a newline appearing as the last
+  character in a string. By default, ``^`` does not match after
+  a terminating newline.
+
+``alt_verbnames``
+  If true, then backslash processing may be applied to verb names in
+  verb sequences such as ``(*MARK:NAME)``, so that the name can, for
+  example, include a closing parenthesis as ``\)`` or between ``\Q``
+  and ``\E``. By default, no processing is applied to verb names, and
+  they end at the first closing parenthesis (regardless of any
+  backslash).
+
+``anchored``
+  If true, then the pattern is anchored, meaning that it is
+  constrained to match at the starting point of a string. This may
+  also be achieved with constructs in the pattern.
+
+``bsr``
+  (For "backslash-R".) If this ENUM value is set, then it determines
+  which sequences are matched by ``\R``. If set to ``UNICODE``, then
+  ``\R`` matches any UTF-8 newline sequence. If set to ``ANYCRLF``,
+  then it matches CR (carriage return, or ``\r``), LF (linefeed, or
+  ``\n``), or CR followed by LF. By default, ``\R`` matches the
+  sequence chosen when the PCRE2 library was built, which can be
+  determined from ``config_str(BSR)`` (the default default is
+  Unicode). See `pcre2pattern(3)`_ for details about ``\R``.
+
+``caseless``
+  If true, then matches for this pattern are case-insensitive. This
+  may also be achieved with ``(?i)`` in the pattern.
+
+``dollar_endonly``
+  If true, then the ``$`` metacharacter matches only at the end of a
+  string. By default, ``$`` also matches immediately before a newline
+  at the very end of the string (but not before any other newline).
+  ``dollar_endonly`` is ignored when ``multiline`` is true.
+
+``dotall``
+  If true, then the ``.`` metacharacter matches any character,
+  including newlines. But it only ever matches one character, even if
+  newlines are coded as CRLF. By default, dots do not match
+  newlines. The effect of ``dotall`` can also be achieved with
+  ``(?s)`` in the pattern.
+
+``dupnames``
+  If true, then the names used for named capturing groups are not
+  required to be unique. By default, names for capturing groups may
+  only be used once.
+
+``extended``
+  If true, then pattern syntax is permitted to contain constructs that
+  serve as self-documentation:
+
+  * Most whitespace is ignored, except when escaped or inside a
+    character class (and a few other exceptions detailed in
+    `pcre2api(3)`_).
+
+  * All characters between an unescaped ``#`` and the next newline are
+    ignored, and can be used as comments.
+
+    For example, this is a self-documenting declaration of a pattern
+    that matches IPv6 addresses::
+
+      new ipv6 = pcre2.regex(extended=true, caseless=true, pattern=
+      {"^(?!:)                 # colon disallowed at start
+        (?:                    # start of item
+          (?: [0-9a-f]{1,4} |  # 1-4 hex digits or
+          (?(1)0 | () ) )      # fail if null previously matched
+          :                    # followed by colon
+        ){1,7}                 # end item; 1-7 of them required
+        [0-9a-f]{1,4} $        # final hex number at end of string
+        (?(1)|.)               # there was an empty component
+      "});
+
+  The effect of ``extended`` can also be achieved with the ``(?x)``
+  option in a pattern.
+
+``firstline``
+  If true, an unanchored pattern must match before or at the first
+  newline in the subject string (though the matched text may continue
+  over a newline). If the ``offset_limit`` option is also set for a
+  match, then the match must occur within the offset limit and in the
+  first line.
+
+``locale``
+  If ``locale`` is set to a string matching a locale that is available
+  on the system on which Varnish is running, then that locale is used
+  for the pattern to determine which characters are letters, digits,
+  upper and lower case, and so forth. Hence this option affects the
+  interpretation of constructs such as ``\w`` and ``\d``, the
+  ``caseless`` option, and so on. This only applies to single-byte
+  characters.
+
+  If ``locale`` is set to a string that is not recognized as a locale,
+  then compilation fails.
+
+  By default, PCRE2 uses tables established when the library is built
+  to recognize character properties; normally, these only recognize
+  ASCII characters.
+
+  Quoting `pcre2api(3)`_:
+
+    The use of locales with Unicode is discouraged.  If you are
+    handling characters with code points greater than 128, you should
+    either use Unicode support, or use locales, but not try to mix the
+    two.
+
+``match_unset_backref``
+  If true, then a back reference to an unset capturing group matches
+  an empty string; thus ``(\1)(a)`` successfully matches ``a``. This
+  makes the pattern similar to an ECMAscript pattern. By default, an
+  unset backref causes the matcher to backtrack, and possibly fail.
+
+``max_pattern_len``
+  If this INT value is greater than 0, then it sets a maximum length
+  for the pattern string to be compiled. If the pattern is longer, then
+  compilation fails.
+
+``multiline``
+  If true, then the ``^`` and ``$`` meta-characters match immediately
+  after and before internal newlines in the subject string, respectively,
+  in addition to matching at the start and end of the string. By default,
+  the start and end anchors only match at the beginning and end of the
+  string, regardless of internal newlines. The effect of ``multiline``
+  can also be achieved with ``(?m)`` in the pattern.
+
+``never_backslash_c``
+  If true, then ``\C`` may not be used in a pattern, and causes
+  compile failure. ``\C`` always matches exactly one byte, even in UTF
+  mode, and may lead to unpredictable effects if it matches in the
+  middle of a multibyte UTF-8 character. ``\C`` may have been
+  prohibited by a build-time option in the library, which can be
+  discovered by calling ``config_bool(NEVER_BACKSLASH_C)``.
+
+``never_ucp``
+  If true, then Unicode properties are not used to interpret ``\B``,
+  ``\b``, ``\D``, ``\d``, ``\S``, ``\s``, ``\W``, ``\w``, and some of
+  the POSIX character classes in the pattern. It is then impossible to
+  activate this facility by including ``(*UCP)`` at the start of the
+  pattern. If ``never_ucp`` and ``ucp`` are both set to true, then
+  the compile fails.
+
+``newline``
+  If this ENUM value is set, it determines which characters are to be
+  matched as newlines in the pattern. It can be set to:
+
+  * ``CR`` (carriage return)
+  * ``LF`` (linefeed)
+  * ``CRLF`` (CR followed by LF)
+  * ``ANYCRLF`` (CR, LF or CRLF)
+  * ``UNICODE`` (any Unicode line-ending sequences)
+
+  By default, the newline sequence chosen for the PCRE2 library when
+  it was built is used, which can be determined from
+  ``config_str(NEWLINE)``.
+
+``no_auto_capture``
+  If true, then numbered capturing groups are disabled in the pattern.
+  Any opening parenthesis not followed by ``?`` is then interpreted as
+  if it were followed by ``?:`` (that is, it forms a non-capturing
+  group).  Named capturing groups can still be used, and these also
+  acquire a capturing group number, so ``namedref`` and ``backref``
+  can still be used (but only for the named groups).
+
+``no_auto_possess``
+  If true, then the "auto-possessification" optimization is disabled
+  for the pattern, which for example interprets ``a+b`` as ``a++b``,
+  using the "possessive quantifier", to prevent backtracks into ``a+``
+  that can never be successful. If the option is true, then the full
+  unoptimized search is run.
+
+``no_start_optimize``
+  If true, then some optimizations for the start of the match are
+  disabled. This has the effect that certain constructs in the
+  pattern, such as ``(*COMMIT)`` or ``(*MARK)``, are evaluated at
+  every possible starting position in the string, while they may have
+  been skipped when the optimizations are applied. Thus this option
+  may change the result of ``match`` calls in patterns that include
+  such constructs. See `pcre2api(3)`_ for details.
+
+``no_utf_check``
+  If this option and ``utf`` are both true, then validity checks to
+  determine if the pattern is a valid UTF string are disabled. This
+  may save CPU usage and time for the ``match()`` and ``sub()``
+  functions, which compile patterns on every invocation, and check UTF
+  strings for validity by default. But you should only do so if you
+  are sure that the inputs are valid, because running matches in UTF
+  mode against invalid strings is undefined, and may cause Varnish to
+  crash or loop.  By default, invalid UTF strings in the pattern cause
+  the compile to fail in UTF mode. See `pcre2unicode(3)`_ for details.
+
+``parens_nest_limit``
+  If this INT value is greater than 0, it sets the maximum depth of
+  parenthesis nesting in a pattern. It applies to all kinds of
+  parentheses, not just capturing groups. The limit prevents patterns
+  from using too much of the stack when compiled, and may be useful
+  for the functional interface, for which patterns are compiled at
+  runtime. By default, the nesting limit set for the PCRE2 library at
+  build time is imposed, which is returned by
+  ``config_int(PARENSLIMIT)``.
+
+``ucp``
+  If this option and ``utf`` are both true, then Unicode properties
+  are used to interpret ``\B``, ``\b``, ``\D``, ``\d``, ``\S``,
+  ``\s``, ``\W``, ``\w``, and some of the POSIX character classes in
+  the pattern. The same effect can be achieved by including ``(*UCP)``
+  at the start of the pattern. By default, only ASCII characters are
+  considered for these constructs, which is faster than considering
+  Unicode properties. If Unicode was disabled at build time for the
+  PCRE2 library, which can be discovered by calling
+  ``config_bool(UNICODE)``, then the compile fails when this option is
+  true. Compiles also fail if this option and ``never_ucp`` are both
+  true. See `pcre2unicode(3)`_ for details about Unicode character
+  properties.
+
+``ungreedy``
+  If true, then the "greediness" of quantifiers in the pattern is
+  inverted, so that they are not greedy by default, but become
+  greedy when followed by ``?``. The same effect can be achieved
+  by including ``(?U)`` in the pattern.
+
+``use_offset_limit``
+  This option must be set to true for a pattern if you intend to use
+  the ``offset_limit`` parameter in match and substitution operations
+  to limit how far a string is searched for an unanchored match. If an
+  ``offset_limit`` is set for an invocation of the ``match`` or
+  ``sub`` methods or functions, but this option was not set to true
+  for the pattern, then the match fails.
+
+``utf``
+  If true, then both the pattern and the strings against which it is
+  matched are processed as UTF-8 strings. If Unicode support was
+  disabled when the PCRE2 library was built, which can be determined
+  from ``config_bool(UNICODE)``, then the compile fails when ``utf``
+  is true. See `pcre2unicode(3)`_ for details about Unicode support in
+  PCRE2.
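+
+As a brief illustration of passing compile options by name (the
+object names and patterns are assumptions chosen for the example)::
+
+  sub vcl_init {
+      # Case-insensitive, and constrained to match at the start of
+      # the subject string.
+      new img = pcre2.regex("/images/", caseless=true, anchored=true);
+
+      # With multiline, ^ and $ also match at internal newlines.
+      new line = pcre2.regex("^ok$", multiline=true);
+  }
+
+Options that are left out keep their default values.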
+
+Match options
+~~~~~~~~~~~~~
+
+Match options affect the operation of matching in the ``match`` and
+``sub`` methods and functions. By default, all of the BOOL options
+are **false**. The INT options are 0 by default (meaning that they
+are ignored, and the global defaults hold). The INT options MAY NOT
+be less than 0; if they are, then the match fails.
+
+``anchored``
+  If true, then the match is constrained to match at the start of the
+  string, regardless of whether the pattern is anchored. By default, a
+  match is searched for anywhere in the string if the pattern is not
+  anchored.
+
+``len``
+  If this INT value is greater than 0, it sets the length of the
+  subject string to be matched. By default, the full string is matched.
+
+``match_limit``
+  If this INT value is greater than 0, it sets a limit to the effort
+  used by the PCRE2 matching function to find a match. This can
+  prevent matches from excessive backtracking, if there is a very
+  large search space but a match is never found. It is equivalent to
+  the varnishd parameter ``pcre_match_limit``, except that it applies
+  only to the match operation in which it was set, not globally. The
+  varnishd parameters for PCRE have no effect on this VMOD. By
+  default, the match limit is imposed that was set for the PCRE2
+  library at build time, which can be discovered from
+  ``config_int(MATCHLIMIT)``.
+
+``not_bol``
+  If true, the first character of the subject string is not
+  considered to be the beginning of a line, so the ``^`` metacharacter
+  does not match before it. If the compile option ``multiline`` was
+  not set to true for the pattern, then ``^`` never matches. This
+  option only affects the circumflex metacharacter.
+
+``not_eol``
+  If true, the end of the subject string is not considered to be the
+  end of a line, so the ``$`` metacharacter does not match after it.
+  If ``multiline`` was not set to true for the pattern, then ``$``
+  never matches. This option only affects the dollar metacharacter.
+
+``not_empty``
+  If true, then the empty string is not a valid match. If the matcher
+  finds an empty match, then it considers other alternatives, and if
+  no other valid matches are found, then the match fails.
+
+``not_empty_atstart``
+  If true, then the empty string is not a valid match at the start of
+  the subject string. An empty string match later in the subject is
+  permitted.
+
+``no_jit``
+  If true, then the just-in-time matcher is not used, even when the
+  pattern was compiled for JIT. In that case, PCRE2's "traditional"
+  interpretive matcher is used (as is always the case if JIT is not
+  available, or if the pattern was not JIT-compiled). If ``no_jit``
+  is true for an invocation of the ``match()`` or ``sub()`` functions,
+  which compile a pattern on every call, then the pattern is also not
+  JIT-compiled. See `JIT compilation and matching`_ below.
+
+``no_utf_check``
+  If true, then the subject is not checked for validity as a UTF-8
+  string when matched against a pattern for which ``utf`` was set to
+  true. This may speed up matching, but should only be done if you
+  are sure that the inputs are valid UTF-8. By default, UTF validity
+  is checked for matches against patterns that were compiled with
+  ``utf``.
+
+``offset_limit``
+  If this INT value is greater than 0, it limits how far an unanchored
+  search can advance in the subject string. For example, if the
+  pattern ``abc`` is matched against the string ``"123abc"`` and the
+  offset limit is less than 3, the match fails. To use this parameter,
+  the compile option ``use_offset_limit`` must have been set to true
+  for the pattern at compile time; otherwise the match fails. By
+  default, unanchored matches are searched for until the end of the
+  string.
+
+``recursion_limit``
+  If this INT value is greater than 0, then it limits the depth of
+  recursion for matches using the interpretive matcher. It is
+  equivalent to the varnishd parameter ``pcre_match_limit_recursion``,
+  but only applies to the individual match. This limits the depth of
+  recursion and use of the stack for matches that may cause excessive
+  recursion and stack overflow (which usually causes Varnish to
+  crash). The limit is not relevant to the JIT matcher, and is ignored
+  for JIT matching. By default, the recursion limit set for the PCRE2
+  library at build time applies, which can be determined from
+  ``config_int(RECURSIONLIMIT)``.
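+
+A small sketch of a match option applied at a single call site (the
+object name and pattern are illustrative assumptions)::
+
+  sub vcl_init {
+      # The pattern itself is not anchored.
+      new admin = pcre2.regex("/admin(/|$)");
+  }
+
+  sub vcl_recv {
+      # This call only accepts a match at the start of the URL.
+      if (admin.match(req.url, anchored=true)) {
+          return (synth(403, "Forbidden"));
+      }
+  }
+
+Another call of ``admin.match()`` elsewhere in the VCL could leave out
+``anchored`` and search the whole string.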
+
+Substitution options
+~~~~~~~~~~~~~~~~~~~~
+
+The ``sub`` method and function use all of the match options (since
+they run a match), and the following additional options. (The ``sub``
+function also uses the compile options, since it compiles a pattern.)
+
+``suball``
+  If true, then the substitution iterates over the subject string and
+  replaces every matching substring, making the substitution similar
+  to the native VCL ``regsuball`` function. By default, only the first
+  matching substring is replaced, making the substitution similar to
+  VCL's ``regsub`` function.
+
+``sub_extended``
+  If true, then an extended syntax is enabled for the replacement
+  string. Details of the replacement syntax are documented for the
+  ``.sub()`` method below.
+
+``unknown_unset``
+  If true, then references to capturing groups in the replacement
+  string that do not appear in the pattern are treated as unset
+  groups.  By default, unknown references cause the substitution to
+  fail. Use this option with care, because it causes misspelled group
+  names or numbers to be silently ignored.
+
+``unset_empty``
+  If true, then unset capturing groups (including unknown groups when
+  ``unknown_unset`` is also true) are replaced as empty strings. By
+  default, an attempt to insert an unset group causes the substitution
+  to fail.
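+
+As a sketch of ``suball``, the following collapses every run of
+slashes in the URL to a single slash; the object name is an
+assumption, and the replacement is a literal string, so it does not
+depend on the replacement syntax::
+
+  sub vcl_init {
+      new slashes = pcre2.regex("/{2,}");
+  }
+
+  sub vcl_recv {
+      # Without suball=true, only the first run would be replaced.
+      set req.url = slashes.sub(req.url, "/", suball=true);
+  }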
+
+JIT compilation and matching
+----------------------------
+
+PCRE2 supports just-in-time compilation for patterns, and a matcher to
+go with it. JIT is a heavyweight optimization that may greatly speed
+up matching, but requires extra processing at pattern compilation
+time.  The VMOD supports JIT if it was enabled for the PCRE2 library
+when it was built, which can be determined from ``config_bool(JIT)``.
+
+If JIT is available, then it is always applied to the compilation of
+patterns in the ``regex`` object constructor. By default it is also
+applied when patterns are compiled at runtime in the ``match()`` and
+``sub()`` functions, unless the ``no_jit`` option is true.
+For patterns compiled at runtime, it may be worth it to turn off JIT,
+if the overhead for JIT-compiles outweighs the advantage of JIT
+matching.
+
+If JIT is not available, then PCRE2 always uses the interpretive
+matcher.
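+
+A sketch of both points, assuming hypothetical ``X-Pattern`` request
+and ``X-PCRE2-JIT`` response headers::
+
+  sub vcl_recv {
+      # The pattern is compiled on every call, so skip the JIT
+      # compile and use the interpretive matcher.
+      if (pcre2.match(req.http.X-Pattern, req.url, no_jit=true)) {
+          set req.http.X-Pattern-Matched = "true";
+      }
+  }
+
+  sub vcl_deliver {
+      # Development aid: report whether the library supports JIT.
+      if (pcre2.config_bool(JIT)) {
+          set resp.http.X-PCRE2-JIT = "true";
+      } else {
+          set resp.http.X-PCRE2-JIT = "false";
+      }
+  }
+
+Whether ``no_jit`` pays off for a given pattern is best decided by
+measurement.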
+
+Unicode
+-------
+
+The VMOD only links to the 8-bit version of PCRE2, and hence can
+support UTF-8 if Unicode was enabled when the library was built. The
+VMOD does not support UTF-16 or UTF-32. Thus the term "code unit", as
+used for Unicode and in the PCRE2 documentation, always refers to one
+byte.
+
+In UTF mode, characters in patterns and the strings to be matched are
+interpreted as UTF-8 code points, and hence may correspond to one to
+four bytes. When UTF is not enabled, characters in patterns and
+strings are represented by exactly one byte.
+
+See `pcre2unicode(3)`_ for the details of PCRE2 Unicode support.
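+
+A minimal sketch of enabling UTF mode for a pattern, assuming that the
+PCRE2 library was built with Unicode support (otherwise the compile
+fails and the VCL does not load); the object name and the headers are
+illustrative::
+
+  sub vcl_init {
+      # With utf and ucp, \w is interpreted using Unicode properties.
+      new name = pcre2.regex("^\w+$", utf=true, ucp=true);
+  }
+
+  sub vcl_recv {
+      if (name.match(req.http.X-Name)) {
+          set req.http.X-Name-Valid = "true";
+      }
+  }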

 $Object regex(STRING pattern, BOOL allow_empty_class=0, BOOL anchored=0,
 	      ENUM {ANYCRLF, UNICODE} bsr=0, BOOL alt_bsux=0,
@@ -30,19 +619,159 @@ $Object regex(STRING pattern, BOOL allow_empty_class=0, BOOL anchored=0,
 	      BOOL no_utf_check=0, INT parens_nest_limit=0, BOOL ucp=0,
 	      BOOL ungreedy=0, BOOL use_offset_limit=0, BOOL utf=0)

-# XXX options for dfa_match, jit fast path, start_offset
-# XXX option to make saving the match ctx with PRIV_CALL optional
+Create a ``regex`` object from ``pattern`` according to the given
+compile options (or option defaults). If the pattern is invalid, then
+the VCL will fail to load, and the VCC compiler will emit an error
+message.
+
+Examples::
+
+  sub vcl_init {
+
+      # Match this pattern against the Host header (hence
+      # case-insensitively), and capture part of the domain name.
+      new domain = pcre2.regex("^www\.([^.]+)\.com$", caseless=true);
+
+      # Match a max-age tag and capture the number.
+      new maxage = pcre2.regex("max-age\s*=\s*(\d+)");
+
+      # Group possible subdomains without capturing
+      new submatcher = pcre2.regex("^www\.(domain1|domain2)\.com$",
+                                   no_auto_capture=true, caseless=true);
+  }
+
 $Method BOOL .match(PRIV_CALL, PRIV_TASK, STRING subject, INT len=0,
                    BOOL anchored=0, INT match_limit=0, INT offset_limit=0,
 		    BOOL notbol=0, BOOL noteol=0, BOOL notempty=0,
 		    BOOL notempty_atstart=0, BOOL no_jit=0,
 		    BOOL no_utf_check=0, INT recursion_limit=0)

+Return ``true`` if the compiled regex matches the ``subject`` string,
+as constrained by the given match options or option defaults.
+
+The match may fail if any of the options are illegal for one of the
+reasons given above, or if a limit such as the match or recursion
+limit is reached. In that case, an error message is written to the
+Varnish log using the ``VCL_Error`` tag, and the method returns
+``false``.
+
+If ``subject`` is undefined, for example if it is set from an unset
+header variable, then it is assumed to be the empty string. This
+follows VCL's handling of regex matching when the string to be matched
+is unset.
+
+Example::
+
+  if (domain.match(req.http.Host)) {
+     call do_on_match;
+  }
+
 $Method STRING .backref(INT ref, STRING fallback = "**BACKREF METHOD FAILED**")

+Returns the `nth` captured subexpression from the most recent
+successful call of the ``.match()`` method for this object in the same
+client or backend context, or a fallback string in case the capture
+fails. Backref 0 indicates the entire matched string. Thus this
+function behaves like the ``\n`` in the native VCL functions
+``regsub`` and ``regsuball``, and the ``$1``, ``$2`` ... variables in
+Perl. Unlike the regsubs, which limit the backref number to 0 through
+9, ``backref`` permits any number that identifies a capturing group in
+the pattern.
+
+Since Varnish client and backend operations run in different threads,
+``.backref()`` can only refer back to a ``.match()`` call in the same
+thread. Thus a ``.backref()`` call in any of the ``vcl_backend_*``
+subroutines -- the backend context -- refers back to a previous
+``.match()`` in any of those same subroutines; and a call in any of
+the other VCL subroutines -- the client context -- refers back to a
+``.match()`` in the same client context.
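+
+As a sketch of a back reference that spans two subroutines of the
+client context (the object name and the ``X-Matched`` marker header
+are assumptions made for the example)::
+
+  sub vcl_init {
+      new domain = pcre2.regex("^www\.([^.]+)\.com$", caseless=true);
+  }
+
+  sub vcl_recv {
+      if (domain.match(req.http.Host)) {
+          set req.http.X-Matched = "1";
+      }
+  }
+
+  sub vcl_deliver {
+      # Same client context as vcl_recv, so this refers back to the
+      # match above.
+      if (req.http.X-Matched) {
+          set resp.http.X-Domain = domain.backref(1);
+      }
+  }
+
+A ``domain.backref(1)`` call in ``vcl_backend_response`` would not
+see this match, since it runs in the backend context.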
+
+After unsuccessful matches, the ``fallback`` string is returned for
+any call to ``.backref()``. The default value of ``fallback`` is
+``"**BACKREF METHOD FAILED**"``. ``.backref()`` always fails after a
+failed match, even if ``.match()`` had been called successfully before
+the failure.
+
+``.backref()`` may also return ``fallback`` after a successful match,
+if no captured group in the matching string corresponds to the backref
+number. For example, when the pattern ``(a|(b))c`` matches the string
+``ac``, there is no backref 2, since nothing matches ``b`` in the
+string.
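+
+The case above can be sketched as follows, with an illustrative
+object name, hypothetical headers, and an explicit fallback string::
+
+  sub vcl_init {
+      new ab = pcre2.regex("(a|(b))c");
+  }
+
+  sub vcl_recv {
+      if (ab.match(req.http.X-Input)) {
+          # For the subject "ac", group 2 is unset, so the fallback
+          # "none" is returned here.
+          set req.http.X-Group2 = ab.backref(2, "none");
+      }
+  }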
+
+The VCL infix operators ``~`` and ``!~`` do not affect this method,
+nor do the functions ``regsub`` or ``regsuball``. Nor is it affected
+by the matches performed by any other method or function in this VMOD
+(the ``match()`` function or the ``sub`` method or function).
+
+``.backref()`` fails, returning ``fallback`` and writing an error
+message to the Varnish log with the ``VCL_Error`` tag, under the
+following conditions (even if a previous match was successful and a
+substring could have been captured):
+
+* Any of the match options are illegal (for example, if one of the
+  numeric limits was set to less than 0).
+
+* The ``fallback`` string is undefined.
+
+* ``ref`` (the backref number) is out of range -- if it is less than 0
+  or larger than the highest number for a capturing group in the
+  pattern.
+
+* ``.match()`` was never called for this object in the task scope
+  prior to calling ``.backref()``.
+
+Example::
+
+  if (domain.match(req.http.Host)) {
+     set req.http.X-Domain = domain.backref(1);
+  }
+
 $Method STRING .namedref(STRING name,
                         STRING fallback = "**NAMEDREF METHOD FAILED**")

+Returns the captured subexpression designated by ``name`` from the
+most recent successful call to ``.match()`` in the current context
+(client or backend), or ``fallback`` in case of failure. See
+`pcre2pattern(3)`_ for details about the use of named subpatterns in
+PCRE2 regexen.
+
+Note that a named capturing group can also be referenced as a numbered
+group -- the named groups are numbered exactly as if the names were
+not present. So an expression returned by ``.namedref()`` will also be
+returned by ``.backref()`` with the appropriate number.
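+
+As a sketch, with a hypothetical object ``domain`` whose only
+capturing group is named, both calls below are expected to return the
+same string (the header names are illustrative)::
+
+  sub vcl_init {
+      new domain = pcre2.regex("^www\.(?<domain>[^.]+)\.com$");
+  }
+
+  sub vcl_recv {
+      if (domain.match(req.http.Host)) {
+          # The named group "domain" is also group number 1.
+          set req.http.X-By-Name = domain.namedref("domain");
+          set req.http.X-By-Number = domain.backref(1);
+      }
+  }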
+
+``fallback`` is returned when ``.namedref()`` is called after an
+unsuccessful match. The default fallback is ``"**NAMEDREF METHOD
+FAILED**"``.
+
+Like ``.backref()``, ``.namedref()`` is not affected by native VCL
+regex operations, nor by any other matches performed by methods or
+functions of the VMOD, except for a prior ``.match()`` for the same
+object.
+
+``.namedref()`` fails, returning ``fallback`` and logging a
+``VCL_Error`` message, if:
+
+* The ``fallback`` string is undefined.
+
+* ``name`` is undefined.
+
+* There is no such named group.
+
+* ``.match()`` was not called for this object.
+
+Example::
+
+  sub vcl_init {
+      new domain = pcre2.regex("^www\.(?<domain>[^.]+)\.com$");
+  }
+
+  sub vcl_recv {
+      if (domain.match(req.http.Host)) {
+          set req.http.X-Domain = domain.namedref("domain");
+      }
+  }
+
 $Method STRING .sub(PRIV_CALL, PRIV_TASK, STRING subject, STRING replacement,
 		    INT len=0, BOOL anchored=0, INT match_limit=0,
 		    INT offset_limit=0, BOOL notbol=0, BOOL noteol=0,
@@ -51,6 +780,113 @@ $Method STRING .sub(PRIV_CALL, PRIV_TASK, STRING subject, STRING replacement,
 		    BOOL sub_extended=0, BOOL unknown_unset=0,
 		    BOOL unset_empty=0)

+If the pattern represented by this object matches ``subject``, then
+return a string formed by replacing the part that was matched by
+``replacement``.  If the pattern does not match, then return the
+``subject`` string unchanged. The match and substitution options affect
+these operations as described above.
+
+This method is similar to the native VCL ``regsub`` function, or
+``regsuball`` when the ``suball`` option is true, but the syntax of
+the replacement string is different. In the replacement string, these
+sequences can be used to insert strings:
+
+``$$``
+  Inserts a dollar character.
+
+``$<n>`` or ``${<n>}``
+  Inserts the contents of group ``<n>`` captured during the match,
+  where ``<n>`` can be a number or a name. The number can be 0 to
+  include the entire matched string. Braces are only required if the
+  following character would be interpreted as part of the number or
+  name.
+
+``$*MARK`` or ``${*MARK}``
+  Insert the name of the last ``(*MARK)`` encountered in the match.
+
+For example, to rewrite URLs with prefixes of the form ``"/~<user>"``
+so that their prefix is ``"/u/<user>"`` (and leave other URLs
+unchanged)::
+
+  sub vcl_init {
+      new user = pcre2.regex("/~([^/]+)(.*)", anchored=true);
+  }
+  
+  sub vcl_recv {
+      set req.url = user.sub(req.url, "/u/${1}${2}");
+  }
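+
+With the ``suball`` option, every match in the subject is replaced. A
+sketch, in which the object name ``slashes`` and its pattern are
+assumptions made for illustration::
+
+  sub vcl_init {
+      new slashes = pcre2.regex("/{2,}");
+  }
+
+  sub vcl_recv {
+      # Replace each run of two or more slashes with a single slash.
+      set req.url = slashes.sub(req.url, "/", suball=true);
+  }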
+
+When the ``sub_extended`` option is false, only the dollar character
+is special in the replacement string. When ``sub_extended`` is true,
+the replacement syntax also has these capabilities:
+
+* Backslashes in the replacement string are interpreted as escapes,
+  and special backslash sequences are interpreted as for PCRE2
+  patterns.  For example, ``\n`` denotes newline, and ``\x{ddd}``,
+  where each ``d`` is a digit, specifies a character code. A backslash
+  followed by a non-alphanumeric character quotes the character, and
+  ``\Q`` and ``\E`` can be used to quote a longer sequence.
+
+* Four additional escape sequences can be used to force the case of
+  inserted letters:
+
+  * ``\U`` forces upper case for all of the following text until
+    ``\E``, or to the end of the string if there is no ``\E``.
+
+  * ``\L`` through ``\E`` or end of string forces lower case.
+
+  * ``\u`` and ``\l`` force the next character, if it is a letter, to
+    upper and lower case, respectively.
+
+  Case forcing applies to all inserted characters, including those from
+  captured groups and in sequences quoted by ``\Q`` through ``\E``.
+
+  Sequences ending in ``\E`` do not nest. So for example,
+  ``"\Uaa\LBB\Ecc\E"`` results in ``"AAbbcc"``, and the final ``\E`` has
+  no effect.
+
+* The "dollar" replacement expressions have an additional capability
+  inspired by Bash to handle unset capturing groups:
+
+  ``${<n>:-<string>}``
+    As with ``${<n>}``, ``<n>`` can be a number or name. If group
+    ``<n>`` is set, then its contents are inserted, otherwise
+    ``<string>`` is expanded and inserted. ``<string>`` may, in turn,
+    include elements of the replacement syntax that are interpreted
+    accordingly.
+
+  ``${<n>:+<string1>:<string2>}``
+    If group ``<n>`` is set, insert the result of expanding
+    ``<string1>``, otherwise insert the result of expanding
+    ``<string2>``.
+
+  Colons and closing braces in the replacement strings can be escaped
+  with backslashes.
+
+For example, to rewrite Host headers of the form
+``www.<sub1>.<sub2>.<tld>`` to ``<sub2>.<tld>``, and of the form
+``www.<sub>.<tld>`` to ``<sub>.<tld>``, while also normalizing the header
+to lower-case, and leaving other Host headers unchanged::
+
+  sub vcl_init {
+      new hostsub = pcre2.regex(extended=true, pattern={"
+                     ^www\.		# www. prefix
+                    ([^.]+)		# group 1, "<sub1>"
+                    (?:			# non-capturing parentheses
+                      \.([^.]+)		# dot, then group 2, "<sub2>"
+                    )?			# 0 or 1 of group 2
+                    \.([^.]+)$		# dot, then group 3, "<tld>"
+                    "});
+  }
+
+  sub vcl_recv {
+      set req.http.Host = hostsub.sub(req.http.Host, sub_extended=true,
+                                      replacement="\L${2:+$2:$1}.$3");
+  }
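+
+The ``:-`` form can supply a default for an unset group. In this
+sketch, the object name ``lang``, its pattern and the header name are
+hypothetical; for URLs that begin with a slash, the result is meant to
+be the language prefix of the path, or ``en`` if there is none::
+
+  sub vcl_init {
+      new lang = pcre2.regex("^/(?:(en|de|fr)/)?.*$");
+  }
+
+  sub vcl_recv {
+      # Group 1 is unset when the path has no language prefix, so
+      # "en" is inserted instead.
+      set req.http.X-Lang = lang.sub(req.url, "${1:-en}",
+                                     sub_extended=true);
+  }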
+
+``.sub()`` fails, returning NULL while logging a ``VCL_Error`` message,
+if ``replacement`` is undefined.
+
 $Method BOOL .info_bool(ENUM {ALLOW_EMPTY_CLASS, ANCHORED, ALT_BSUX,
 			      ALT_CIRCUMFLEX, ALT_VERBNAMES, CASELESS,
 			      DOLLAR_ENDONLY, DOTALL, DUPNAMES, EXTENDED,
@@ -63,12 +899,231 @@ $Method BOOL .info_bool(ENUM {ALLOW_EMPTY_CLASS, ANCHORED, ALT_BSUX,
 			      HAS_LASTCODEUNIT, HAS_BACKSLASHC, HAS_CRORLF,
 			      JCHANGED, MATCH_EMPTY}, BOOL compiled=1)

+Return true or false about a property of the regex that the object
+represents.  This method and the other ``.info_*`` methods may be
+helpful for debugging and optimizing regular expression matching, for
+example by determining whether PCRE2 could enable certain
+optimizations for the pattern.
+
+The ENUM determines which property is to be inspected. If the ENUM is any
+one of::
+
+  ALLOW_EMPTY_CLASS, ANCHORED, ALT_BSUX, ALT_CIRCUMFLEX,
+  ALT_VERBNAMES, CASELESS, DOLLAR_ENDONLY, DOTALL, DUPNAMES, EXTENDED,
+  FIRSTLINE, MATCH_UNSET_BACKREF, MULTILINE, NEVER_BACKSLASH_C,
+  NEVER_UCP, NEVER_UTF, NO_AUTO_CAPTURE, NO_AUTO_POSSESS,
+  NO_DOTSTAR_ANCHOR, NO_START_OPTIMIZE, NO_UTF_CHECK, UCP, UNGREEDY,
+  USE_OFFSET_LIMIT, UTF
+
+then the return value of ``info_bool()`` indicates whether the
+corresponding compile option is true for the pattern. If ``compiled``
+is true, then the return indicates whether the option was set to true
+after the pattern was compiled, even if it was specified differently
+(or left to the default) in the object constructor.  If ``compiled``
+is false, then the method returns the value of the option as it was
+provided in the constructor.
+
+For example, if the compile option ``anchored`` was set to false in
+the constructor (or left to the default), PCRE2 may nevertheless
+determine that the pattern is anchored if certain conditions are
+satisfied (which are described in detail in `pcre2api(3)`_). In that
+case, ``info_bool()`` will return true if ``compiled`` is true, and
+false if ``compiled`` is false.
+
+``compiled`` is true by default, and is ignored for the other ENUM
+values.
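+
+A sketch of comparing the two values, assuming an object ``myregex``
+created without the ``anchored`` option, and ``import std;`` in the
+VCL source::
+
+  # True only if PCRE2 set the option itself during compilation.
+  if (myregex.info_bool(ANCHORED)
+      && !myregex.info_bool(ANCHORED, compiled=false)) {
+      std.log("PCRE2 determined that the pattern is anchored");
+  }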
+
+The other ENUMs are interpreted as follows:
+
+``HAS_FIRSTCODEUNIT``
+  If the pattern is unanchored, PCRE2 may determine that there is a
+  unique code unit (a byte) that must appear at the start of the
+  matching part of a string. For example, the part of a string that
+  matches ``(cat|cow|coyote)`` must begin with a
+  ``c``. ``info_bool(HAS_FIRSTCODEUNIT)`` returns true if there is
+  such a code unit, and false if the pattern is anchored or if no
+  unique first code unit could be determined. If there is such a first
+  code unit, it is returned by ``info_str(FIRSTCODEUNIT)``. Note that
+  in non-UTF mode, the first code unit is the same as the first
+  character, but for UTF-8 patterns, it may be the first byte in a
+  multibyte character.
+
+``MATCH_ATSTART``
+  If the pattern is unanchored and no unique first code unit in the
+  matching part of the string is known, PCRE2 may determine that the
+  pattern is constrained to match at the start of the subject string,
+  or following a newline in the subject. In that case,
+  ``info_bool(MATCH_ATSTART)`` returns true; it returns false if the
+  pattern is anchored, if a unique first code unit could be found, or
+  if the pattern could not be determined to match at the start.
+
+``HAS_LASTCODEUNIT``
+  Under certain circumstances, PCRE2 may determine a rightmost literal
+  code unit that must exist in a matching string, other than at the
+  start. This is not necessarily the last byte in the matching part of
+  a string, but rather the last literal code unit known to be
+  required. For example, the ``b`` is recorded for this purpose for
+  the pattern ``ab\d+``, although the ``b`` must be followed by
+  digits. If there is such a last code unit,
+  ``info_bool(HAS_LASTCODEUNIT)`` returns true, and that value can be
+  retrieved from ``info_str(LASTCODEUNIT)``. For anchored patterns,
+  PCRE2 records a possible last literal code unit only if a part of
+  the pattern that comes before it has variable length. For example,
+  ``z`` is recorded for ``^a\d+z\d+`` (because one or more digits must
+  come before it), but none is recorded for ``^a\dz\d`` (because
+  matching strings have a fixed length). As with the first code unit,
+  the last code unit may be a byte in a multibyte UTF-8 character, if
+  UTF is enabled for the pattern.
+
+``HAS_BACKSLASHC``
+  Return true if and only if ``\C`` appears in the pattern.
+
+``HAS_CRORLF``
+  Return true if and only if the pattern contains explicit matches for
+  CR or LF characters. These can be literal carriage returns or
+  linefeeds in the pattern, or the escape sequences ``\r`` or ``\n``.
+
+``JCHANGED``
+  Return true if and only if the pattern contains ``(?J)`` or ``(?-J)``
+  to set or unset the duplicate names option (``DUPNAMES``) within
+  the pattern.
+
+``MATCH_EMPTY``
+  Return true if and only if PCRE2 determines that the pattern might
+  match the empty string. For certain complex patterns (with recursive
+  subroutines), it may not be possible to determine; in that case,
+  PCRE2 cautiously returns true.
+
+Example::
+
+  # To determine if the FIRSTCODEUNIT optimization could be applied.
+  if (myregex.info_bool(HAS_FIRSTCODEUNIT)) {
+      std.log("First matching char in the pattern = "
+              + myregex.info_str(FIRSTCODEUNIT));
+  }
+
 $Method INT .info_int(ENUM {BACKREFMAX, CAPTURECOUNT, JITSIZE, MATCHLIMIT,
                            MAXLOOKBEHIND, MINLENGTH, RECURSIONLIMIT, SIZE})

+Return an integer that describes a property of the pattern that the
+object represents, as determined by the ENUM.
+
+``BACKREFMAX``
+  Return the highest back reference within the pattern. Remember that
+  named groups also acquire group numbers, and thus count towards the
+  highest backref. A conditional subpattern such as ``(?(3)a|b)``,
+  which checks if a capturing group is set, also counts as a
+  backref. If there are no backrefs, return 0.
+
+``CAPTURECOUNT``
+  Return the highest capturing group number in the pattern. If the
+  ``(?|`` construct (which allows duplicate group numbers, see
+  `pcre2pattern(3)`_) is not used in the pattern, then the value
+  returned is also the total number of capturing groups.
+
+``JITSIZE``
+  Return the size of JIT-compiled code for the pattern. Returns 0 if
+  the pattern was not JIT-compiled.
+
+``MATCHLIMIT``
+  If the pattern contains the construct ``(*LIMIT_MATCH=nnnn)`` to set
+  the match limit (see the match option ``match_limit`` above), then
+  return the limit that it sets. Returns -1 if no such value has been
+  set.
+
+``MAXLOOKBEHIND``
+  Return the number of characters in the longest lookbehind assertion
+  in the pattern. Returns 0 if there are no lookbehinds.
+
+``MINLENGTH``
+  If PCRE2 has determined that there is a lower bound for the length
+  of a string that may match the pattern, then return that
+  value. Returns 0 if no lower bound is known. This is not necessarily
+  the same as the shortest string that may possibly match; but any
+  string that does match must be at least that long.
+
+``RECURSIONLIMIT``
+  If the pattern contains the construct ``(*LIMIT_RECURSION=nnnn)``
+  (see the match option ``recursion_limit`` above), then return the
+  value that was set. Returns -1 if no such value has been set.
+
+``SIZE``
+  Return the size of the compiled pattern, as used for the
+  interpretive matcher, in bytes. This is independent of the value
+  returned by ``info_int(JITSIZE)``.
+
+Example::
+
+  # To determine if a lower bound on the length of matching strings
+  # could be found.
+  if (myregex.info_int(MINLENGTH) != 0) {
+      std.log("Lower bound on matching string length = "
+              + myregex.info_int(MINLENGTH));
+  }
+  else {
+      std.log("No lower bound for matching string lengths found");
+  }
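+
+Another sketch, again assuming an object ``myregex`` and an
+illustrative header name, checks that the pattern has a second
+capturing group before referring to it::
+
+  if (myregex.info_int(CAPTURECOUNT) >= 2
+      && myregex.match(req.url)) {
+      set req.http.X-Group-2 = myregex.backref(2);
+  }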
+
 $Method STRING .info_str(ENUM {BSR, FIRSTCODEUNIT, FIRSTCODEUNITS, LASTCODEUNIT,
 			       NEWLINE}, STRING sep=" ")

+Return a string that describes a property of the pattern represented
+by the object, as determined by the ENUM. The ``sep`` parameter is
+only relevant when the ENUM ``FIRSTCODEUNITS`` is used, as described
+below.
+
+``BSR``
+  Return ``"UNICODE"``, meaning that ``\R`` in the pattern matches any
+  Unicode line ending sequence, or ``"ANYCRLF"``, meaning that it
+  matches only CR, LF or CRLF.
+
+``FIRSTCODEUNIT``
+  If PCRE2 determines that there is a unique first code unit that must
+  begin the matching part of a string (as described above for
+  ``info_bool(HAS_FIRSTCODEUNIT)``), then return that code unit in a
+  string.  Returns the empty string if no such code unit was
+  determined; this is also the case if the pattern is anchored. Recall
+  that a code unit corresponds to a character in non-UTF mode, but may
+  be a byte in a multibyte character when UTF-8 is enabled. The code
+  unit is not escaped in the return string.
+
+``FIRSTCODEUNITS``
+  (Note the difference between ``FIRSTCODEUNIT``, singular, and
+  ``FIRSTCODEUNITS``, plural.) For an unanchored pattern, if PCRE2
+  cannot determine a unique code unit that must appear at the start of
+  the matching part of a string, it may be able to determine a set of
+  such code units. For example, if the pattern starts with ``[abc]``,
+  then the matching part must begin with ``a``, ``b`` or ``c``. In
+  that case, ``info_str(FIRSTCODEUNITS)`` returns those code units in
+  a string, separated by the string given as ``sep``. The default
+  value of ``sep`` is ``" "`` (the string containing one space). If
+  the pattern is anchored, or if a unique first code unit could be
+  found, or if no set of first code units could be found, then return
+  the empty string.
+
+``LASTCODEUNIT``
+  If PCRE2 has recorded a rightmost literal code unit that must exist
+  in a matching string, as described for ``info_bool(HAS_LASTCODEUNIT)``
+  above, then return that code unit in a string. Returns the empty
+  string if no such code unit was recorded.
+
+``NEWLINE``
+  Return a string describing the default sequence recognized as a
+  "newline" for the pattern:
+
+  * ``"CR"`` (carriage return)
+  * ``"LF"`` (linefeed)
+  * ``"CRLF"`` (CR followed by LF)
+  * ``"ANYCRLF"`` (CR, LF or CRLF)
+  * ``"UNICODE"`` (any Unicode line-ending sequence)
+
+Example::
+
+  # Determine if a set of first matching characters could be found.
+  std.log("First matching chars: " + myregex.info_str(FIRSTCODEUNITS));
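+
+To use a different separator, pass ``sep``; the comma here is only
+an illustration::
+
+  std.log("First matching chars: "
+          + myregex.info_str(FIRSTCODEUNITS, sep=","));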
+
+Regex functional interface
+--------------------------
+
 $Function BOOL match(PRIV_CALL, PRIV_TASK, STRING pattern, STRING subject,
 		     BOOL allow_empty_class=0, BOOL anchored=0,
 		     ENUM {ANYCRLF, UNICODE} bsr=0, BOOL alt_bsux=0,
@@ -89,12 +1144,87 @@ $Function BOOL match(PRIV_CALL, PRIV_TASK, STRING pattern, STRING subject,
 		     BOOL notempty_atstart=0, BOOL no_jit=0,
 		     INT recursion_limit=0)

+Compile the ``pattern`` and return true if it matches
+``subject``. Compilation and matching are subject to the given
+options, or default options. The compiled pattern is discarded after
+use, and ``pattern`` is compiled on every invocation.
+
+The call fails, logging a ``VCL_Error`` message and returning false,
+if:
+
+* ``pattern`` is undefined.
+
+* The compile fails (for example due to a syntax error).
+
+* Any compile or match option is illegal as described above.
+
+As with the ``.match()`` method, if ``subject`` is undefined, then it
+is assumed to be the empty string.
+  
+Example::
+
+  # Match a request header against a pattern provided in a response
+  # header.
+  if (pcre2.match(resp.http.X-Pattern, req.http.X-Subject)) {
+      call do_on_match;
+  }
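+
+Because ``pattern`` is compiled on every invocation, a pattern that is
+fixed when the VCL is written may be better served by the object
+interface. A sketch of the two forms side by side, with a hypothetical
+object ``api`` and subroutine ``do_on_match``::
+
+  sub vcl_init {
+      # Compiled in vcl_init, not on each request.
+      new api = pcre2.regex("^/api/");
+  }
+
+  sub vcl_recv {
+      # Function interface: compiled each time this line is executed.
+      if (pcre2.match("^/api/", req.url)) {
+          call do_on_match;
+      }
+      # Object interface: uses the pattern compiled in vcl_init.
+      if (api.match(req.url)) {
+          call do_on_match;
+      }
+  }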
+
 $Function STRING backref(PRIV_TASK, INT ref,
                         STRING fallback = "**BACKREF FUNCTION FAILED**")

+Return the captured subexpression numbered ``ref`` from the most recent
+successful call of the ``match()`` function in the current client or
+backend context, or a fallback string if the capture fails. The
+default ``fallback`` is ``"**BACKREF FUNCTION FAILED**"``.
+
+As with the ``regex.backref()`` method, ``fallback`` is returned
+after any failed invocation of the ``match()`` function, or if there
+is no captured group corresponding to the backref number. The function
+is not affected by native VCL regex operations, or any other method or
+function of the VMOD except for the ``match()`` function.
+
+The function fails, returning ``fallback`` and logging a ``VCL_Error``
+message, under the same conditions as the corresponding method:
+
+* ``fallback`` is undefined.
+* ``ref`` is out of range.
+* The ``match()`` function was never called in this context.
+* The pattern failed to compile for the previous ``match()`` call.
+
+Example::
+
+  # Match against a pattern provided in a response header, and capture
+  # subexpression 1.
+  if (pcre2.match(resp.http.X-Pattern, req.http.X-Subject)) {
+     set resp.http.X-Group-1 = pcre2.backref(1);
+  }
+
 $Function STRING namedref(PRIV_TASK, STRING name,
 			  STRING fallback = "**NAMEDREF FUNCTION FAILED**")

+Return the captured subexpression designated by ``name`` from the most
+recent successful call of the ``match()`` function in the current
+context, or ``fallback`` in case of failure. The default fallback is
+``"**NAMEDREF FUNCTION FAILED**"``.
+
+The function returns ``fallback`` when the previous invocation of the
+``match()`` function failed, and is only affected by use of the
+``match()`` function. The function fails, returning ``fallback`` and
+logging a ``VCL_Error`` message, under the same conditions as the
+corresponding method:
+
+* ``fallback`` is undefined.
+* ``name`` is undefined or the empty string.
+* There is no such named group.
+* ``match()`` was not called in this context.
+* The pattern failed to compile for the previous ``match()`` call.
+
+Example::
+
+  if (pcre2.match(resp.http.X-Pattern-With-Names, req.http.X-Subject)) {
+     set resp.http.X-Group-Foo = pcre2.namedref("foo");
+  }
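+
+A sketch of supplying a ``fallback``, so that the header receives a
+recognizable value if there is no group named ``foo``; the header
+names and the fallback string are illustrative::
+
+  if (pcre2.match(resp.http.X-Pattern-With-Names, req.http.X-Subject)) {
+     set resp.http.X-Group-Foo = pcre2.namedref("foo", fallback="none");
+  }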
+
 $Function STRING sub(PRIV_CALL, PRIV_TASK, STRING pattern, STRING subject,
 		     STRING replacement, BOOL allow_empty_class=0,
 		     BOOL anchored=0, ENUM {ANYCRLF, UNICODE} bsr=0,
@@ -116,42 +1246,239 @@ $Function STRING sub(PRIV_CALL, PRIV_TASK, STRING pattern, STRING subject,
 		     INT recursion_limit=0, BOOL suball=0, BOOL sub_extended=0,
 		     BOOL unknown_unset=0, BOOL unset_empty=0)

+Compile ``pattern``, and if it matches ``subject``, then return a string
+formed by replacing the part that matched by ``replacement``. If the
+pattern does not match, return ``subject`` unchanged. The compile, match
+and substitution options affect all of these operations, as described
+above.
+
+The syntax of the ``replacement`` string, as modified if the
+``sub_extended`` option is true, is the same as documented above for
+the ``.sub()`` method.
+
+``sub()`` fails, returning NULL and logging a ``VCL_Error`` message,
+if:
+
+* Either of ``pattern`` or ``replacement`` is undefined.
+
+* ``pattern`` cannot be compiled.
+
+Example::
+
+  # If the beresp header X-Sub-Letters contains "b+", and Host contains
+  # "www.yabba.dabba.doo.com", then set X-Yada to
+  # "www.yada.dabba.doo.com".
+  set beresp.http.X-Yada = pcre2.sub(beresp.http.X-Sub-Letters,
+                                     bereq.http.Host, "d");
+
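+With ``suball=true`` and the same inputs as in the example above, both
+runs of ``b`` are replaced, so the result would be
+``"www.yada.dada.doo.com"``::
+
+  set beresp.http.X-Yada = pcre2.sub(beresp.http.X-Sub-Letters,
+                                     bereq.http.Host, "d", suball=true);
+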
+Library configuration
+---------------------
+
 $Function BOOL config_bool(ENUM {JIT, STACKRECURSE, UNICODE})

+Return true or false about a property of the PCRE2 library to which
+the VMOD is linked, identified by the ENUM. The ``config_*`` functions
+make it possible to discover features of the library that were chosen
+when it was built.
+
+``JIT``
+  Return true if the library supports just-in-time compilation and
+  matching.
+
+``STACKRECURSE``
+  Return true if internal recursion for the PCRE2 matcher uses the
+  system stack to maintain its state, which is the usual way the
+  library is built. If false is returned, PCRE2 uses blocks of data on
+  the heap rather than recursive function calls.
+
+``UNICODE``
+  Return true if Unicode support is available. If so, then the compile
+  option ``utf`` can be used to define a pattern and the strings
+  against which it is matched as UTF-8 strings.
+
+Example::
+
+  if (pcre2.config_bool(JIT)) {
+      std.log("JIT supported for PCRE2");
+  }
+  else {
+      std.log("JIT not supported for PCRE2");
+  }
+
 $Function STRING config_str(ENUM {BSR, JITTARGET, NEWLINE, UNICODE_VERSION,
 				  VERSION})

+Return a string describing a property of the PCRE2 library.
+
+``BSR``
+  Return a string indicating what the ``\R`` escape sequence matches
+  by default: ``UNICODE`` for Unicode line-ending sequences, or
+  ``ANYCRLF`` for only CR, LF and CRLF. This is the default that holds
+  if no value is given for the compile option ``bsr``.
+
+``JITTARGET``
+  Return a string identifying the architecture for which the JIT
+  compiler is configured. If JIT is not enabled, the returned string
+  contains the phrase ``"JIT not supported"``.
+
+``NEWLINE``
+  Return a string identifying the character sequence that is recognized
+  by default as a newline:
+
+  * ``"CR"`` (carriage return)
+  * ``"LF"`` (linefeed)
+  * ``"CRLF"`` (CR followed by LF)
+  * ``"ANY"`` (any Unicode line ending)
+  * ``"ANYCRLF"`` (any of CR, LF or CRLF)
+
+  This is the default if no value is given for the compile option
+  ``newline``.
+
+``UNICODE_VERSION``
+  If Unicode is supported by the library, return the Unicode version
+  string. If not, return ``"Unicode not supported"``.
+
+``VERSION``
+  Return the PCRE2 version string.
+
+Example::
+
+  std.log("Linked to PCRE2 version " + pcre2.config_str(VERSION));
+
 $Function INT config_int(ENUM {LINKSIZE, MATCHLIMIT, PARENSLIMIT,
                               RECURSIONLIMIT})

+Return an integer describing a property of the PCRE2 library.
+
+``LINKSIZE``
+  Return the number of bytes used for internal linkage (offsets) in
+  compiled regular expressions. This determines the size of the
+  largest possible pattern; the default link size of 2 allows for
+  patterns of up to 64K bytes.
+
+``MATCHLIMIT``
+  Return the default value for the ``match_limit`` match option,
+  which limits the effort of the matcher when no match is found.
+
+``PARENSLIMIT``
+  Return the default value of the ``parens_nest_limit`` compile
+  option, which limits the depth of parenthesis nesting in patterns,
+  and hence the use of the stack during compilation.
+
+``RECURSIONLIMIT``
+  Return the default value of the ``recursion_limit`` match option,
+  which limits the depth of recursion, and hence stack usage, for the
+  interpretive (non-JIT) matcher.
+
+Example::
+
+  std.log("Default PCRE2 match limit = " + pcre2.config_int(MATCHLIMIT));
+
 $Function STRING version()

 Returns the version string for this VMOD.

 Example::

-        std.log("Using VMOD pcre2 version " + pcre2.version());
+  std.log("Using VMOD pcre2 version " + pcre2.version());

 REQUIREMENTS
 ============

-This VMOD requires Varnish ...
+This VMOD has been tested with Varnish version 5.1.2 and PCRE2 version
+10.23.
+
+INSTALLATION
+============
+
+See `INSTALL.rst <INSTALL.rst>`_ in the source repository.

 LIMITATIONS
 ===========

-...
+The VMOD allocates Varnish workspace for a variety of purposes:

-INSTALLATION
-============
+* The string returned by the ``sub`` method and function.

-See `INSTALL.rst <INSTALL.rst>`_ in the source repository.
+* Buffers for temporary data structures needed by the PCRE2 library,
+  for example to save information about a match for use by the
+  ``backref`` and ``namedref`` methods and functions.
+
+* A copy of the subject string for the ``match`` method and function,
+  if it is not already in workspace, so that it can be safely accessed
+  by ``backref`` and ``namedref``.
+
+* Return strings for some uses of ``info_str`` and ``config_str``.
+
+* Temporary buffers for error message strings from the PCRE2 library.
+
+If VMOD operations fail with the "out of space" error message in the
+Varnish log (with the ``VCL_Error`` tag), increase the varnishd runtime
+parameters ``workspace_client`` and/or ``workspace_backend``.
+
+The PCRE2 interpretive and JIT matchers are backtracking matchers, and
+the interpretive matcher is recursive, using part of the stack on each
+recursive call (in the default library configuration). For patterns
+with large search spaces, this can lead to slow matches, high CPU
+usage, and stack overflow due to deep recursion, which typically
+causes Varnish to segfault. This has occasionally been the subject of
+issues reported to the Varnish project.
+
+For most common uses of regular expressions in VCL, PCRE2 is very fast
+and has minimal resource consumption. This depends strongly on how the
+regex is written -- a well-crafted pattern helps the matcher limit
+backtracking, fail early on non-matches, and make use of some of the
+optimizations that PCRE2 can apply. Some of the compile and match
+options also help to optimize the match operation. Which of these
+measures is possible depends, of course, on what you want the regex to
+do.
+
+Writing optimized regexen is a very broad subject, beyond the scope of
+this manual. There is some advice in `pcre2perform(3)`_, and in many
+other sources.
+
+If your use case requires patterns and subject strings that can lead
+to very large search spaces, consider using some of the options
+available in the VMOD that limit excessive effort for unsuccessful
+matches. In particular, consider lowering the match options
+``match_limit`` and ``recursion_limit``. You can also use
+``offset_limit`` to set a maximum length to search for a match in the
+subject string (for which you will have to set the compile option
+``use_offset_limit``). These may cause the matcher to halt before it
+has exhausted all possibilities for a match (but it appears to be
+common that, if the matcher has to search for a long time, then there
+was never any match to be found).
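+
+A sketch of setting such limits; the object name, pattern, header
+names and numeric values are hypothetical and chosen only for
+illustration, not as recommendations::
+
+  sub vcl_init {
+      # use_offset_limit is set at compile time so that offset_limit
+      # may be used in a match.
+      new strict = pcre2.regex("(\w+\s*)+$", use_offset_limit=true);
+  }
+
+  sub vcl_recv {
+      if (strict.match(req.http.X-Subject, match_limit=10000,
+                       recursion_limit=1000, offset_limit=256)) {
+          set req.http.X-Matched = "true";
+      }
+  }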
+
+If you encounter stack overflow, it may help to increase the stack
+size (by changing ``limits.conf`` or calling ``ulimit -s`` before
+starting Varnish). Since Varnish 4.1, you can also increase the
+varnishd parameter ``thread_pool_stack``. Bear in mind that this
+increases the total RAM usage of Varnish.
+
+ACKNOWLEDGEMENTS
+================
+
+A tip of the hat to Philip Hazel, who released the first version of
+PCRE twenty years before this VMOD was developed.
+
+A few sentences in this manual are identical to or very closely track
+phrasings in the PCRE2 documentation, if there was simply no better
+way to say what needs to be said.

 SEE ALSO
 ========

 * varnishd(1)
 * vcl(7)
-* source repository: https://code.uplex.de/uplex-varnish/libvmod-pcre2
+* pcre2(3)
+* PCRE web site: http://www.pcre.org/
+* VMOD source repository: https://code.uplex.de/uplex-varnish/libvmod-pcre2
+
+.. _pcre2(3): http://www.pcre.org/current/doc/html/pcre2.html
+.. _pcre2pattern(3): http://www.pcre.org/current/doc/html/pcre2pattern.html
+.. _pcre2syntax(3): http://www.pcre.org/current/doc/html/pcre2syntax.html
+.. _pcre2api(3): http://www.pcre.org/current/doc/html/pcre2api.html
+.. _pcre2unicode(3): http://www.pcre.org/current/doc/html/pcre2unicode.html
+.. _pcre2perform(3): http://www.pcre.org/current/doc/html/pcre2perform.html

 $Event event