
..
.. NB:  This file is machine generated, DO NOT EDIT!
..
.. Edit vmod.vcc and run make instead
..

.. role:: ref(emphasis)

.. _vmod_re2(3):

========
vmod_re2
========

---------------------------------------------------------------------
Varnish Module for access to the Google RE2 regular expression engine
---------------------------------------------------------------------

:Manual section: 3

SYNOPSIS
========

import re2 [from "path"] ;


::

  # regex object interface
  new OBJECT = re2.regex(STRING pattern [, <regex options>])
  BOOL <obj>.match(STRING)
  STRING <obj>.backref(INT ref)
  STRING <obj>.namedref(STRING name)
  STRING <obj>.sub(STRING text, STRING rewrite)
  STRING <obj>.suball(STRING text, STRING rewrite)
  STRING <obj>.extract(STRING text, STRING rewrite)
  
  # regex function interface
  BOOL re2.match(STRING pattern, STRING subject [, <regex options>])
  STRING re2.backref(INT ref)
  STRING re2.namedref(STRING name)
  STRING re2.sub(STRING pattern, STRING text, STRING rewrite [, <regex options>])
  STRING re2.suball(STRING pattern, STRING text, STRING rewrite [, <regex options>])
  STRING re2.extract(STRING pattern, STRING text, STRING rewrite [, <regex options>])

  # set object interface
  new OBJECT = re2.set([ENUM anchor] [, <regex options>])
  VOID <obj>.add(STRING)
  VOID <obj>.compile()
  BOOL <obj>.match(STRING)
  INT <obj>.nmatches()
  BOOL <obj>.matched(INT)

DESCRIPTION
===========

Varnish Module (VMOD) for access to the Google RE2 regular expression engine.

Varnish VCL uses the PCRE library (Perl Compatible Regular Expressions) for
its native regular expressions, which runs very efficiently for many common
uses of pattern matching in VCL, as attested by years of successful use of
PCRE with Varnish.

But for certain kinds of patterns, the worst-case running time of the PCRE
matcher is exponential in the length of the string to be matched. The
matcher uses backtracking, implemented with recursive calls to the internal
``match()`` function. In principle there is no upper bound to the possible
depth of backtracking and recursion, except as imposed by the ``varnishd``
runtime parameters ``pcre_match_limit`` and ``pcre_match_limit_recursion``;
matches fail if either of these limits is reached. Stack overflow caused by
deep backtracking has occasionally been the subject of ``varnishd`` issues.

RE2 differs from PCRE in that it limits the syntax of patterns so that they
always specify a regular language in the formally strict sense. Most notably,
backreferences within a pattern are not permitted, for example ``(foo|bar)\1``
to match ``foofoo`` and ``barbar``, but not ``foobar`` or ``barfoo``. See the
link in ``SEE ALSO`` for the specification of RE2 syntax.

This means that an RE2 matcher runs as a finite automaton, which guarantees
linear running time in the length of the matched string. There is no
backtracking, and hence no risk of deep recursion or stack overflow.

The relative advantages and disadvantages of RE2 and PCRE are a broad subject,
beyond the scope of this manual. See the references in ``SEE ALSO`` for more
in-depth discussion.

regex object and function interfaces
------------------------------------

The VMOD provides regular expression operations by way of the ``regex`` object
interface and a functional interface. For ``regex`` objects, the pattern is
compiled at VCL initialization time, and the compiled pattern is re-used for
each invocation of its methods. Compilation failures (due to errors in the
pattern) cause failure at initialization time, and the VCL fails to load. The
``.backref()`` and ``.namedref()`` methods refer back to the last invocation
of the ``.match()`` method for the same object.

The functional interface provides the same set of operations, but the pattern
is compiled at runtime on each invocation (and then discarded). Compilation
failures are reported as errors in the Varnish log. The ``backref()`` and
``namedref()`` functions refer back to the last invocation of the ``match()``
function, for any pattern.

Compiling a pattern at runtime on each invocation is considerably more costly
than re-using a compiled pattern. So for patterns that are fixed and known
at VCL initialization, the object interface should be used. The functional
interface should only be used for patterns whose contents are not known until
runtime.
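
As a minimal sketch of this trade-off, the following uses both interfaces
side by side. The object name ``hostre``, the example hostnames and the
header names ``X-Known-Host``, ``X-Pattern`` and ``X-Pattern-Matched`` are
assumptions made up for this illustration::

  import re2;

  sub vcl_init {
      # Fixed pattern: compiled once, when the VCL is loaded.
      new hostre = re2.regex("^www\.example\.(com|org)$");
  }

  sub vcl_recv {
      # Re-uses the compiled pattern on every request.
      if (hostre.match(req.http.Host)) {
          set req.http.X-Known-Host = "true";
      }
  }

  sub vcl_backend_response {
      # The pattern comes from a response header and is not known
      # until runtime, so it is compiled (and discarded) on each call.
      if (re2.match(beresp.http.X-Pattern, bereq.url)) {
          set beresp.http.X-Pattern-Matched = "true";
      }
  }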

set object interface
--------------------

``set`` objects provide a shorthand for constructing patterns that consist of
an alternation -- a group of patterns combined with ``|`` for "or". For
example::

  import re2;
  
  sub vcl_init {
      new myset = re2.set();
      myset.add("foo");       # Pattern 1
      myset.add("bar");       # Pattern 2
      myset.add("baz");       # Pattern 3
      myset.compile();
  }

``myset.match(<string>)`` can now be used to match a string against
the pattern ``foo|bar|baz``. When a match is successful, the matcher
has determined all of the patterns that matched. These can then be
retrieved with the method ``.nmatches()`` for the number of matched
patterns, and with ``.matched(n)``, which returns ``true`` if the
``nth`` pattern matched, where the patterns are numbered in the order
in which they were added::

  if (myset.match("foobar")) {
      std.log("Matched " + myset.nmatches() + " patterns");
      if (myset.matched(1)) {
          # Pattern /foo/ matched
          call do_foo;
      }
      if (myset.matched(2)) {
          # Pattern /bar/ matched
          call do_bar;
      }
      if (myset.matched(3)) {
          # Pattern /baz/ matched
          call do_baz;
      }
  }

regex options
-------------

Where a pattern is compiled -- in the ``regex`` and ``set`` constructors, and
in functions that require compilation -- options may be specified that can
affect the interpretation of the pattern or the operation of the matcher. There
are default values for each option, and it is only necessary to specify options
in VCL that differ from the defaults. Options specified in a ``set``
constructor apply to all of the patterns in the resulting alternation.

``utf8``
  If true, characters in a pattern match Unicode code points, and hence may
  match more than one byte. If false, the pattern and strings to be matched
  are interpreted as Latin-1 (ISO 8859-1), and a pattern character matches
  exactly one byte. Default is **false**. Note that this differs from the
  RE2 default.
``posix_syntax``
  If true, patterns are restricted to POSIX (egrep) syntax. Otherwise,
  the pattern syntax resembles that of PCRE, with some deviations. See the
  link in ``SEE ALSO`` for the syntax specification. Default is **false**.
  The options ``perl_classes``, ``word_boundary`` and ``one_line`` are
  only consulted when this option is true.
``longest_match``
  If true, the matcher searches for the longest possible match where
  alternatives are possible. Otherwise, search for the first match. For
  example with the pattern ``a(b|bb)`` and the string ``abb``, ``abb``
  matches when ``longest_match`` is true, and backref 1 is ``bb``. Otherwise,
  ``ab`` matches, and backref 1 is ``b``. Default is **false**.
``max_mem``
  An upper bound (in bytes) for the size of the compiled pattern. If ``max_mem``
  is too small, the matcher may fall back to less efficient algorithms, or the
  pattern may fail to compile. Default is the RE2 default (8MB), which should
  suffice for typical patterns.
``literal``
  If true, the pattern is interpreted as a literal string, and no regex
  metacharacters (such as ``*``, ``+``, ``^`` and so forth) have their special
  meaning. Default is **false**.
``never_nl``
  If true, the newline character ``\n`` in a string is never matched, even if it
  appears in the pattern. Default is **false**.
``dot_nl``
  If true, then the dot character ``.`` in a pattern matches everything,
  including newline. Otherwise, ``.`` never matches newline. Default is
  **false**.
``never_capture``
  If true, parentheses in a pattern are interpreted as non-capturing, and all
  invocations of the ``backref`` and ``namedref`` methods or functions will
  fail, including ``backref(0)`` after a successful match. Default is **false**,
  except for set objects, for which ``never_capture`` is always true (and cannot
  be changed), since back references are not possible with sets.
``case_sensitive``
  If true, matches are case-sensitive. A pattern can override this option with
  the ``(?i)`` flag, unless ``posix_syntax`` is true. Default is **true**.

The following options are only consulted when ``posix_syntax`` is true. If
``posix_syntax`` is false, then these features are always enabled and cannot be
turned off.

``perl_classes``
  If true, then the perl character classes ``\d``, ``\s``, ``\w``, ``\D``,
  ``\S`` and ``\W`` are permitted in a pattern. Default is **false**.
``word_boundary``
  If true, the perl assertions ``\b`` and ``\B`` (word boundary and not a word
  boundary) are permitted. Default is **false**.
``one_line``
  If true, then ``^`` and ``$`` only match at the beginning and end of the
  string to be matched, regardless of newlines. Otherwise, ``^`` also matches
  just after a newline, and ``$`` also matches just before a newline. Default is
  **false**.
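
Options are passed as named arguments, as in the constructor examples
elsewhere in this manual. A brief sketch, in which the object names and
patterns are hypothetical::

  import re2;

  sub vcl_init {
      # Only the option that differs from its default is specified.
      new nocase = re2.regex("^/images/", case_sensitive=false);

      # With longest_match, a(b|bb) matches "abb" and backref 1 is
      # "bb"; with the default, "ab" matches and backref 1 is "b".
      new longest = re2.regex("a(b|bb)", longest_match=true);

      # POSIX syntax, with the Perl character classes (such as \d)
      # explicitly permitted.
      new digits = re2.regex("^[a-z]+\d+$", posix_syntax=true,
                             perl_classes=true);
  }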

CONTENTS
========

* regex(STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* BOOL match(PRIV_TASK, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* STRING backref(PRIV_TASK, INT, STRING)
* STRING namedref(PRIV_TASK, STRING, STRING)
* STRING sub(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* STRING suball(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* STRING extract(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* set(ENUM {none,start,both}, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* STRING version()

.. _obj_regex:

regex
-----

::

	new OBJ = regex(STRING pattern, BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Create a regex object from ``pattern`` and the given options (or
option defaults). If the pattern is invalid, then VCL will fail to
load and the VCC compiler will emit an error message.

Example::

  sub vcl_init {
      new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
      new maxagematcher = re2.regex("max-age\s*=\s*(\d+)");

      # Group possible subdomains without capturing
      new submatcher = re2.regex("^www\.(domain1|domain2)\.com$",
                                 never_capture=true);
  }

.. _func_regex.match:

regex.match
-----------

::

	BOOL regex.match(STRING)

Returns ``true`` if and only if the compiled regex matches the given
string; corresponds to VCL's infix operator ``~``.

Example::

  if (myregex.match(req.http.Host)) {
     call do_on_match;
  }

.. _func_regex.backref:

regex.backref
-------------

::

	STRING regex.backref(INT ref, STRING fallback="**BACKREF METHOD FAILED**")

Returns the ``nth`` captured subexpression from the most recent
successful call of the ``.match()`` method for this object in the same
client or backend context, or a fallback string in case the capture
fails. Backref 0 indicates the entire matched string. Thus this
function behaves like the ``\n`` in the native VCL functions
``regsub`` and ``regsuball``, and the ``$1``, ``$2`` ... variables in
Perl.

Since Varnish client and backend operations run in different threads,
``.backref()`` can only refer back to a ``.match()`` call in the same
thread. Thus a ``.backref()`` call in any of the ``vcl_backend_*``
subroutines -- the backend context -- refers back to a previous
``.match()`` in any of those same subroutines; and a call in any of
the other VCL subroutines -- the client context -- refers back to a
``.match()`` in the same client context.
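
The two contexts can be sketched as follows. The object name
``domainmatcher`` follows the constructor example; the header name
``X-Domain`` and the fallback string ``"none"`` are assumptions chosen for
this illustration::

  import re2;

  sub vcl_init {
      new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
  }

  sub vcl_recv {
      # Client context: match here ...
      if (domainmatcher.match(req.http.Host)) {
          set req.http.X-Domain = domainmatcher.backref(1);
      }
  }

  sub vcl_deliver {
      # ... and refer back to that match here, since vcl_deliver is
      # also in the client context. "none" is returned if the match
      # in vcl_recv failed.
      set resp.http.X-Domain = domainmatcher.backref(1, "none");
  }

  sub vcl_backend_response {
      # Backend context: the match from vcl_recv is not visible here,
      # so match again before calling .backref().
      if (domainmatcher.match(bereq.http.Host)) {
          set beresp.http.X-Domain = domainmatcher.backref(1);
      }
  }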

After unsuccessful matches, the ``fallback`` string is returned for
any call to ``.backref()``. The default value of ``fallback`` is
``"**BACKREF METHOD FAILED**"``. ``.backref()`` always fails after a
failed match, even if ``.match()`` had been called successfully before
the failure.

``.backref()`` may also return ``fallback`` after a successful match,
if no captured group in the matching string corresponds to the backref
number. For example, when the pattern ``(a|(b))c`` matches the string
``ac``, there is no backref 2, since nothing matches ``b`` in the
string.
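
A sketch of this case, with a hypothetical object name and header names,
and an explicit ``fallback`` in place of the default::

  import re2;

  sub vcl_init {
      new altre = re2.regex("(a|(b))c");
  }

  sub vcl_recv {
      if (altre.match("ac")) {
          # Group 1 captured "a".
          set req.http.X-Group1 = altre.backref(1, "none");

          # Group 2 did not take part in the match, so the fallback
          # "none" is returned.
          set req.http.X-Group2 = altre.backref(2, "none");
      }
  }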

The VCL infix operators ``~`` and ``!~`` do not affect this method,
nor do the functions ``regsub`` or ``regsuball``. Nor is it affected
by the matches performed by any other method or function in this VMOD
(such as the ``sub()``, ``suball()`` or ``extract()`` methods or
functions, or the ``set`` object's ``.match()`` method).

``.backref()`` fails, returning ``fallback`` and writing an error
message to the Varnish log with the ``VCL_Error`` tag, under the
following conditions (even if a previous match was successful and a
substring could have been captured):

* The ``fallback`` string is undefined, for example if set from an unset
  header variable.
* The ``never_capture`` option was set to ``true`` for this object. In this
  case, even ``.backref(0)`` fails after a successful match (otherwise, backref
  0 always returns the full matched string).
* ``ref`` (the backref number) is out of range, i.e. it is larger than the
  highest number for a capturing group in the pattern.
* ``.match()`` was never called for this object prior to calling ``.backref()``.
* There is insufficient workspace for the string to be returned.

Example::

  if (domainmatcher.match(req.http.Host)) {
     set req.http.X-Domain = domainmatcher.backref(1);
  }

.. _func_regex.namedref:

regex.namedref
--------------

::

	STRING regex.namedref(STRING name, STRING fallback="**NAMEDREF METHOD FAILED**")

Returns the captured subexpression designated by ``name`` from the
most recent successful call to ``.match()`` in the current context
(client or backend), or ``fallback`` in case of failure.

Named capturing groups are written in RE2 as: ``(?P<name>re)``. (Note
that this syntax with ``P``, inspired by Python, differs from the
notation for named capturing groups in PCRE.) Thus when
``(?P<foo>.+)bar$`` matches ``bazbar``, then ``.namedref("foo")``
returns ``baz``.

Note that a named capturing group can also be referenced as a numbered
group. So in the previous example, ``.backref(1)`` also returns
``baz``.
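
The equivalence of the two forms of reference might be sketched like this;
the object name ``barre`` and the header names are made up for the
illustration::

  import re2;

  sub vcl_init {
      new barre = re2.regex("(?P<foo>.+)bar$");
  }

  sub vcl_recv {
      if (barre.match("bazbar")) {
          # Both calls return "baz".
          set req.http.X-By-Name = barre.namedref("foo");
          set req.http.X-By-Number = barre.backref(1);
      }
  }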

``fallback`` is returned when ``.namedref()`` is called after an
unsuccessful match. The default fallback is ``"**NAMEDREF METHOD
FAILED**"``.

Like ``.backref()``, ``.namedref()`` is not affected by native VCL
regex operations, nor by any other matches performed by methods or
functions of the VMOD, except for a prior ``.match()`` for the same
object.

``.namedref()`` fails, returning ``fallback`` and logging a
``VCL_Error`` message, if:

* The ``fallback`` string is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``.match()`` was not called for this object.
* There is insufficient workspace for the string to be returned.

Example::

  sub vcl_init {
      new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
  }

  sub vcl_recv {
      if (domainmatcher.match(req.http.Host)) {
          set req.http.X-Domain = domainmatcher.namedref("domain");
      }
  }

.. _func_regex.sub:

regex.sub
---------

::

	STRING regex.sub(STRING text, STRING rewrite, STRING fallback="**SUB METHOD FAILED**")

If the compiled pattern for this regex object matches ``text``, then
return the result of replacing the first match in ``text`` with
``rewrite``. Within ``rewrite``, ``\1`` through ``\9`` can be used to
insert the numbered capturing group from the pattern, and ``\0``
to insert the entire matching text. This method corresponds to the VCL
native function ``regsub()``.

``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB METHOD FAILED**"``.

``.sub()`` fails, returning ``fallback`` and logging a ``VCL_Error``
message, if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then this will
      # set X-Yada to "www.yada.dabba.doo.com".
      set req.http.X-Yada = bmatcher.sub(req.http.Host, "d");
  }

.. _func_regex.suball:

regex.suball
------------

::

	STRING regex.suball(STRING text, STRING rewrite, STRING fallback="**SUBALL METHOD FAILED**")

Like ``.sub()``, except that all successive non-overlapping matches in
``text`` are replaced with ``rewrite``. This method corresponds to VCL
native ``regsuball()``.

The default fallback is ``"**SUBALL METHOD FAILED**"``. ``.suball()``
fails under the same conditions as ``.sub()``.

Since only non-overlapping matches are substituted, replacing
``"ana"`` within ``"banana"`` only results in one substitution, not
two.
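
That case can be sketched as follows, with a hypothetical object name and
header name::

  import re2;

  sub vcl_init {
      new ana = re2.regex("ana");
  }

  sub vcl_recv {
      # The first "ana" (after the leading "b") is replaced. The
      # second "ana" overlaps the first and is not considered, and
      # the remaining "na" does not match, so the result is "bXna".
      set req.http.X-Banana = ana.suball("banana", "X");
  }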

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then set X-Yada to
      # "www.yada.dada.doo.com".
      set req.http.X-Yada = bmatcher.suball(req.http.Host, "d");
  }

.. _func_regex.extract:

regex.extract
-------------

::

	STRING regex.extract(STRING text, STRING rewrite, STRING fallback="**EXTRACT METHOD FAILED**")

If the compiled pattern for this regex object matches ``text``, then
return ``rewrite`` with substitutions from the matching portions of
``text``. Non-matching substrings of ``text`` are ignored.

The default fallback is ``"**EXTRACT METHOD FAILED**"``. Like
``.sub()`` and ``.suball()``, ``.extract()`` fails if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

	sub vcl_init {
	    new email = re2.regex("(.*)@([^.]*)");
	}

	sub vcl_deliver {
	    # Sets X-UUCP to "kremvax!boris"
	    set resp.http.X-UUCP = email.extract("boris@kremvax.ru", "\2!\1");
	}

regex functional interface
==========================

.. _func_match:

match
-----

::

	BOOL match(PRIV_TASK, STRING pattern, STRING subject, BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Like the ``regex.match()`` method, return ``true`` if ``pattern``
matches ``subject``, where ``pattern`` is compiled with the given
options (or default options) on each invocation.

If ``pattern`` fails to compile, then an error message is logged with
the ``VCL_Error`` tag, and ``false`` is returned.

Example::

  # Match the bereq Host header against a backend response header
  if (re2.match(pattern=bereq.http.Host, subject=beresp.http.X-Host)) {
     call do_on_match;
  }

.. _func_backref:

backref
-------

::

	STRING backref(PRIV_TASK, INT ref, STRING fallback="**BACKREF FUNCTION FAILED**")

Returns the ``nth`` captured subexpression from the most recent
successful call of the ``match()`` function in the current client or
backend context, or a fallback string if the capture fails. The
default ``fallback`` is ``"**BACKREF FUNCTION FAILED**"``.

Similarly to the ``regex.backref()`` method, ``fallback`` is returned
after any failed invocation of the ``match()`` function, or if there
is no captured group corresponding to the backref number. The function
is not affected by native VCL regex operations, or any other method or
function of the VMOD except for the ``match()`` function.

The function fails, returning ``fallback`` and logging a ``VCL_Error``
message, under the same conditions as the corresponding method:

* ``fallback`` is undefined.
* ``never_capture`` was true in the previous invocation of the ``match()``
  function.
* ``ref`` is out of range.
* The ``match()`` function was never called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured subexpression.

Example::

  # Match against a pattern provided in a beresp header, and capture
  # subexpression 1.
  if (re2.match(pattern=beresp.http.X-Pattern, subject=bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.backref(1);
  }

.. _func_namedref:

namedref
--------

::

	STRING namedref(PRIV_TASK, STRING name, STRING fallback="**NAMEDREF FUNCTION FAILED**")

Returns the captured subexpression designated by ``name`` from the
most recent successful call to the ``match()`` function in the current
context, or ``fallback`` in case of failure. The default fallback is
``"**NAMEDREF FUNCTION FAILED**"``.

The function returns ``fallback`` when the previous invocation of the
``match()`` function failed, and is only affected by use of the
``match()`` function. The function fails, returning ``fallback`` and
logging a ``VCL_Error`` message, under the same conditions as the
corresponding method:

* ``fallback`` is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``match()`` was not called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured expression.

Example::

  if (re2.match(beresp.http.X-Pattern-With-Names, bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.namedref("foo");
  }

.. _func_sub:

sub
---

::

	STRING sub(STRING pattern, STRING text, STRING rewrite, STRING fallback="**SUB FUNCTION FAILED**", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Compiles ``pattern`` with the given options, and if it matches
``text``, then return the result of replacing the first match in
``text`` with ``rewrite``. As with the ``regex.sub()`` method, ``\0``
through ``\9`` may be used in ``rewrite`` to substitute captured
groups from the pattern.

``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB FUNCTION FAILED**"``.

``sub()`` fails, returning ``fallback`` and logging a ``VCL_Error``
message, if:

* ``pattern`` cannot be compiled.
* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dabba.doo.com".
  set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
                                   bereq.http.Host, "d");

.. _func_suball:

suball
------

::

	STRING suball(STRING pattern, STRING text, STRING rewrite, STRING fallback="**SUBALL FUNCTION FAILED**", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Like the ``sub()`` function, except that all successive
non-overlapping matches in ``text`` are replaced with ``rewrite``.

The default fallback is ``"**SUBALL FUNCTION FAILED**"``. The
``suball()`` function fails under the same conditions as ``sub()``.

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dada.doo.com".
  set beresp.http.X-Yada = re2.suball(beresp.http.X-Sub-Letters,
                                      bereq.http.Host, "d");

.. _func_extract:

extract
-------

::

	STRING extract(STRING pattern, STRING text, STRING rewrite, STRING fallback="**EXTRACT FUNCTION FAILED**", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Compiles ``pattern`` with the given options, and if it matches
``text``, then return ``rewrite`` with substitutions from the matching
portions of ``text``, ignoring the non-matching portions.

The default fallback is ``"**EXTRACT FUNCTION FAILED**"``. The
``extract()`` function fails under the same conditions as ``sub()``
and ``suball()``.

Example::

  # If beresp header X-Params contains "(foo|bar)=(baz|quux)", and the
  # URL contains "bar=quux", then set X-Query to "bar:quux".
  set beresp.http.X-Query = re2.extract(beresp.http.X-Params, bereq.url,
                                        "\1:\2");

.. _obj_set:

set
---

::

	new OBJ = set(ENUM {none,start,both} anchor="none", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Initialize a set object that represents several patterns combined by
alternation -- ``|`` for "or".
  
Optional parameters control the interpretation of the resulting
composed pattern. The ``anchor`` parameter is an enum that can have
the values ``none``, ``start`` or ``both``, where ``none`` is the
default. ``start`` means that each pattern is matched as if it begins
with ``^`` for start-of-text, and ``both`` means that each pattern is
anchored with both ``^`` at the beginning and ``$`` for end-of-text at
the end. ``none`` means that each pattern is interpreted as a partial
match (although individual patterns within the set may contain either of
``^`` or ``$``).

For example, if a set is initialized with ``anchor=both``, and the
patterns ``foo`` and ``bar`` are added, then matches against the set
match a string against ``^foo$|^bar$``, or equivalently
``^(foo|bar)$``.
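
A sketch of that configuration, assuming a hypothetical object name
``exact`` and hypothetical request headers ``X-Word`` and ``X-Exact``::

  import re2;

  sub vcl_init {
      new exact = re2.set(anchor=both);
      exact.add("foo");
      exact.add("bar");
      exact.compile();
  }

  sub vcl_recv {
      # Matches only if X-Word is exactly "foo" or "bar". Values such
      # as "foobar" or "xfoo" do not match, although they would match
      # with anchor=none.
      if (exact.match(req.http.X-Word)) {
          set req.http.X-Exact = "true";
      }
  }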
  
The usual regex options can be set, which then control matching
against the resulting composed pattern. However, the ``never_capture``
option cannot be set, and is always implicitly true, since backrefs
and namedrefs are not possible with sets.

Example::

  sub vcl_init {
      # Initialize a regex set for partial matches
      # with default options
      new foo = re2.set();

      # Initialize a regex set for case insensitive matches
      # with anchors on both ends (^ and $).
      new bar = re2.set(anchor=both, case_sensitive=false);

      # Initialize a regex set using POSIX syntax, but allowing
      # Perl character classes, and anchoring at the left (^).
      new baz = re2.set(anchor=start, posix_syntax=true,
                        perl_classes=true);
  }

.. _func_set.add:

set.add
-------

::

	VOID set.add(STRING)

Add the given pattern to the set. If the pattern is invalid,
``.add()`` fails, and the VCL will fail to load, with an error message
describing the problem.

``.add()`` MUST be called in ``vcl_init``, and MUST NOT be called after
``.compile()``.  If ``.add()`` is called in any other subroutine, an
error message with ``VCL_Error`` is logged, and the call has no
effect. If it is called in ``vcl_init`` after ``.compile()``, then the
VCL load will fail with an error message.

In other words, add all patterns to the set in ``vcl_init``, and
finally call ``.compile()`` when you're done.

When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.

Example::

  sub vcl_init {
      # literal=true means that the dots are interpreted as literal
      # dots, not "match any character".
      new hostmatcher = re2.set(anchor=both, case_sensitive=false,
                                literal=true);
      hostmatcher.add("www.domain1.com");
      hostmatcher.add("www.domain2.com");
      hostmatcher.add("www.domain3.com");
      hostmatcher.compile();
  }

.. _func_set.compile:

set.compile
-----------

::

	VOID set.compile()

Compile the compound pattern represented by the set -- an alternation
of all patterns added by ``.add()``.

``.compile()`` fails if no patterns were added to the set. It may also
fail if the ``max_mem`` setting is not large enough for the composed
pattern. In that case, the VCL load will fail with an error message;
consider a larger value for ``max_mem`` in the set constructor.

``.compile()`` MUST be called in ``vcl_init``, and MUST NOT be called
more than once for a set object. If it is called in any other
subroutine, a ``VCL_Error`` message is logged, and the call has no
effect. If it is called a second time in ``vcl_init``, the VCL load
will fail.

See above for examples.

.. _func_set.match:

set.match
---------

::

	BOOL set.match(STRING)

Returns ``true`` if the given string matches the compound pattern
represented by the set, i.e. if it matches any of the patterns that
were added to the set.

The matcher identifies every pattern in the set that matches the given
string. After a successful match, these can be determined using the
``.matched(INT)`` and ``.nmatches()`` methods described below.

``.match()`` MUST be called after ``.compile()``; otherwise the match
always fails.

Example::

  if (hostmatcher.match(req.http.Host)) {
     call do_when_a_host_matched;
  }

.. _func_set.matched:

set.matched
-----------

::

	BOOL set.matched(INT)

Returns ``true`` after a successful match if the ``nth`` pattern that
was added to the set is among the patterns that matched, ``false``
otherwise. The numbering of the patterns corresponds to the order in
which patterns were added in ``vcl_init``, counting from 1.

The method refers back to the most recent invocation of ``.match()``
for the same object in the same client or backend context. It always
returns ``false``, for every value of the parameter, if it is called
after an unsuccessful match (``.match()`` returned ``false``).

``.matched()`` fails and returns ``false`` if:

* The ``.match()`` method was not called for this object in the same
  client or backend scope.

* The integer parameter is out of range; that is, if it is less than 1
  or greater than the number of patterns added to the set.

On failure, the method writes an error message to the log with the tag
``VCL_Error``; if it fails during ``vcl_init``, then the VCL load
fails with the error message. In any other VCL subroutine, the method
returns ``false`` on failure and processing continues; since ``false``
is a legitimate return value, you should consider monitoring the log
for the error messages.

Example::

  if (hostmatcher.match(req.http.Host)) {
      if (hostmatcher.matched(1)) {
          call do_domain1;
      }
      if (hostmatcher.matched(2)) {
          call do_domain2;
      }
      if (hostmatcher.matched(3)) {
          call do_domain3;
      }
  }

.. _func_set.nmatches:

set.nmatches
------------

::

	INT set.nmatches()

Returns the number of patterns that were matched by the most recent
invocation of ``.match()`` for the same object in the same client or
backend context. The method always returns 0 after an unsuccessful
match (``.match()`` returned ``false``).

If ``.match()`` was not called for this object in the same client or
backend scope, ``.nmatches()`` fails and returns 0, writing an error
message with ``VCL_Error`` to the log. If this happens in
``vcl_init``, the VCL load fails with the error message. As with
``.matched()``, ``.nmatches()`` returns a legitimate value and VCL
processing continues when it fails in any other subroutine, so you
should monitor the log for the error messages.

Example::

  if (myset.match(req.url)) {
      std.log("URL matched " + myset.nmatches()
              + " patterns from the set");
  }

.. _func_version:

version
-------

::

	STRING version()

Return the version string for this VMOD.

Example::

  std.log("Using VMOD re2 version: " + re2.version());

REQUIREMENTS
============

The VMOD requires the Varnish master branch, and is not compatible
with any current released versions of Varnish. See the source
repository for versions of the VMOD that are compatible with other
Varnish versions.

It requires the RE2 library, and has been tested against RE2 version
2017-08-01.

INSTALLATION
============

See `INSTALL.rst <INSTALL.rst>`_ in the source repository.

LIMITATIONS
===========

The VMOD allocates Varnish workspace for captured groups and rewritten
strings. If operations fail with "insufficient workspace" error
messages in the Varnish log (with the ``VCL_Error`` tag), increase the
varnishd runtime parameters ``workspace_client`` and/or
``workspace_backend``.

The RE2 documentation states that successful matches are slowed quite
a bit when they also capture substrings. There is also additional
overhead from the VMOD, unless the ``never_capture`` flag is true, to
manage data about captured groups in the workspace. This overhead is
incurred even if there are no capturing expressions in a pattern,
since it is always possible to call ``backref(0)`` to obtain the
matched portion of a string.

So if you are using a pattern only to match against strings, and never
to capture subexpressions, consider setting the ``never_capture``
option to true, to eliminate the extra work for both RE2 and the VMOD.
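
For instance, a match-only pattern might be declared as in the following
sketch, where the object name ``isimage``, the pattern and the action taken
on a match are all assumptions made up for this illustration::

  import re2;

  sub vcl_init {
      # The parentheses only group the alternatives; nothing is
      # captured, so .backref() and .namedref() cannot be used with
      # this object.
      new isimage = re2.regex("\.(png|jpe?g|gif)$", never_capture=true);
  }

  sub vcl_recv {
      if (isimage.match(req.url)) {
          unset req.http.Cookie;
      }
  }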

AUTHOR
======

* Geoffrey Simmons <geoff@uplex.de>

UPLEX Nils Goroll Systemoptimierung

SEE ALSO
========

* varnishd(1)
* vcl(7)
* VMOD source repository: https://code.uplex.de/uplex-varnish/libvmod-re2
* RE2 git repo: https://github.com/google/re2
* RE2 syntax: https://github.com/google/re2/wiki/Syntax
* "Implementing Regular Expressions": https://swtch.com/~rsc/regexp/

  * Series of articles motivating the design of RE2, with discussion
    of how RE2 compares with PCRE

COPYRIGHT
=========

::

  Copyright (c) 2016-2017 UPLEX Nils Goroll Systemoptimierung
  All rights reserved
 
  Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>
 
  See LICENSE