libvmod-re2

Varnish module (VMOD) to access the Google RE2 regular expression engine

vmod_re2

Varnish Module for access to the Google RE2 regular expression engine

Manual section: 3

SYNOPSIS

import re2 [from "path"] ;

# regex object interface
new OBJECT = re2.regex(STRING pattern [, <regex options>])
BOOL <obj>.match(STRING)
STRING <obj>.backref(INT ref)
STRING <obj>.namedref(STRING name)
STRING <obj>.sub(STRING text, STRING rewrite)
STRING <obj>.suball(STRING text, STRING rewrite)
STRING <obj>.extract(STRING text, STRING rewrite)

# regex function interface
BOOL re2.match(STRING pattern, STRING subject [, <regex options>])
STRING re2.backref(INT ref)
STRING re2.namedref(STRING name)
STRING re2.sub(STRING pattern, STRING text, STRING rewrite [, <regex options>])
STRING re2.suball(STRING pattern, STRING text, STRING rewrite [, <regex options>])
STRING re2.extract(STRING pattern, STRING text, STRING rewrite [, <regex options>])

# set object interface
new OBJECT = re2.set([ENUM anchor] [, <regex options>])
VOID <obj>.add(STRING [, STRING string] [, BACKEND backend])
VOID <obj>.compile()
BOOL <obj>.match(STRING)
INT <obj>.nmatches()
BOOL <obj>.matched(INT)
INT <obj>.which([ENUM select])
STRING <obj>.string([INT n,] [ENUM select])
BACKEND <obj>.backend([INT n,] [ENUM select])

# VMOD version
STRING re2.version()

DESCRIPTION

Varnish Module (VMOD) for access to the Google RE2 regular expression engine.

Varnish VCL uses the PCRE library (Perl Compatible Regular Expressions) for its native regular expressions, which runs very efficiently for many common uses of pattern matching in VCL, as attested by years of successful use of PCRE with Varnish.

But for certain kinds of patterns, the worst-case running time of the PCRE matcher is exponential in the length of the string to be matched. The matcher uses backtracking, implemented with recursive calls to the internal match() function. In principle there is no upper bound to the possible depth of backtracking and recursion, except as imposed by the varnishd runtime parameters pcre_match_limit and pcre_match_limit_recursion; matches fail if either of these limits is met. Stack overflow caused by deep backtracking has occasionally been the subject of varnishd issues.

RE2 differs from PCRE in that it limits the syntax of patterns so that they always specify a regular language in the formally strict sense. Most notably, backreferences within a pattern are not permitted, for example (foo|bar)\1 to match foofoo and barbar, but not foobar or barfoo. See the link in SEE ALSO for the specification of RE2 syntax.
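Since backreferences are unavailable, a pattern such as (foo|bar)\1 has to be rewritten with the repetition spelled out. The following is a minimal sketch of that rewrite, not taken from the VMOD's own examples; the object name doubled and the headers X-Token and X-Doubled are hypothetical, and the fragment is meant to be merged into a complete VCL file:

```vcl
import re2;

sub vcl_init {
    # RE2 has no \1, so the repeated alternatives are written out.
    # Finds "foofoo" or "barbar" in the subject, but neither
    # "foobar" nor "barfoo".
    new doubled = re2.regex("foofoo|barbar");
}

sub vcl_recv {
    # X-Token is a hypothetical request header.
    if (doubled.match(req.http.X-Token)) {
        set req.http.X-Doubled = "true";
    }
}
```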

This means that an RE2 matcher runs as a finite automaton, which guarantees linear running time in the length of the matched string. There is no backtracking, and hence no risk of deep recursion or stack overflow.

The relative advantages and disadvantages of RE2 and PCRE are a broad subject, beyond the scope of this manual. See the references in SEE ALSO for more in-depth discussion.

regex object and function interfaces

The VMOD provides regular expression operations by way of the regex object interface and a functional interface. For regex objects, the pattern is compiled at VCL initialization time, and the compiled pattern is re-used for each invocation of its methods. Compilation failures (due to errors in the pattern) cause failure at initialization time, and the VCL fails to load. The .backref() and .namedref() methods refer back to the last invocation of the .match() method for the same object.

The functional interface provides the same set of operations, but the pattern is compiled at runtime on each invocation (and then discarded). Compilation failures are reported as errors in the Varnish log. The backref() and namedref() functions refer back to the last invocation of the match() function, for any pattern.

Compiling a pattern at runtime on each invocation is considerably more costly than re-using a compiled pattern. So for patterns that are fixed and known at VCL initialization, the object interface should be used. The functional interface should only be used for patterns whose contents are not known until runtime.
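As an illustration of this guideline, the sketch below uses a regex object for a fixed pattern and the match() function for a pattern that only arrives at runtime. The object name imgmatcher and the headers X-Is-Image, X-Url-Pattern and X-Pattern-Matched are assumptions made for the example:

```vcl
import re2;

sub vcl_init {
    # Fixed pattern: compiled once, when the VCL is loaded.
    new imgmatcher = re2.regex("\.(png|jpe?g|gif)$");
}

sub vcl_recv {
    # Re-uses the pattern compiled in vcl_init.
    if (imgmatcher.match(req.url)) {
        set req.http.X-Is-Image = "true";
    }
}

sub vcl_backend_response {
    # The pattern is taken from a (hypothetical) response header, so
    # it is not known until runtime and is compiled on every call.
    if (re2.match(pattern=beresp.http.X-Url-Pattern,
                  subject=bereq.url)) {
        set beresp.http.X-Pattern-Matched = "true";
    }
}
```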

set object interface

set objects provide a shorthand for constructing patterns that consist of an alternation -- a group of patterns combined with | for "or". For example:

import re2;

sub vcl_init {
      new myset = re2.set();
      myset.add("foo");       # Pattern 1
      myset.add("bar");       # Pattern 2
      myset.add("baz");       # Pattern 3
      myset.compile();
}

myset.match(<string>) can now be used to match a string against the pattern foo|bar|baz. When a match is successful, the matcher has determined all of the patterns that matched. These can then be retrieved with the method .nmatches() for the number of matched patterns, and with .matched(n), which returns true if the nth pattern matched, where the patterns are numbered in the order in which they were added:

if (myset.match("foobar")) {
    std.log("Matched " + myset.nmatches() + " patterns");
    if (myset.matched(1)) {
        # Pattern /foo/ matched
        call do_foo;
    }
    if (myset.matched(2)) {
        # Pattern /bar/ matched
        call do_bar;
    }
    if (myset.matched(3)) {
        # Pattern /baz/ matched
        call do_baz;
    }
}

An advantage of alternations and sets with RE2, as opposed to an alternation in PCRE or a series of separate matches in an if-elsif-elsif sequence, comes from the fact that the matcher is implemented as a state machine. That means that the matcher progresses through the string to be matched just once, following patterns in the set that match through the state machine, or determining that there is no match as soon as there are no more possible paths in the state machine. So a string can be matched against a large set of patterns in time that is proportional to the length of the string to be matched. In contrast, PCRE matches patterns in an alternation one after another, stopping after the first matching pattern, or attempting matches against all of them if there is no match. Thus a match against an alternation in PCRE is not unlike an if-elsif-elsif sequence of individual matches, and requires the time needed for each individual match, overall in proportion with the number of patterns to be matched.
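To make the contrast concrete, the sketch below places a native if-elsif sequence next to a single set match. It is only an illustration of the two styles; the object name urlset and the X-Native and X-Set headers are hypothetical:

```vcl
import re2;

sub vcl_init {
    new urlset = re2.set();
    urlset.add("foo");
    urlset.add("bar");
    urlset.add("baz");
    urlset.compile();
}

sub vcl_recv {
    # Native VCL: up to three separate PCRE matches, tried in order.
    if (req.url ~ "foo") {
        set req.http.X-Native = "matched";
    } elsif (req.url ~ "bar") {
        set req.http.X-Native = "matched";
    } elsif (req.url ~ "baz") {
        set req.http.X-Native = "matched";
    }

    # Set object: one pass over req.url covers all three patterns.
    if (urlset.match(req.url)) {
        set req.http.X-Set = "matched";
    }
}
```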

Another advantage of the VMOD's set object is the ability to associate strings or backends with the patterns added to the set with the .add() method:

sub vcl_init {
      new prefix = re2.set(anchor=start);
      prefix.add("/foo", string="www.domain1.com");
      prefix.add("/bar", string="www.domain2.com");
      prefix.add("/baz", string="www.domain3.com");
      prefix.add("/quux", string="www.domain4.com");
      prefix.compile();

      new appmatcher = re2.set(anchor=start);
      appmatcher.add("/foo", backend=app1);
      appmatcher.add("/bar", backend=app2);
      appmatcher.add("/baz", backend=app3);
      appmatcher.add("/quux", backend=app4);
      appmatcher.compile();
}

After a successful match, the string or backend associated with the matching pattern can be retrieved with the .string() and .backend() methods. This makes it possible, for example, to construct a redirect response or choose the backend with code that is both efficient and compact, even with a large set of patterns to be matched:

# Use the prefix object to construct a redirect response from
# a matching request URL.
sub vcl_recv {
    if (prefix.match(req.url)) {
        # Pass the string associated with the matching pattern
        # to vcl_synth.
        return(synth(1301, prefix.string()));
    }
}

sub vcl_synth {
    # The string associated with the matching pattern is in
    # resp.reason.
    if (resp.status == 1301) {
        set resp.http.Location = "http://" + resp.reason + req.url;
        set resp.status = 301;
        set resp.reason = "Moved Permanently";
    }
}

# Use the appmatcher object to choose a backend based on the
# request URL prefix.
sub vcl_recv {
    if (appmatcher.match(req.url)) {
        set req.backend_hint = appmatcher.backend();
    }
}

regex options

Where a pattern is compiled -- in the regex and set constructors, and in functions that require compilation -- options may be specified that can affect the interpretation of the pattern or the operation of the matcher. There are default values for each option, and it is only necessary to specify options in VCL that differ from the defaults. Options specified in a set constructor apply to all of the patterns in the resulting alternation.

utf8
If true, characters in a pattern match Unicode code points, and hence may match more than one byte. If false, the pattern and strings to be matched are interpreted as Latin-1 (ISO 8859-1), and a pattern character matches exactly one byte. Default is false. Note that this differs from the RE2 default.
posix_syntax
If true, patterns are restricted to POSIX (egrep) syntax. Otherwise, the pattern syntax resembles that of PCRE, with some deviations. See the link in SEE ALSO for the syntax specification. Default is false. The options perl_classes, word_boundary and one_line are only consulted when this option is true.
longest_match
If true, the matcher searches for the longest possible match where alternatives are possible. Otherwise, search for the first match. For example with the pattern a(b|bb) and the string abb, abb matches when longest_match is true, and backref 1 is bb. Otherwise, ab matches, and backref 1 is b. Default is false.
max_mem
An upper bound (in bytes) for the size of the compiled pattern. If max_mem is too small, the matcher may fall back to less efficient algorithms, or the pattern may fail to compile. Default is the RE2 default (8MB), which should suffice for typical patterns.
literal
If true, the pattern is interpreted as a literal string, and no regex metacharacters (such as *, +, ^ and so forth) have their special meaning. Default is false.
never_nl
If true, the newline character \n in a string is never matched, even if it appears in the pattern. Default is false.
dot_nl
If true, then the dot character . in a pattern matches everything, including newline. Otherwise, . never matches newline. Default is false.
never_capture
If true, parentheses in a pattern are interpreted as non-capturing, and all invocations of the backref and namedref methods or functions will fail, including backref(0) after a successful match. Default is false, except for set objects, for which never_capture is always true (and cannot be changed), since back references are not possible with sets.
case_sensitive
If true, matches are case-sensitive. A pattern can override this option with the (?i) flag, unless posix_syntax is true. Default is true.

The following options are only consulted when posix_syntax is true. If posix_syntax is false, then these features are always enabled and cannot be turned off.

perl_classes
If true, then the perl character classes \d, \s, \w, \D, \S and \W are permitted in a pattern. Default is false.
word_boundary
If true, the perl assertions \b and \B (word boundary and not a word boundary) are permitted. Default is false.
one_line
If true, then ^ and $ only match at the beginning and end of the string to be matched, regardless of newlines. Otherwise, ^ also matches just after a newline, and $ also matches just before a newline. Default is false.
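The options are passed as named arguments after the pattern. The following sketch shows a few combinations, using only the options described above; the object names are hypothetical:

```vcl
import re2;

sub vcl_init {
    # Case-insensitive match; all other options keep their defaults.
    new ci = re2.regex("^/admin", case_sensitive=false);

    # literal=true: "." and "*" are ordinary characters here.
    new lit = re2.regex("a.b*c", literal=true);

    # With longest_match=true and the subject "abb", the whole of
    # "abb" matches and backref 1 is "bb" (instead of "ab" and "b").
    new longest = re2.regex("a(b|bb)", longest_match=true);

    # POSIX syntax, with \d explicitly permitted via perl_classes.
    new digits = re2.regex("\d+", posix_syntax=true,
                           perl_classes=true);
}
```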

CONTENTS

  • regex(STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
  • BOOL match(PRIV_TASK, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
  • STRING backref(PRIV_TASK, INT, STRING)
  • STRING namedref(PRIV_TASK, STRING, STRING)
  • STRING sub(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
  • STRING suball(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
  • STRING extract(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
  • set(ENUM {none,start,both}, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
  • STRING version()

regex

new OBJ = regex(STRING pattern, BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Create a regex object from pattern and the given options (or option defaults). If the pattern is invalid, then VCL will fail to load and the VCC compiler will emit an error message.

Example:

sub vcl_init {
    new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
    new maxagematcher = re2.regex("max-age\s*=\s*(\d+)");

    # Group possible subdomains without capturing
    new submatcher = re2.regex("^www\.(domain1|domain2)\.com$",
                               never_capture=true);
}

regex.match

BOOL regex.match(STRING)

Returns true if and only if the compiled regex matches the given string; corresponds to VCL's infix operator ~.

Example:

if (myregex.match(req.http.Host)) {
   call do_on_match;
}

regex.backref

STRING regex.backref(INT ref, STRING fallback="**BACKREF METHOD FAILED**")

Returns the nth captured subexpression from the most recent successful call of the .match() method for this object in the same client or backend context, or a fallback string in case the capture fails. Backref 0 indicates the entire matched string. Thus this function behaves like the \n in the native VCL functions regsub and regsuball, and the $1, $2 ... variables in Perl.

Since Varnish client and backend operations run in different threads, .backref() can only refer back to a .match() call in the same thread. Thus a .backref() call in any of the vcl_backend_* subroutines -- the backend context -- refers back to a previous .match() in any of those same subroutines; and a call in any of the other VCL subroutines -- the client context -- refers back to a .match() in the same client context.
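In practice this means that a match made on the client side has to be repeated on the backend side before .backref() can be used there. A minimal sketch, assuming a hypothetical object hostre and header X-Domain:

```vcl
import re2;

sub vcl_init {
    new hostre = re2.regex("^www\.([^.]+)\.com$");
}

sub vcl_recv {
    if (hostre.match(req.http.Host)) {
        # Client context: refers to the match just above.
        set req.http.X-Domain = hostre.backref(1);
    }
}

sub vcl_backend_fetch {
    # Backend context: the match from vcl_recv is not visible here,
    # so match again before calling .backref().
    if (hostre.match(bereq.http.Host)) {
        set bereq.http.X-Domain = hostre.backref(1);
    }
}
```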

After unsuccessful matches, the fallback string is returned for any call to .backref(). The default value of fallback is "**BACKREF METHOD FAILED**". .backref() always fails after a failed match, even if .match() had been called successfully before the failure.

.backref() may also return fallback after a successful match, if no captured group in the matching string corresponds to the backref number. For example, when the pattern (a|(b))c matches the string ac, there is no backref 2, since nothing matches b in the string.
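The case described above can be sketched as follows, with an explicit fallback string in place of the default. The object name grp, the fallback value "none" and the header names are chosen for illustration only:

```vcl
import re2;

sub vcl_init {
    new grp = re2.regex("(a|(b))c");
}

sub vcl_recv {
    if (grp.match("ac")) {
        # Group 1 captured "a".
        set req.http.X-Group1 = grp.backref(1, "none");
        # Group 2 took no part in the match, so the fallback
        # "none" is returned.
        set req.http.X-Group2 = grp.backref(2, "none");
    }
}
```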

The VCL infix operators ~ and !~ do not affect this method, nor do the functions regsub or regsuball. Nor is it affected by the matches performed by any other method or function in this VMOD (such as the sub(), suball() or extract() methods or functions, or the set object's .match() method).

.backref() fails, returning fallback and writing an error message to the Varnish log with the VCL_Error tag, under the following conditions (even if a previous match was successful and a substring could have been captured):

  • The fallback string is undefined, for example if set from an unset header variable.
  • The never_capture option was set to true for this object. In this case, even .backref(0) fails after a successful match (otherwise, backref 0 always returns the full matched string).
  • ref (the backref number) is out of range, i.e. it is larger than the highest number for a capturing group in the pattern.
  • .match() was never called for this object prior to calling .backref().
  • There is insufficient workspace for the string to be returned.

Example:

if (domainmatcher.match(req.http.Host)) {
   set req.http.X-Domain = domainmatcher.backref(1);
}

regex.namedref

STRING regex.namedref(STRING name, STRING fallback="**NAMEDREF METHOD FAILED**")

Returns the captured subexpression designated by name from the most recent successful call to .match() in the current context (client or backend), or fallback in case of failure.

Named capturing groups are written in RE2 as: (?P<name>re). (Note that this syntax with P, inspired by Python, differs from the notation for named capturing groups in PCRE.) Thus when (?P<foo>.+)bar$ matches bazbar, then .namedref("foo") returns baz.

Note that a named capturing group can also be referenced as a numbered group. So in the previous example, .backref(1) also returns baz.
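The equivalence of the two forms of reference can be sketched with the pattern from the example above; the object name named and the header names are hypothetical:

```vcl
import re2;

sub vcl_init {
    new named = re2.regex("(?P<foo>.+)bar$");
}

sub vcl_recv {
    if (named.match("bazbar")) {
        # Both calls return "baz".
        set req.http.X-By-Name = named.namedref("foo");
        set req.http.X-By-Number = named.backref(1);
    }
}
```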

fallback is returned when .namedref() is called after an unsuccessful match. The default fallback is "**NAMEDREF METHOD FAILED**".

Like .backref(), .namedref() is not affected by native VCL regex operations, nor by any other matches performed by methods or functions of the VMOD, except for a prior .match() for the same object.

.namedref() fails, returning fallback and logging a VCL_Error message, if:

  • The fallback string is undefined.
  • name is undefined or the empty string.
  • The never_capture option was set to true.
  • There is no such named group.
  • .match() was not called for this object.
  • There is insufficient workspace for the string to be returned.

Example:

sub vcl_init {
      new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
}

sub vcl_recv {
      if (domainmatcher.match(req.http.Host)) {
         set req.http.X-Domain = domainmatcher.namedref("domain");
      }
}

regex.sub

STRING regex.sub(STRING text, STRING rewrite, STRING fallback="**SUB METHOD FAILED**")

If the compiled pattern for this regex object matches text, then return the result of replacing the first match in text with rewrite. Within rewrite, \1 through \9 can be used to insert the numbered capturing group from the pattern, and \0 to insert the entire matching text. This method corresponds to the VCL native function regsub().

fallback is returned if the pattern does not match text. The default fallback is "**SUB METHOD FAILED**".

.sub() fails, returning fallback and logging a VCL_Error message, if:

  • Any of text, rewrite or fallback are undefined.
  • There is insufficient workspace for the rewritten string.
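As a sketch of how captured groups and the fallback might be combined, the fragment below swaps the two sides of the first key=value pair and passes the original text as the fallback, so that the result is unchanged when the pattern does not match. The object name kv and the headers X-Pair and X-Swapped are assumptions, and X-Pair is assumed to be set (an undefined text or fallback makes the call fail):

```vcl
import re2;

sub vcl_init {
    new kv = re2.regex("(\w+)=(\w+)");
}

sub vcl_recv {
    # For "a=b&c=d" only the first match is rewritten, giving
    # "b=a&c=d". If nothing matches, the fallback (here the
    # original header value) is returned.
    set req.http.X-Swapped = kv.sub(req.http.X-Pair, "\2=\1",
                                    fallback=req.http.X-Pair);
}
```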

Example:

sub vcl_init {
    new bmatcher = re2.regex("b+");
}

sub vcl_recv {
    # If Host contains "www.yabba.dabba.doo.com", then this will
    # set X-Yada to "www.yada.dabba.doo.com".
    set req.http.X-Yada = bmatcher.sub(req.http.Host, "d");
}

regex.suball

STRING regex.suball(STRING text, STRING rewrite, STRING fallback="**SUBALL METHOD FAILED**")

Like .sub(), except that all successive non-overlapping matches in text are replaced with rewrite. This method corresponds to VCL native regsuball().

The default fallback is "**SUBALL METHOD FAILED**". .suball() fails under the same conditions as .sub().

Since only non-overlapping matches are substituted, replacing "ana" within "banana" only results in one substitution, not two.
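The "banana" case can be written out as a short sketch; the object name ana and the header X-Banana are hypothetical:

```vcl
import re2;

sub vcl_init {
    new ana = re2.regex("ana");
}

sub vcl_recv {
    # "banana" contains "ana" at two overlapping positions, but only
    # the first is replaced, so the result is "bXna".
    set req.http.X-Banana = ana.suball("banana", "X");
}
```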

Example:

sub vcl_init {
    new bmatcher = re2.regex("b+");
}

sub vcl_recv {
    # If Host contains "www.yabba.dabba.doo.com", then set X-Yada to
    # "www.yada.dada.doo.com".
    set req.http.X-Yada = bmatcher.suball(req.http.Host, "d");
}

regex.extract

STRING regex.extract(STRING text, STRING rewrite, STRING fallback="**EXTRACT METHOD FAILED**")

If the compiled pattern for this regex object matches text, then return rewrite with substitutions from the matching portions of text. Non-matching substrings of text are ignored.

The default fallback is "**EXTRACT METHOD FAILED**". Like .sub() and .suball(), .extract() fails if:

  • Any of text, rewrite or fallback are undefined.
  • There is insufficient workspace for the rewritten string.
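The difference between .sub() and .extract() shows up when the pattern matches only part of text. The sketch below applies both methods to the same input; the object name addr and the header names are hypothetical:

```vcl
import re2;

sub vcl_init {
    new addr = re2.regex("(.*)@([^.]*)");
}

sub vcl_deliver {
    # The pattern matches "boris@kremvax"; ".ru" is left over.
    # .sub() keeps the unmatched tail: "kremvax!boris.ru"
    set resp.http.X-Sub = addr.sub("boris@kremvax.ru", "\2!\1");
    # .extract() drops it: "kremvax!boris"
    set resp.http.X-Extract = addr.extract("boris@kremvax.ru",
                                           "\2!\1");
}
```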

Example:

sub vcl_init {
    new email = re2.regex("(.*)@([^.]*)");
}

sub vcl_deliver {
    # Sets X-UUCP to "kremvax!boris"
    set resp.http.X-UUCP = email.extract("boris@kremvax.ru", "\2!\1");
}

regex functional interface

match

BOOL match(PRIV_TASK, STRING pattern, STRING subject, BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Like the regex.match() method, return true if pattern matches subject, where pattern is compiled with the given options (or default options) on each invocation.

If pattern fails to compile, then an error message is logged with the VCL_Error tag, and false is returned.

Example:

# Match the bereq Host header against a backend response header
if (re2.match(pattern=bereq.http.Host, subject=beresp.http.X-Host)) {
   call do_on_match;
}

backref

STRING backref(PRIV_TASK, INT ref, STRING fallback="**BACKREF FUNCTION FAILED**")

Returns the nth captured subexpression from the most recent successful call of the match() function in the current client or backend context, or a fallback string if the capture fails. The default fallback is "**BACKREF FUNCTION FAILED**".

Similarly to the regex.backref() method, fallback is returned after any failed invocation of the match() function, or if there is no captured group corresponding to the backref number. The function is not affected by native VCL regex operations, or any other method or function of the VMOD except for the match() function.

The function fails, returning fallback and logging a VCL_Error message, under the same conditions as the corresponding method:

  • fallback is undefined.
  • never_capture was true in the previous invocation of the match() function.
  • ref is out of range.
  • The match() function was never called in this context.
  • The pattern failed to compile for the previous match() call.
  • There is insufficient workspace for the captured subexpression.

Example:

# Match against a pattern provided in a beresp header, and capture
# subexpression 1.
if (re2.match(pattern=beresp.http.X-Pattern, subject=bereq.http.X-Foo)) {
   set beresp.http.X-Capture = re2.backref(1);
}

namedref

STRING namedref(PRIV_TASK, STRING name, STRING fallback="**NAMEDREF FUNCTION FAILED**")

Returns the captured subexpression designated by name from the most recent successful call to the match() function in the current context, or fallback in case of failure. The default fallback is "**NAMEDREF FUNCTION FAILED**".

The function returns fallback when the previous invocation of the match() function failed, and is only affected by use of the match() function. The function fails, returning fallback and logging a VCL_Error message, under the same conditions as the corresponding method:

  • fallback is undefined.
  • name is undefined or the empty string.
  • The never_capture option was set to true.
  • There is no such named group.
  • match() was not called in this context.
  • The pattern failed to compile for the previous match() call.
  • There is insufficient workspace for the captured expression.

Example:

if (re2.match(beresp.http.X-Pattern-With-Names, bereq.http.X-Foo)) {
   set beresp.http.X-Capture = re2.namedref("foo");
}

sub

STRING sub(STRING pattern, STRING text, STRING rewrite, STRING fallback="**SUB FUNCTION FAILED**", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Compiles pattern with the given options and, if it matches text, returns the result of replacing the first match in text with rewrite. As with the regex.sub() method, \0 through \9 may be used in rewrite to substitute captured groups from the pattern.

fallback is returned if the pattern does not match text. The default fallback is "**SUB FUNCTION FAILED**".

sub() fails, returning fallback and logging a VCL_Error message, if:

  • pattern cannot be compiled.
  • Any of text, rewrite or fallback are undefined.
  • There is insufficient workspace for the rewritten string.

Example:

# If the beresp header X-Sub-Letters contains "b+", and Host contains
# "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dabba.doo.com".
set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
                                 bereq.http.Host, "d");

suball

STRING suball(STRING pattern, STRING text, STRING rewrite, STRING fallback="**SUBALL FUNCTION FAILED**", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Like the sub() function, except that all successive non-overlapping matches in text are replaced with rewrite.

The default fallback is "**SUBALL FUNCTION FAILED**". The suball() function fails under the same conditions as sub().

Example:

# If the beresp header X-Sub-Letters contains "b+", and Host contains
# "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dada.doo.com".
set beresp.http.X-Yada = re2.suball(beresp.http.X-Sub-Letters,
                                    bereq.http.Host, "d");

extract

STRING extract(STRING pattern, STRING text, STRING rewrite, STRING fallback="**EXTRACT FUNCTION FAILED**", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Compiles pattern with the given options and, if it matches text, returns rewrite with substitutions from the matching portions of text, ignoring the non-matching portions.

The default fallback is "**EXTRACT FUNCTION FAILED**". The extract() function fails under the same conditions as sub() and suball().

Example:

# If beresp header X-Params contains "(foo|bar)=(baz|quux)", and the
# URL contains "bar=quux", then set X-Query to "bar:quux".
set beresp.http.X-Query = re2.extract(beresp.http.X-Params, bereq.url,
                                      "\1:\2");

set

new OBJ = set(ENUM {none,start,both} anchor="none", BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0, BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Initialize a set object that represents several patterns combined by alternation -- | for "or".

Optional parameters control the interpretation of the resulting composed pattern. The anchor parameter is an enum that can have the values none, start or both, where none is the default. start means that each pattern is matched as if it begins with ^ for start-of-text, and both means that each pattern is anchored with both ^ at the beginning and $ for end-of-text at the end. none means that each pattern is interpreted as a partial match (although individual patterns within the set may have either of ^ or $).

For example, if a set is initialized with anchor=both, and the patterns foo and bar are added, then matches against the set match a string against ^foo$|^bar$, or equivalently ^(foo|bar)$.
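The effect of the anchor parameter can be sketched by matching the same string against an anchored and an unanchored set. The object names exact and partial and the X-Word, X-Exact and X-Partial headers are hypothetical:

```vcl
import re2;

sub vcl_init {
    new exact = re2.set(anchor=both);
    exact.add("foo");
    exact.add("bar");
    exact.compile();

    new partial = re2.set();
    partial.add("foo");
    partial.add("bar");
    partial.compile();
}

sub vcl_recv {
    # Assuming X-Word is "foobar": neither pattern matches the
    # whole string, so the anchored set does not match ...
    if (!exact.match(req.http.X-Word)) {
        set req.http.X-Exact = "no match";
    }
    # ... while the unanchored set finds "foo" and "bar" within it.
    if (partial.match(req.http.X-Word)) {
        set req.http.X-Partial = "match";
    }
}
```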

The usual regex options can be set, which then control matching against the resulting composed pattern. However, the never_capture option cannot be set, and is always implicitly true, since backrefs and namedrefs are not possible with sets.

Example:

sub vcl_init {
      # Initialize a regex set for partial matches
      # with default options
      new foo = re2.set();

      # Initialize a regex set for case insensitive matches
      # with anchors on both ends (^ and $).
      new bar = re2.set(anchor=both, case_sensitive=false);

      # Initialize a regex set using POSIX syntax, but allowing
      # Perl character classes, and anchoring at the left (^).
      new baz = re2.set(anchor=start, posix_syntax=true,
                        perl_classes=true);
}

set.add

VOID set.add(STRING, STRING string=0, BACKEND backend=0)

Add the given pattern to the set. If the pattern is invalid, .add() fails, and the VCL will fail to load, with an error message describing the problem.

If values for the string and/or backend parameters are provided, then these values can be retrieved with the .string() and .backend() methods, respectively, as described below. This makes it possible to associate a string or a backend with the added pattern after it matches successfully. string and backend default to NULL; that is, by default the pattern is not associated with any such value.

.add() MUST be called in vcl_init, and MAY NOT be called after .compile(). If .add() is called in any other subroutine, an error message with VCL_Error is logged, and the call has no effect. If it is called in vcl_init after .compile(), then the VCL load will fail with an error message.

In other words, add all patterns to the set in vcl_init, and finally call .compile() when you're done.

When the .matched(INT) method is called after a successful match, the numbering corresponds to the order in which patterns were added. The same is true of the INT arguments that may be given for the .string() or .backend() methods.

Example:

sub vcl_init {
    # literal=true means that the dots are interpreted as literal
    # dots, not "match any character".
    new hostmatcher = re2.set(anchor=both, case_sensitive=false,
                              literal=true);
    hostmatcher.add("www.domain1.com");
    hostmatcher.add("www.domain2.com");
    hostmatcher.add("www.domain3.com");
    hostmatcher.compile();
}

# See the documentation of the .string() and .backend() methods
# below for uses of the parameters string and backend for .add().

set.compile

VOID set.compile()

Compile the compound pattern represented by the set -- an alternation of all patterns added by .add().

.compile() fails if no patterns were added to the set. It may also fail if the max_mem setting is not large enough for the composed pattern. In that case, the VCL load will fail with an error message; consider setting a larger value for max_mem in the set constructor.

.compile() MUST be called in vcl_init, and MAY NOT be called more than once for a set object. If it is called in any other subroutine, a VCL_Error message is logged, and the call has no effect. If it is called a second time in vcl_init, the VCL load will fail.

See above for examples.

set.match

BOOL set.match(STRING)

Returns true if the given string matches the compound pattern represented by the set, i.e. if it matches any of the patterns that were added to the set.

The matcher identifies all of the patterns in the set that match the given string. These can be determined after a successful match using the .matched(INT) and .nmatches() methods described below.

.match() MUST be called after .compile(); otherwise the match always fails.

Example:

if (hostmatcher.match(req.http.Host)) {
   call do_when_a_host_matched;
}

set.matched

BOOL set.matched(INT)

Returns true after a successful match if the nth pattern that was added to the set is among the patterns that matched, false otherwise. The numbering of the patterns corresponds to the order in which patterns were added in vcl_init, counting from 1.

The method refers back to the most recent invocation of .match() for the same object in the same client or backend context. It always returns false, for every value of the parameter, if it is called after an unsuccessful match (.match() returned false).

.matched() fails and returns false if:

  • The .match() method was not called for this object in the same client or backend scope.
  • The integer parameter is out of range; that is, if it is less than 1 or greater than the number of patterns added to the set.

On failure, the method writes an error message to the log with the tag VCL_Error; if it fails during vcl_init, then the VCL load fails with the error message. In any other VCL subroutine, the method returns false on failure and processing continues; since false is a legitimate return value, you should consider monitoring the log for the error messages.

Example:

if (hostmatcher.match(req.http.Host)) {
    if (hostmatcher.matched(1)) {
        call do_domain1;
    }
    if (hostmatcher.matched(2)) {
        call do_domain2;
    }
    if (hostmatcher.matched(3)) {
        call do_domain3;
    }
}

set.nmatches

INT set.nmatches()

Returns the number of patterns that were matched by the most recent invocation of .match() for the same object in the same client or backend context. The method always returns 0 after an unsuccessful match (.match() returned false).

If .match() was not called for this object in the same client or backend scope, .nmatches() fails and returns 0, writing an error message with VCL_Error to the log. If this happens in vcl_init, the VCL load fails with the error message. As with .matched(), .nmatches() returns a legitimate value and VCL processing continues when it fails in any other subroutine, so you should monitor the log for the error messages.

Example:

if (myset.match(req.url)) {
    std.log("URL matched " + myset.nmatches()
            + " patterns from the set");
}

set.which

INT set.which(ENUM {FIRST,LAST} select=0)

Returns a number indicating which pattern in a set matched in the most recent invocation of .match() in the client or backend context. The number corresponds to the order in which patterns were added to the set in vcl_init, counting from 1.

If exactly one pattern matched in the most recent .match() call (so that .nmatches() returns 1), then the number for that pattern is returned. The select ENUM is ignored in this case, and can be left out.

If more than one pattern matched in the most recent .match() call (.nmatches() > 1), then the select ENUM determines the integer that is returned. The values FIRST and LAST specify that, of the patterns that matched, the first or last one added via the .add() method is chosen, and the number for that pattern is returned.

.which() fails, returning 0 with a VCL_Error message in the log, if:

  • .match() was not called for the set in the current client or backend transaction, or if the previous call returned false.
  • More than one pattern in the set matched in the previous .match() call, but the select parameter is not set.

Examples:

sub vcl_init {
    new myset = re2.set();
    myset.add("foo"); # Pattern 1
    myset.add("bar"); # Pattern 2
    myset.add("baz"); # Pattern 3
    myset.compile();
}

sub vcl_recv {
    if (myset.match("bar")) {
        # myset.which() returns 2.
    }
    if (myset.match("foobaz")) {
        # myset.which() fails and returns 0, with a log
        #               message indicating that 2 patterns
        #               matched.
        # myset.which(FIRST) returns 1.
        # myset.which(LAST) returns 3.
    }
    if (myset.match("quux")) {
        # ...
    }
    else {
        # myset.which() fails and returns 0, with either value
        # or no value for the select ENUM, with a log message
        # indicating that the previous .match() call was
        # unsuccessful.
    }
}

set.string

STRING set.string(INT n=0, ENUM {FIRST,LAST} select=0)

Returns the string associated with the nth pattern added to the set, or with the pattern in the set that matched in the most recent call to .match() in the same task scope (client or backend context).

If n > 0, then return the string associated with the nth pattern added to the set with the string parameter of the .add() method, counting from 1. This will return the nth string in any context, regardless of whether .match() was called previously, or whether a previous call returned true or false.

If n <= 0, then return the string associated with a pattern in the set that matched successfully in the most recent call to .match() in the task scope. Since n is 0 by default, n can be left out for this purpose.

If n <= 0 and exactly one pattern in the set matched in the most recent invocation of .match() (and hence .nmatches() returns 1), then the string associated with that pattern is returned. The select parameter is ignored in this case. Thus .string() can be used for this purpose with no arguments.

If n <= 0 and more than one pattern matched in the most recent .match() call (.nmatches() > 1), then the string returned is determined by the select parameter. As with .which(), FIRST and LAST specify that the first or last matching pattern added via the .add() method is chosen, and the string associated with that pattern is returned.

.string() fails, returning NULL with a VCL_Error message in the log, if:

  • n is greater than the number of patterns in the set.
  • n <= 0 (or left to the default), but .match() was not called earlier in the same task scope (client or backend context).
  • n <= 0, but the previous .match() call returned false.
  • n <= 0 and no value is given for the select ENUM, but more than one pattern matched in the previous .match() call.
  • A pattern from the set is selected as described above, but no string was associated with that pattern; that is, the string parameter was not set in the .add() call that added the pattern.

Examples:

# Match the request URL against a set of patterns, and generate
# a synthetic redirect response with a Location header derived
# from the string associated with the matching pattern.

# In the first example, exactly one pattern in the set matches.

sub vcl_init {
    # With anchor=both, we specify exact matches.
    new matcher = re2.set(anchor=both);
    matcher.add("/foo/bar", "/baz/quux");
    matcher.add("/baz/bar/foo", "/baz/quux/foo");
    matcher.add("/quux/bar/baz/foo", "/baz/quux/foo/bar");
    matcher.compile();
}

sub vcl_recv {
    if (matcher.match(req.url)) {
        # Confirm that there was exactly one match
        if (matcher.nmatches() != 1) {
            return(fail);
        }
        # Divert to vcl_synth, sending the string associated
        # with the matching pattern in the "reason" field.
        return(synth(1301, matcher.string()));
    }
}

sub vcl_synth {
    # Construct a redirect response, using the path set in
    # resp.reason.
    if (resp.status == 1301) {
        set resp.http.Location
            = "http://otherdomain.org" + resp.reason;
        set resp.status = 301;
        set resp.reason = "Moved Permanently";
        return(deliver);
    }
}

# In the second example, the patterns that may match have
# common prefixes, and more than one pattern may match. We
# add patterns to the set in a "more specific" to "less
# specific" order, and we choose the most specific pattern
# that matches, by specifying the first matching pattern in
# the set.

sub vcl_init {
    # With anchor=start, we specify matching prefixes.
    new matcher = re2.set(anchor=start);
    matcher.add("/foo/bar/baz/quux", "/baz/quux");
    matcher.add("/foo/bar/baz", "/baz/quux/foo");
    matcher.add("/foo/bar", "/baz/quux/foo/bar");
    matcher.add("/foo", "/baz");
    matcher.compile();
}

sub vcl_recv {
    if (matcher.match(req.url)) {
        # Select the first matching pattern
        return(synth(1301, matcher.string(select=FIRST)));
    }
}

# vcl_synth is implemented as shown above
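
As described above, .string(n) with n > 0 returns the string for the nth pattern regardless of the match state, so it can be combined with .matched(n) to collect the strings for every pattern that matched. The following is only a sketch: the set name, the patterns, and the X-Tag-* header names are hypothetical and chosen for illustration.

sub vcl_init {
    # No anchor is given, so patterns may match anywhere in
    # the URL, and more than one pattern may match.
    new tags = re2.set();
    tags.add("[.](jpg|png)$", "image");  # Pattern 1
    tags.add("^/static/", "static");     # Pattern 2
    tags.compile();
}

sub vcl_recv {
    if (tags.match(req.url)) {
        # Check each pattern by number, and retrieve its
        # string by the same number.
        if (tags.matched(1)) {
            set req.http.X-Tag-1 = tags.string(1);
        }
        if (tags.matched(2)) {
            set req.http.X-Tag-2 = tags.string(2);
        }
    }
}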

set.backend

BACKEND set.backend(INT n=0, ENUM {FIRST,LAST} select=0)

Returns the backend associated with the nth pattern added to the set, or with the pattern in the set that matched in the most recent call to .match() in the same task scope (client or backend context).

The rules for selecting a pattern from the set and its associated backend are the same as for .string() above:

  • If n > 0, then return the backend associated with the nth pattern added to the set with the backend parameter of the .add() method, counting from 1 (independent of any previous .match() call).
  • If n <= 0 (or left to the default) and exactly one pattern in the set matched in the most recent invocation of .match() (.nmatches() == 1), then return the backend associated with that pattern (ignoring the value of select).
  • If n <= 0 and .nmatches() > 1, then return the backend associated with the first or last matching pattern in the set as determined by the select parameter.

.backend() fails, returning NULL with a VCL_Error message in the log, under the same conditions described for .string() above.

Example:

# Choose a backend based on the URL prefix.

# In this example, assume that backends b1 through b4
# have been defined.

sub vcl_init {
    # Use anchor=start to match prefixes.
    # The prefixes are unique, so exactly one will match.
    new matcher = re2.set(anchor=start);
    matcher.add("/foo", backend=b1);
    matcher.add("/bar", backend=b2);
    matcher.add("/baz", backend=b3);
    matcher.add("/quux", backend=b4);
    matcher.compile();
}

sub vcl_recv {
    if (matcher.match(req.url)) {
        # Confirm that there was exactly one match
        if (matcher.nmatches() != 1) {
            return(fail);
        }
        # Set the backend hint to the backend associated
        # with the matching pattern.
        set req.backend_hint = matcher.backend();
    }
}
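
When the prefixes overlap, more than one pattern may match, and the select parameter chooses among them as described above. A minimal sketch, assuming that backends b1 and b2 have been defined and that patterns are added from most specific to least specific (the set name is hypothetical):

sub vcl_init {
    # Use anchor=start to match prefixes. A URL such as
    # /foo/bar/baz matches both patterns.
    new prefixes = re2.set(anchor=start);
    prefixes.add("/foo/bar", backend=b1);
    prefixes.add("/foo", backend=b2);
    prefixes.compile();
}

sub vcl_recv {
    if (prefixes.match(req.url)) {
        # With select=FIRST, the backend for the first
        # matching pattern added to the set is chosen; if
        # only one pattern matched, select is ignored.
        set req.backend_hint = prefixes.backend(select=FIRST);
    }
}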

version

STRING version()

Return the version string for this VMOD.

Example:

std.log("Using VMOD re2 version: " + re2.version());

REQUIREMENTS

The VMOD requires the Varnish master branch, and is not compatible with any current released versions of Varnish. See the source repository for versions of the VMOD that are compatible with other Varnish versions.

It requires the RE2 library, and has been tested against RE2 versions 2015-06-01 through 2017-08-01.

INSTALLATION

See INSTALL.rst in the source repository.

LIMITATIONS

The VMOD allocates Varnish workspace for captured groups and rewritten strings. If operations fail with "insufficient workspace" error messages in the Varnish log (with the VCL_Error tag), increase the varnishd runtime parameters workspace_client and/or workspace_backend.

The RE2 documentation states that successful matches are slowed quite a bit when they also capture substrings. There is also additional overhead from the VMOD, unless the never_capture flag is true, to manage data about captured groups in the workspace. This overhead is incurred even if there are no capturing expressions in a pattern, since it is always possible to call backref(0) to obtain the matched portion of a string.

So if you are using a pattern only to match against strings, and never to capture subexpressions, consider setting the never_capture option to true, to eliminate the extra work for both RE2 and the VMOD.
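
For example, a regex object used only to test the URL might be declared as in the sketch below. The object name, the pattern, and the X-Api header are hypothetical, and never_capture is assumed to be passed as a named regex option in the same way as the other named parameters shown in this document.

sub vcl_init {
    # The pattern is used only for matching, never for
    # backrefs, so capturing is turned off.
    new is_api = re2.regex("^/api/", never_capture=true);
}

sub vcl_recv {
    if (is_api.match(req.url)) {
        set req.http.X-Api = "1";
    }
}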

AUTHOR

UPLEX Nils Goroll Systemoptimierung

SEE ALSO

COPYRIGHT

Copyright (c) 2016-2017 UPLEX Nils Goroll Systemoptimierung
All rights reserved

Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>

See LICENSE