Commit b263b8cf authored by Geoff Simmons

Retire README.rst, use pandoc to generate README.md.

The rst source has apparently become too large for gitlab web sites
to be able to render it. So we just go with markdown, in the hopes
that gitlab can better cope with it.

It's not an error if you don't have pandoc installed, but then
README will not be kept up to date with the docs in vcc.
parent 3e94935f
......@@ -5,13 +5,19 @@ SUBDIRS = src
DISTCHECK_CONFIGURE_FLAGS = \
VMOD_DIR='$${libdir}/varnish/vmods'
EXTRA_DIST = README.rst
EXTRA_DIST = README.md
dist_doc_DATA = README.rst LICENSE COPYING
dist_doc_DATA = README.md LICENSE COPYING
README.rst: src/vmod_re2.vcc
README.md: src/vmod_re2.vcc
if HAVE_PANDOC
$(MAKE) -C src vmod_re2.man.rst
cp src/vmod_re2.man.rst README.rst
$(AM_V_GEN) $(PANDOC) -f rst -t gfm src/vmod_re2.man.rst > README.md
else
@echo "========================================================="
@echo "You need pandoc installed to generate README.md, skipping"
@echo "========================================================="
endif
coverage:
$(MAKE) $(AM_MAKEFLAGS) -C src coverage
......
# vmod\_re2
## Varnish Module for access to the Google RE2 regular expression engine
- Manual section: 3
### SYNOPSIS
import re2;
# regex object interface
new OBJECT = re2.regex(STRING pattern [, <regex options>])
BOOL <obj>.match(STRING)
STRING <obj>.backref(INT ref)
STRING <obj>.namedref(STRING name)
STRING <obj>.sub(STRING text, STRING rewrite)
STRING <obj>.suball(STRING text, STRING rewrite)
STRING <obj>.extract(STRING text, STRING rewrite)
INT <obj>.cost()
# regex function interface
BOOL re2.match(STRING pattern, STRING subject [, <regex options>])
STRING re2.backref(INT ref)
STRING re2.namedref(STRING name)
STRING re2.sub(STRING pattern, STRING text, STRING rewrite
[, <regex options>])
STRING re2.suball(STRING pattern, STRING text, STRING rewrite
[, <regex options>])
STRING re2.extract(STRING pattern, STRING text, STRING rewrite
[, <regex options>])
INT re2.cost(STRING pattern [, <regex options>])
# set object interface
new OBJECT = re2.set([ENUM anchor] [, <regex options>])
VOID <obj>.add(STRING [, BOOL save] [, BOOL never_capture] [, STRING string]
[, BACKEND backend] [, INT integer])
VOID <obj>.compile()
BOOL <obj>.match(STRING)
INT <obj>.nmatches()
BOOL <obj>.matched(INT)
INT <obj>.which([ENUM select])
STRING <obj>.string([INT n,] [ENUM select])
BACKEND <obj>.backend([INT n,] [ENUM select])
INT <obj>.integer([INT n] [, ENUM select])
STRING <obj>.sub(STRING text, STRING rewrite [, INT n]
[, ENUM select])
STRING <obj>.suball(STRING text, STRING rewrite [, INT n]
[, ENUM select])
STRING <obj>.extract(STRING text, STRING rewrite [, INT n]
[, ENUM select])
BOOL <obj>.saved([ENUM {REGEX, STR, BE, INT} which] [, INT n]
[, ENUM select])
VOID <obj>.hdr_filter(HTTP [, BOOL])
# utility function
STRING re2.quotemeta(STRING)
# VMOD version
STRING re2.version()
### DESCRIPTION
Varnish Module (VMOD) for access to the Google RE2 regular expression
engine.
Varnish VCL uses the PCRE library (Perl Compatible Regular Expressions)
for its native regular expressions, which runs very efficiently for many
common uses of pattern matching in VCL, as attested by years of
successful use of PCRE with Varnish.
But for certain kinds of patterns, the worst-case running time of the
PCRE matcher is exponential in the length of the string to be matched.
The matcher uses backtracking, implemented with recursive calls to the
internal `match()` function. In principle there is no upper bound to the
possible depth of backtracking and recursion, except as imposed by the
`varnishd` runtime parameters `pcre_match_limit` and
`pcre_match_limit_recursion`; matches fail if either of these limits is
met. Stack overflow caused by deep backtracking has occasionally been
the subject of `varnishd` issues.
RE2 differs from PCRE in that it limits the syntax of patterns so that
they always specify a regular language in the formally strict sense.
Most notably, backreferences within a pattern are not permitted, for
example `(foo|bar)\1` to match `foofoo` and `barbar`, but not `foobar`
or `barfoo`. See the link in `SEE ALSO` for the specification of RE2
syntax.
This means that an RE2 matcher runs as a finite automaton, which
guarantees linear running time in the length of the matched string.
There is no backtracking, and hence no risk of deep recursion or stack
overflow.
The relative advantages and disadvantages of RE2 and PCRE are a broad
subject, beyond the scope of this manual. See the references in `SEE
ALSO` for more in-depth discussion.
#### regex object and function interfaces
The VMOD provides regular expression operations by way of the `regex`
object interface and a functional interface. For `regex` objects, the
pattern is compiled at VCL initialization time, and the compiled pattern
is re-used for each invocation of its methods. Compilation failures (due
to errors in the pattern) cause failure at initialization time, and the
VCL fails to load. The `.backref()` and `.namedref()` methods refer back
to the last invocation of the `.match()` method for the same object.
The functional interface provides the same set of operations, but the
pattern is compiled at runtime on each invocation (and then discarded).
Compilation failures are reported as errors in the Varnish log. The
`backref()` and `namedref()` functions refer back to the last invocation
of the `match()` function, for any pattern.
Compiling a pattern at runtime on each invocation is considerably more
costly than re-using a compiled pattern. So for patterns that are fixed
and known at VCL initialization, the object interface should be used.
The functional interface should only be used for patterns whose contents
are not known until runtime.
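The difference between the two interfaces can be sketched as follows. This is only an illustration: the object name `imgpath` and the headers `X-Pattern`, `X-Is-Image` and `X-Pattern-Matched` are hypothetical and not part of the VMOD.

```vcl
import re2;

sub vcl_init {
    # Fixed pattern: compiled once, when the VCL is loaded.
    new imgpath = re2.regex("^/images/");
}

sub vcl_recv {
    # Object interface: re-uses the pattern compiled in vcl_init.
    if (imgpath.match(req.url)) {
        set req.http.X-Is-Image = "true";
    }

    # Functional interface: the pattern is only known at runtime
    # (here taken from a hypothetical request header), so it is
    # compiled on every invocation and then discarded.
    if (re2.match(req.http.X-Pattern, req.url)) {
        set req.http.X-Pattern-Matched = "true";
    }
}
```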
#### set object interface
`set` objects provide a shorthand for constructing patterns that consist
of an alternation -- a group of patterns combined with `|` for "or". For
example:
import re2;
sub vcl_init {
new myset = re2.set();
myset.add("foo"); # Pattern 1
myset.add("bar"); # Pattern 2
myset.add("baz"); # Pattern 3
myset.compile();
}
`myset.match(<string>)` can now be used to match a string against the
pattern `foo|bar|baz`. When a match is successful, the matcher has
determined all of the patterns that matched. These can then be retrieved
with the method `.nmatches()` for the number of matched patterns, and
with `.matched(n)`, which returns `true` if the `nth` pattern matched,
where the patterns are numbered in the order in which they were added:
if (myset.match("foobar")) {
std.log("Matched " + myset.nmatches() + " patterns");
if (myset.matched(1)) {
# Pattern /foo/ matched
call do_foo;
}
if (myset.matched(2)) {
# Pattern /bar/ matched
call do_bar;
}
if (myset.matched(3)) {
# Pattern /baz/ matched
call do_baz;
}
}
An advantage of alternations and sets with RE2, as opposed to an
alternation in PCRE or a series of separate matches in an if-elsif-elsif
sequence, comes from the fact that the matcher is implemented as a state
machine. That means that the matcher progresses through the string to be
matched just once, following patterns in the set that match through the
state machine, or determining that there is no match as soon as there
are no more possible paths in the state machine. So a string can be
matched against a large set of patterns in time that is proportional to
the length of the string to be matched. In contrast, PCRE matches
patterns in an alternation one after another, stopping after the first
matching pattern, or attempting matches against all of them if there is
no match. Thus a match against an alternation in PCRE is not unlike an
if-elsif-elsif sequence of individual matches, and requires the time
needed for each individual match, overall in proportion with the number
of patterns to be matched.
Another advantage of the VMOD's set object is the ability to associate
strings or backends with the patterns added to the set with the `.add()`
method:
sub vcl_init {
new prefix = re2.set(anchor=start);
prefix.add("/foo", string="www.domain1.com");
prefix.add("/bar", string="www.domain2.com");
prefix.add("/baz", string="www.domain3.com");
prefix.add("/quux", string="www.domain4.com");
prefix.compile();
new appmatcher = re2.set(anchor=start);
appmatcher.add("/foo", backend=app1);
appmatcher.add("/bar", backend=app2);
appmatcher.add("/baz", backend=app3);
appmatcher.add("/quux", backend=app4);
appmatcher.compile();
}
After a successful match, the string or backend associated with the
matching pattern can be retrieved with the `.string()` and `.backend()`
methods. This makes it possible, for example, to construct a redirect
response or choose the backend with code that is both efficient and
compact, even with a large set of patterns to be matched:
# Use the prefix object to construct a redirect response from
# a matching request URL.
sub vcl_recv {
if (prefix.match(req.url)) {
# Pass the string associated with the matching pattern
# to vcl_synth.
return(synth(1301, prefix.string()));
}
}
sub vcl_synth {
# The string associated with the matching pattern is in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location = "http://" + resp.reason + req.url;
set resp.status = 301;
set resp.reason = "Moved Permanently";
}
}
# Use the appmatcher object to choose a backend based on the
# request URL prefix.
sub vcl_recv {
if (appmatcher.match(req.url)) {
set req.backend_hint = appmatcher.backend();
}
}
#### regex options
Where a pattern is compiled -- in the `regex` and `set` constructors,
and in functions that require compilation -- options may be specified
that can affect the interpretation of the pattern or the operation of
the matcher. There are default values for each option, and it is only
necessary to specify options in VCL that differ from the defaults.
Options specified in a `set` constructor apply to all of the patterns in
the resulting alternation.
- `utf8`
If true, characters in a pattern match Unicode code points, and
hence may match more than one byte. If false, the pattern and
strings to be matched are interpreted as Latin-1 (ISO 8859-1), and a
pattern character matches exactly one byte. Default is **false**.
Note that this differs from the RE2 default.
- `posix_syntax`
If true, patterns are restricted to POSIX (egrep) syntax. Otherwise,
the pattern syntax resembles that of PCRE, with some deviations. See
the link in `SEE ALSO` for the syntax specification. Default is
**false**. The options `perl_classes`, `word_boundary` and
`one_line` are only consulted when this option is true.
- `longest_match`
If true, the matcher searches for the longest possible match where
alternatives are possible. Otherwise, search for the first match.
For example with the pattern `a(b|bb)` and the string `abb`, `abb`
matches when `longest_match` is true, and backref 1 is `bb`.
Otherwise, `ab` matches, and backref 1 is `b`. Default is **false**.
- `max_mem`
An upper bound (in bytes) for the size of the compiled pattern. If
`max_mem` is too small, the matcher may fall back to less efficient
algorithms, or the pattern may fail to compile. Default is the RE2
default (8MB), which should suffice for typical patterns.
- `literal`
If true, the pattern is interpreted as a literal string, and no
regex metacharacters (such as `*`, `+`, `^` and so forth) have their
special meaning. Default is **false**.
- `never_nl`
If true, the newline character `\n` in a string is never matched,
even if it appears in the pattern. Default is **false**.
- `dot_nl`
If true, then the dot character `.` in a pattern matches everything,
including newline. Otherwise, `.` never matches newline. Default is
**false**.
- `never_capture`
If true, parentheses in a pattern are interpreted as non-capturing,
and all invocations of the `backref` and `namedref` methods or
functions will fail, including `backref(0)` after a successful
match. Default is **false**, except for set objects, for which
`never_capture` is always true (and cannot be changed), since back
references are not possible with sets.
- `case_sensitive`
If true, matches are case-sensitive. A pattern can override this
option with the `(?i)` flag, unless `posix_syntax` is true. Default
is **true**.
The following options are only consulted when `posix_syntax` is true. If
`posix_syntax` is false, then these features are always enabled and
cannot be turned off.
- `perl_classes`
If true, then the perl character classes `\d`, `\s`, `\w`, `\D`,
`\S` and `\W` are permitted in a pattern. Default is **false**.
- `word_boundary`
If true, the perl assertions `\b` and `\B` (word boundary and not a
word boundary) are permitted. Default is **false**.
- `one_line`
If true, then `^` and `$` only match at the beginning and end of the
string to be matched, regardless of newlines. Otherwise, `^` also
matches just after a newline, and `$` also matches just before a
newline. Default is **false**.
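As a sketch of how an option changes the result, the following applies the `longest_match` example from the list above to two regex objects. The object names `first` and `longest` and the `X-First` and `X-Longest` headers are assumptions made for illustration.

```vcl
import re2;

sub vcl_init {
    # Same pattern, compiled with and without longest_match.
    new first = re2.regex("a(b|bb)");
    new longest = re2.regex("a(b|bb)", longest_match=true);
}

sub vcl_recv {
    if (first.match("abb")) {
        # First match: "ab" matches, so backref 1 is "b".
        set req.http.X-First = first.backref(1);
    }
    if (longest.match("abb")) {
        # Longest match: "abb" matches, so backref 1 is "bb".
        set req.http.X-Longest = longest.backref(1);
    }
}
```

Options that are not named in the constructor keep their defaults, so only `longest_match` differs between the two objects.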
#### new xregex = re2.regex(STRING pattern, BOOL utf8, BOOL posix\_syntax, BOOL longest\_match, INT max\_mem, BOOL literal, BOOL never\_nl, BOOL dot\_nl, BOOL never\_capture, BOOL case\_sensitive, BOOL perl\_classes, BOOL word\_boundary, BOOL one\_line)
new xregex = re2.regex(
STRING pattern,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Create a regex object from `pattern` and the given options (or option
defaults). If the pattern is invalid, then VCL will fail to load and the
VCC compiler will emit an error message.
Example:
sub vcl_init {
new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
new maxagematcher = re2.regex("max-age\s*=\s*(\d+)");
# Group possible subdomains without capturing
new submatcher = re2.regex("^www\.(domain1|domain2)\.com$",
never_capture=true);
}
#### BOOL xregex.match(STRING)
Returns `true` if and only if the compiled regex matches the given
string; corresponds to VCL's infix operator `~`.
Example:
if (myregex.match(req.http.Host)) {
call do_on_match;
}
#### STRING xregex.backref(INT ref, STRING fallback)
STRING xregex.backref(
INT ref,
STRING fallback="**BACKREF METHOD FAILED**"
)
Returns the nth captured subexpression from the most recent successful
call of the `.match()` method for this object in the same client or
backend context, or a fallback string in case the capture fails.
Backref 0 indicates the entire matched string. Thus this function
behaves like the `\n` in the native VCL functions `regsub` and
`regsuball`, and the `$1`, `$2` ... variables in Perl.
Since Varnish client and backend operations run in different threads,
`.backref()` can only refer back to a `.match()` call in the same
thread. Thus a `.backref()` call in any of the `vcl_backend_*`
subroutines -- the backend context -- refers back to a previous
`.match()` in any of those same subroutines; and a call in any of the
other VCL subroutines -- the client context -- refers back to a
`.match()` in the same client context.
After unsuccessful matches, the `fallback` string is returned for any
call to `.backref()`. The default value of `fallback` is `"**BACKREF
METHOD FAILED**"`. `.backref()` always fails after a failed match, even
if `.match()` had been called successfully before the failure.
`.backref()` may also return `fallback` after a successful match, if no
captured group in the matching string corresponds to the backref number.
For example, when the pattern `(a|(b))c` matches the string `ac`, there
is no backref 2, since nothing matches `b` in the string.
The VCL infix operators `~` and `!~` do not affect this method, nor do
the functions `regsub` or `regsuball`. Nor is it affected by the matches
performed by any other method or function in this VMOD (such as the
`sub()`, `suball()` or `extract()` methods or functions, or the `set`
object's `.match()` method).
`.backref()` fails, returning `fallback` and writing an error message to
the Varnish log with the `VCL_Error` tag, under the following conditions
(even if a previous match was successful and a substring could have been
captured):
- The `fallback` string is undefined, for example if set from an unset
header variable.
- The `never_capture` option was set to `true` for this object. In
this case, even `.backref(0)` fails after a successful match
(otherwise, backref 0 always returns the full matched string).
- `ref` (the backref number) is out of range, i.e. it is larger than
the highest number for a capturing group in the pattern.
- `.match()` was never called for this object prior to calling
`.backref()`.
- There is insufficient workspace for the string to be returned.
Example:
if (domainmatcher.match(req.http.Host)) {
set req.http.X-Domain = domainmatcher.backref(1);
}
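A further sketch, assuming the `domainmatcher` object from the constructor example above, shows a custom `fallback` and the separation of client and backend contexts. The header `X-Domain` and the fallback value `"unknown"` are illustrative choices.

```vcl
sub vcl_recv {
    # Client context: backref() refers to this match() call.
    if (domainmatcher.match(req.http.Host)) {
        set req.http.X-Domain = domainmatcher.backref(1, "unknown");
    }
}

sub vcl_backend_fetch {
    # Backend context: the match() in vcl_recv is not visible here,
    # so the match has to be repeated before calling backref().
    if (domainmatcher.match(bereq.http.Host)) {
        set bereq.http.X-Domain =
            domainmatcher.backref(1, fallback="unknown");
    }
}
```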
#### STRING xregex.namedref(STRING name, STRING fallback)
STRING xregex.namedref(
STRING name,
STRING fallback="**NAMEDREF METHOD FAILED**"
)
Returns the captured subexpression designated by `name` from the most
recent successful call to `.match()` in the current context (client or
backend), or `fallback` in case of failure.
Named capturing groups are written in RE2 as: `(?P<name>re)`. (Note that
this syntax with `P`, inspired by Python, differs from the notation for
named capturing groups in PCRE.) Thus when `(?P<foo>.+)bar$` matches
`bazbar`, then `.namedref("foo")` returns `baz`.
Note that a named capturing group can also be referenced as a numbered
group. So in the previous example, `.backref(1)` also returns `baz`.
`fallback` is returned when `.namedref()` is called after an
unsuccessful match. The default fallback is `"**NAMEDREF METHOD
FAILED**"`.
Like `.backref()`, `.namedref()` is not affected by native VCL regex
operations, nor by any other matches performed by methods or functions
of the VMOD, except for a prior `.match()` for the same object.
`.namedref()` fails, returning `fallback` and logging a `VCL_Error`
message, if:
- The `fallback` string is undefined.
- `name` is undefined or the empty string.
- The `never_capture` option was set to `true`.
- There is no such named group.
- `.match()` was not called for this object.
- There is insufficient workspace for the string to be returned.
Example:
sub vcl_init {
new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
}
sub vcl_recv {
if (domainmatcher.match(req.http.Host)) {
set req.http.X-Domain = domainmatcher.namedref("domain");
}
}
#### STRING xregex.sub(STRING text, STRING rewrite, STRING fallback)
STRING xregex.sub(
STRING text,
STRING rewrite,
STRING fallback="**SUB METHOD FAILED**"
)
If the compiled pattern for this regex object matches `text`, then
return the result of replacing the first match in `text` with `rewrite`.
Within `rewrite`, `\1` through `\9` can be used to insert the
numbered capturing group from the pattern, and `\0` to insert the entire
matching text. This method corresponds to the VCL native function
`regsub()`.
`fallback` is returned if the pattern does not match `text`. The default
fallback is `"**SUB METHOD FAILED**"`.
`.sub()` fails, returning `fallback` and logging a `VCL_Error` message,
if:
- Any of `text`, `rewrite` or `fallback` are undefined.
- There is insufficient workspace for the rewritten string.
Example:
sub vcl_init {
new bmatcher = re2.regex("b+");
}
sub vcl_recv {
# If Host contains "www.yabba.dabba.doo.com", then this will
# set X-Yada to "www.yada.dabba.doo.com".
set req.http.X-Yada = bmatcher.sub(req.http.Host, "d");
}
#### STRING xregex.suball(STRING text, STRING rewrite, STRING fallback)
STRING xregex.suball(
STRING text,
STRING rewrite,
STRING fallback="**SUBALL METHOD FAILED**"
)
Like `.sub()`, except that all successive non-overlapping matches in
`text` are replaced with `rewrite`. This method corresponds to VCL
native `regsuball()`.
The default fallback is `"**SUBALL METHOD FAILED**"`. `.suball()` fails
under the same conditions as `.sub()`.
Since only non-overlapping matches are substituted, replacing `"ana"`
within `"banana"` only results in one substitution, not two.
Example:
sub vcl_init {
new bmatcher = re2.regex("b+");
}
sub vcl_recv {
# If Host contains "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dada.doo.com".
set req.http.X-Yada = bmatcher.suball(req.http.Host, "d");
}
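The non-overlapping behavior described above for `"ana"` and `"banana"` can be sketched like this; the object name `ana`, the replacement string `"X"` and the header `X-Result` are hypothetical.

```vcl
import re2;

sub vcl_init {
    new ana = re2.regex("ana");
}

sub vcl_recv {
    # The first "ana" (b[ana]na) is replaced. The second candidate
    # overlaps the first one and is skipped, and the remaining "na"
    # does not match, so X-Result is set to "bXna".
    set req.http.X-Result = ana.suball("banana", "X");
}
```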
#### STRING xregex.extract(STRING text, STRING rewrite, STRING fallback)
STRING xregex.extract(
STRING text,
STRING rewrite,
STRING fallback="**EXTRACT METHOD FAILED**"
)
If the compiled pattern for this regex object matches `text`, then
return `rewrite` with substitutions from the matching portions of
`text`. Non-matching substrings of `text` are ignored.
The default fallback is `"**EXTRACT METHOD FAILED**"`. Like `.sub()` and
`.suball()`, `.extract()` fails if:
- Any of `text`, `rewrite` or `fallback` are undefined.
- There is insufficient workspace for the rewritten string.
Example:
sub vcl_init {
new email = re2.regex("(.*)@([^.]*)");
}
sub vcl_deliver {
# Sets X-UUCP to "kremvax!boris"
set resp.http.X-UUCP = email.extract("boris@kremvax.ru", "\2!\1");
}
#### INT xregex.cost()
Return a numeric measurement \> 0 for this regex object from the RE2
library. According to the RE2 documentation:
> ... a very approximate measure of a regexp's "cost". Larger numbers
> are more expensive than smaller numbers.
The absolute numeric values are opaque and not relevant, but they are
meaningful relative to one another -- more complex regexen have a higher
cost than less complex regexen. This may be useful during development
and optimization of regular expressions.
Example:
std.log("r1 cost=" + r1.cost() + " r_alt cost=" + r_alt.cost());
### regex functional interface
#### BOOL match(STRING pattern, STRING subject, BOOL utf8, BOOL posix\_syntax, BOOL longest\_match, INT max\_mem, BOOL literal, BOOL never\_nl, BOOL dot\_nl, BOOL never\_capture, BOOL case\_sensitive, BOOL perl\_classes, BOOL word\_boundary, BOOL one\_line)
BOOL match(
STRING pattern,
STRING subject,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Like the `regex.match()` method, return `true` if `pattern` matches
`subject`, where `pattern` is compiled with the given options (or
default options) on each invocation.
If `pattern` fails to compile, then an error message is logged with the
`VCL_Error` tag, and `false` is returned.
Example:
# Match the bereq Host header against a backend response header
if (re2.match(pattern=bereq.http.Host, subject=beresp.http.X-Host)) {
call do_on_match;
}
#### STRING backref(INT ref, STRING fallback)
STRING backref(
INT ref,
STRING fallback="**BACKREF FUNCTION FAILED**"
)
Returns the nth captured subexpression from the most recent successful
call of the `match()` function in the current client or backend context,
or a fallback string if the capture fails. The default `fallback` is
`"**BACKREF FUNCTION FAILED**"`.
Similarly to the `regex.backref()` method, `fallback` is returned after
any failed invocation of the `match()` function, or if there is no
captured group corresponding to the backref number. The function is not
affected by native VCL regex operations, or any other method or function
of the VMOD except for the `match()` function.
The function fails, returning `fallback` and logging a `VCL_Error`
message, under the same conditions as the corresponding method:
- `fallback` is undefined.
- `never_capture` was true in the previous invocation of the `match()`
function.
- `ref` is out of range.
- The `match()` function was never called in this context.
- The pattern failed to compile for the previous `match()` call.
- There is insufficient workspace for the captured subexpression.
Example:
# Match against a pattern provided in a beresp header, and capture
# subexpression 1.
if (re2.match(pattern=beresp.http.X-Pattern, subject=bereq.http.X-Foo)) {
set beresp.http.X-Capture = re2.backref(1);
}
#### STRING namedref(STRING name, STRING fallback)
STRING namedref(
STRING name,
STRING fallback="**NAMEDREF FUNCTION FAILED**"
)
Returns the captured subexpression designated by `name` from the most
recent successful call to the `match()` function in the current context,
or `fallback` in case of failure. The default fallback is `"**NAMEDREF
FUNCTION FAILED**"`.
The function returns `fallback` when the previous invocation of the
`match()` function failed, and is only affected by use of the `match()`
function. The function fails, returning `fallback` and logging a
`VCL_Error` message, under the same conditions as the corresponding
method:
- `fallback` is undefined.
- `name` is undefined or the empty string.
- The `never_capture` option was set to `true`.
- There is no such named group.
- `match()` was not called in this context.
- The pattern failed to compile for the previous `match()` call.
- There is insufficient workspace for the captured expression.
Example:
if (re2.match(beresp.http.X-Pattern-With-Names, bereq.http.X-Foo)) {
set beresp.http.X-Capture = re2.namedref("foo");
}
#### STRING sub(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix\_syntax, BOOL longest\_match, INT max\_mem, BOOL literal, BOOL never\_nl, BOOL dot\_nl, BOOL never\_capture, BOOL case\_sensitive, BOOL perl\_classes, BOOL word\_boundary, BOOL one\_line)
STRING sub(
STRING pattern,
STRING text,
STRING rewrite,
STRING fallback="**SUB FUNCTION FAILED**",
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Compiles `pattern` with the given options, and if it matches `text`,
then return the result of replacing the first match in `text` with
`rewrite`. As with the `regex.sub()` method, `\0` through `\9` may be
used in `rewrite` to substitute captured groups from the pattern.
`fallback` is returned if the pattern does not match `text`. The default
fallback is `"**SUB FUNCTION FAILED**"`.
`sub()` fails, returning `fallback` and logging a `VCL_Error` message,
if:
- `pattern` cannot be compiled.
- Any of `text`, `rewrite` or `fallback` are undefined.
- There is insufficient workspace for the rewritten string.
Example:
# If the beresp header X-Sub-Letters contains "b+", and Host contains
# "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dabba.doo.com".
set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
bereq.http.Host, "d");
#### STRING suball(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix\_syntax, BOOL longest\_match, INT max\_mem, BOOL literal, BOOL never\_nl, BOOL dot\_nl, BOOL never\_capture, BOOL case\_sensitive, BOOL perl\_classes, BOOL word\_boundary, BOOL one\_line)
STRING suball(
STRING pattern,
STRING text,
STRING rewrite,
STRING fallback="**SUBALL FUNCTION FAILED**",
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Like the `sub()` function, except that all successive non-overlapping
matches in `text` are replaced with `rewrite`.
The default fallback is `"**SUBALL FUNCTION FAILED**"`. The `suball()`
function fails under the same conditions as `sub()`.
Example:
# If the beresp header X-Sub-Letters contains "b+", and Host contains
# "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dada.doo.com".
set beresp.http.X-Yada = re2.suball(beresp.http.X-Sub-Letters,
bereq.http.Host, "d");
#### STRING extract(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix\_syntax, BOOL longest\_match, INT max\_mem, BOOL literal, BOOL never\_nl, BOOL dot\_nl, BOOL never\_capture, BOOL case\_sensitive, BOOL perl\_classes, BOOL word\_boundary, BOOL one\_line)
STRING extract(
STRING pattern,
STRING text,
STRING rewrite,
STRING fallback="**EXTRACT FUNCTION FAILED**",
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Compiles `pattern` with the given options, and if it matches `text`,
then return `rewrite` with substitutions from the matching portions of
`text`, ignoring the non-matching portions.
The default fallback is `"**EXTRACT FUNCTION FAILED**"`. The `extract()`
function fails under the same conditions as `sub()` and `suball()`.
Example:
# If beresp header X-Params contains "(foo|bar)=(baz|quux)", and the
# URL contains "bar=quux", then set X-Query to "bar:quux".
set beresp.http.X-Query = re2.extract(beresp.http.X-Params, bereq.url,
"\1:\2");
#### INT cost(STRING pattern, BOOL utf8, BOOL posix\_syntax, BOOL longest\_match, INT max\_mem, BOOL literal, BOOL never\_nl, BOOL dot\_nl, BOOL never\_capture, BOOL case\_sensitive, BOOL perl\_classes, BOOL word\_boundary, BOOL one\_line)
INT cost(
STRING pattern,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Like the `.cost()` method above, return a numeric measurement \> 0 from
the RE2 library for `pattern` with the given options. More complex
regexen have a higher cost than less complex regexen.
Fails and returns -1 if `pattern` cannot be compiled.
Example:
std.log("simple cost=" + re2.cost("simple")
+ " complex cost=" + re2.cost("complex{1,128}"));
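Since `cost()` returns -1 for a pattern that cannot be compiled, it could also serve as a check on a pattern that is only known at runtime. The following is a minimal sketch under that assumption; the header `X-Pattern` is hypothetical.

```vcl
import re2;
import std;

sub vcl_backend_response {
    # A negative cost means that the pattern failed to compile.
    if (re2.cost(beresp.http.X-Pattern) < 0) {
        std.log("X-Pattern could not be compiled by RE2");
    }
}
```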
#### new xset = re2.set(ENUM anchor, BOOL utf8, BOOL posix\_syntax, BOOL longest\_match, INT max\_mem, BOOL literal, BOOL never\_nl, BOOL dot\_nl, BOOL case\_sensitive, BOOL perl\_classes, BOOL word\_boundary, BOOL one\_line)
new xset = re2.set(
ENUM {none, start, both} anchor=none,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Initialize a set object that represents several patterns combined by
alternation -- `|` for "or".
Optional parameters control the interpretation of the resulting composed
pattern. The `anchor` parameter is an enum that can have the values
`none`, `start` or `both`, where `none` is the default. `start` means
that each pattern is matched as if it begins with `^` for start-of-text,
and `both` means that each pattern is anchored with both `^` at the
beginning and `$` for end-of-text at the end. `none` means that each
pattern is interpreted as a partial match (although individual patterns
within the set may have either of `^` or `$`).
For example, if a set is initialized with `anchor=both`, and the
patterns `foo` and `bar` are added, then matches against the set match a
string against `^foo$|^bar$`, or equivalently `^(foo|bar)$`.
The usual regex options can be set, which then control matching against
the resulting composed pattern. However, the `never_capture` option
cannot be set, and is always implicitly true, since backrefs and
namedrefs are not possible with sets.
Example:
sub vcl_init {
# Initialize a regex set for partial matches
# with default options
new foo = re2.set();
# Initialize a regex set for case insensitive matches
# with anchors on both ends (^ and $).
new bar = re2.set(anchor=both, case_sensitive=false);
# Initialize a regex set using POSIX syntax, but allowing
# Perl character classes, and anchoring at the left (^).
new baz = re2.set(anchor=start, posix_syntax=true,
perl_classes=true);
}
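The `anchor=both` case with the patterns `foo` and `bar` described above might look like this in full. The object name `exact` and the headers `X-Word` and `X-Exact` are assumptions for illustration.

```vcl
import re2;

sub vcl_init {
    new exact = re2.set(anchor=both);
    exact.add("foo");
    exact.add("bar");
    exact.compile();
}

sub vcl_recv {
    # Equivalent to matching against ^(foo|bar)$: succeeds for
    # "foo" or "bar", but not for "foobar" or "xfoo".
    if (exact.match(req.http.X-Word)) {
        set req.http.X-Exact = "true";
    }
}
```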
#### VOID xset.add(STRING, \[STRING string\], \[BACKEND backend\], \[BOOL save\], \[BOOL never\_capture\], \[INT integer\])
VOID xset.add(
STRING,
[STRING string],
[BACKEND backend],
[BOOL save],
[BOOL never_capture],
[INT integer]
)
Add the given pattern to the set. If the pattern is invalid, `.add()`
fails, and the VCL will fail to load, with an error message describing
the problem.
If values for the `string`, `backend` and/or `integer` parameters are
provided, then these values can be retrieved with the `.string()`,
`.backend()` and `.integer()` methods, respectively, as described below.
This makes it possible to associate data with the added pattern after it
matches successfully. By default the pattern is not associated with any
such value.
If `save` is true, then the given pattern is compiled and saved as a
`regex` object, just as if the `regex` constructor described above is
invoked. This object is stored internally in the `set` object as an
independent matcher, separate from the "compound" pattern formed by the set
as an alternation of the patterns added to it. By default, `save` is
**false**.
When the `.match()` method on the set is successful, and one of the
patterns that matched is associated with a saved internal `regex`
object, then that object may be used for subsequent method invocations
such as `.sub()` on the set object, whose meanings are the same as
documented above for `regex` objects. Details are described below.
When an internal `regex` object is saved (i.e. when `save` is true), it
is compiled with the same options that were provided to the set object
in the constructor. The `never_capture` parameter of `.add()` applies
only to this individual regex and is false by default, so the saved regex
can capture groups even though `never_capture` is implicitly true for
the full set object.
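As a rough sketch of how `save` and `never_capture` combine in `.add()`, assuming a hypothetical set named `rewriter` and illustrative patterns:

```vcl
sub vcl_init {
    new rewriter = re2.set();

    # Saved with capturing left enabled (never_capture defaults to
    # false), so a rewrite such as "/\1" can refer to the group.
    rewriter.add("/old/(.*)", save=true);

    # Saved without capturing: enough for rewrites that do not use
    # backrefs, and avoids the bookkeeping for captured groups.
    rewriter.add("\bdraft\b", save=true, never_capture=true);

    # Not saved: this pattern only takes part in the set match.
    rewriter.add("/static/");

    rewriter.compile();
}
```

Only the first two patterns can later be used with the set's rewrite methods such as `.sub()`, since those require a saved `regex` object.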
`.add()` MUST be called in `vcl_init`, and MAY NOT be called after
`.compile()`. If `.add()` is called in any other subroutine, an error
message with `VCL_Error` is logged, and the call has no effect. If it is
called in `vcl_init` after `.compile()`, then the VCL load will fail
with an error message.
In other words, add all patterns to the set in `vcl_init`, and finally
call `.compile()` when you're done.
When the `.matched(INT)` method is called after a successful match, the
numbering corresponds to the order in which patterns were added. The
same is true of the INT arguments that may be given for methods such as
`.string()`, `.backend()` or `.sub()`, as described below.
Example:
sub vcl_init {
# literal=true means that the dots are interpreted as literal
# dots, not "match any character".
new hostmatcher = re2.set(anchor=both, case_sensitive=false,
literal=true);
hostmatcher.add("www.domain1.com");
hostmatcher.add("www.domain2.com");
hostmatcher.add("www.domain3.com");
hostmatcher.compile();
}
# See the documentation of the .string() and .backend() methods
# below for uses of the parameters string and backend for .add().
#### VOID xset.compile()
Compile the compound pattern represented by the set -- an alternation of
all patterns added by `.add()`.
`.compile()` fails if no patterns were added to the set. It may also
fail if the `max_mem` setting is not large enough for the composed
pattern. In that case, the VCL load will fail with an error message
(then consider a larger value for `max_mem` in the set constructor).
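A sketch of raising the limit in the constructor is shown here. It assumes that `max_mem` is given as an integer number of bytes, as in the constructor signature; the set name `bigset` and the value are arbitrary choices for illustration.

```vcl
sub vcl_init {
    # Assumption: max_mem is an integer byte count. 16777216 is
    # 16 MiB, an arbitrary value picked for this example.
    new bigset = re2.set(max_mem=16777216);
    bigset.add("foo");
    bigset.add("bar");
    # ... many more patterns ...
    bigset.compile();
}
```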
`.compile()` MUST be called in `vcl_init`, and MAY NOT be called more
than once for a set object. If it is called in any other subroutine, a
`VCL_Error` message is logged, and the call has no effect. If it is
called a second time in `vcl_init`, the VCL load will fail.
See above for examples.
#### BOOL xset.match(STRING)
Returns `true` if the given string matches the compound pattern
represented by the set, i.e. if it matches any of the patterns that were
added to the set.
The matcher identifies all of the patterns that were added to the set
and match the given string. These can be determined after a successful
match using the `.matched(INT)` and `.nmatches()` methods described
below.
`.match()` MUST be called after `.compile()`; otherwise the match always
fails.
A match may also fail (returning `false`) if the internal memory limit
imposed by the `max_mem` parameter in the constructor is exceeded. (With
the default value of `max_mem`, this ordinarily requires very large
patterns and/or a very large string to be matched.) Since about version
2017-12-01, the RE2 library reports this condition; when it does, the
VMOD writes a `VCL_Error` message to the log, except during `vcl_init`,
in which case the VCL load fails with the error message. If matches fail
due to the out-of-memory condition, increase the `max_mem` parameter in
the constructor.
Example:
if (hostmatcher.match(req.http.Host)) {
call do_when_a_host_matched;
}
#### BOOL xset.matched(INT)
Returns `true` after a successful match if the `nth` pattern that was
added to the set is among the patterns that matched, `false` otherwise.
The numbering of the patterns corresponds to the order in which patterns
were added in `vcl_init`, counting from 1.
The method refers back to the most recent invocation of `.match()` for
the same object in the same client or backend context. It always returns
`false`, for every value of the parameter, if it is called after an
unsuccessful match (`.match()` returned `false`).
`.matched()` fails and returns `false` if:
- The `.match()` method was not called for this object in the same
client or backend scope.
- The integer parameter is out of range; that is, if it is less than 1
or greater than the number of patterns added to the set.
On failure, the method writes an error message to the log with the tag
`VCL_Error`; if it fails during `vcl_init`, then the VCL load fails with
the error message. In any other VCL subroutine, the method returns
`false` on failure and processing continues; since `false` is a
legitimate return value, you should consider monitoring the log for the
error messages.
Example:
if (hostmatcher.match(req.http.Host)) {
if (hostmatcher.matched(1)) {
call do_domain1;
}
if (hostmatcher.matched(2)) {
call do_domain2;
}
if (hostmatcher.matched(3)) {
call do_domain3;
}
}
#### INT xset.nmatches()
Returns the number of patterns that were matched by the most recent
invocation of `.match()` for the same object in the same client or
backend context. The method always returns 0 after an unsuccessful match
(`.match()` returned `false`).
If `.match()` was not called for this object in the same client or
backend scope, `.nmatches()` fails and returns 0, writing an error
message with `VCL_Error` to the log. If this happens in `vcl_init`, the
VCL load fails with the error message. As with `.matched()`,
`.nmatches()` returns a legitimate value and VCL processing continues
when it fails in any other subroutine, so you should monitor the log for
the error messages.
Example:
if (myset.match(req.url)) {
std.log("URL matched " + myset.nmatches()
+ " patterns from the set");
}
#### INT xset.which(ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)
Returns a number indicating which pattern in a set matched in the most
recent invocation of `.match()` in the client or backend context. The
number corresponds to the order in which patterns were added to the set
in `vcl_init`, counting from 1.
If exactly one pattern matched in the most recent `.match()` call (so
that `.nmatches()` returns 1), and the `select` ENUM is set to `UNIQUE`,
then the number for that pattern is returned. `select` defaults to
`UNIQUE`, so it can be left out in this case.
If more than one pattern matched in the most recent `.match()` call
(`.nmatches()` \> 1), then the `select` ENUM determines the integer that
is returned. The values `FIRST` and `LAST` specify that, of the patterns
that matched, the first or last one added via the `.add()` method is
chosen, and the number for that pattern is returned.
`.which()` fails, returning 0 with a `VCL_Error` message in the log, if:
- `.match()` was not called for the set in the current client or
backend transaction, or if the previous call returned `false`.
- More than one pattern in the set matched in the previous `.match()`
call, but the `select` parameter is set to `UNIQUE` (or left out,
since `select` defaults to `UNIQUE`).
Examples:
sub vcl_init {
new myset = re2.set();
myset.add("foo"); # Pattern 1
myset.add("bar"); # Pattern 2
myset.add("baz"); # Pattern 3
myset.compile();
}
sub vcl_recv {
if (myset.match("bar")) {
# myset.which() returns 2.
}
if (myset.match("foobaz")) {
# myset.which() fails and returns 0, with a log
# message indicating that 2 patterns
# matched.
# myset.which(FIRST) returns 1.
# myset.which(LAST) returns 3.
}
if (myset.match("quux")) {
# ...
}
else {
# myset.which() fails and returns 0, with any value or
# no value for the select ENUM, with a log message
# indicating that the previous .match() call was
# unsuccessful.
}
}
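Since `.which()` with the default `select=UNIQUE` fails when more than one pattern matched, one way to avoid the error is to check `.nmatches()` first. The following is a sketch under that assumption, reusing the set `myset` from the example above; the header `X-Pattern` is a hypothetical name.

```vcl
sub vcl_recv {
    if (myset.match(req.url)) {
        if (myset.nmatches() == 1) {
            # Exactly one pattern matched, so UNIQUE is safe.
            set req.http.X-Pattern = "" + myset.which();
        } else {
            # Several patterns matched; choose the one added first.
            set req.http.X-Pattern = "" + myset.which(select=FIRST);
        }
    }
}
```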
#### STRING xset.string(INT n, ENUM select)
STRING xset.string(
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the string associated with the nth pattern added to the set, or
with the pattern in the set that matched in the most recent call to
`.match()` in the same task scope (client or backend context). The
string set with the `string` parameter of the `.add()` method during
`vcl_init` is returned.
The pattern is identified with the parameters `n` and `select` according
to these rules, which also hold for all further `set` methods documented
in the following.
- If `n` \> 0, then select the nth pattern added to the set with the
`.add()` method, counting from 1. This identifies the nth pattern in
any context, regardless of whether `.match()` was called previously,
or whether a previous call returned `true` or `false`. The `select`
parameter is ignored in this case.
- If `n` \<= 0, then select a pattern in the set that matched
successfully in the most recent call to `.match()` in the same task
scope. Since `n` is 0 by default, `n` can be left out for this
purpose.
- If `n` \<= 0 and exactly one pattern in the set matched in the most
recent invocation of `.match()` (and hence `.nmatches()` returns 1),
and `select` is set to `UNIQUE`, then select that pattern. `select`
defaults to `UNIQUE`, so when exactly one pattern in the set
matched, both `n` and `select` can be left out.
- If `n` \<= 0 and more than one pattern matched in the most recent
`.match()` call (`.nmatches()` \> 1), then the selection of a
pattern is determined by the `select` parameter. As with `.which()`,
`FIRST` and `LAST` specify the first or last matching pattern added
via the `.add()` method.
For the pattern selected by these rules, return the string that was set
with the `string` parameter in the `.add()` method that added the
pattern to the set.
`.string()` fails, returning NULL with a `VCL_Error` message in the
log, if:
- The values of `n` and `select` are invalid:
- `n` is greater than the number of patterns in the set.
- `n` \<= 0 (or left to the default), but `.match()` was not
called earlier in the same task scope (client or backend
context).
- `n` \<= 0, but the previous `.match()` call returned `false`.
- `n` \<= 0 and the `select` ENUM is `UNIQUE` (or default), but
more than one pattern matched in the previous `.match()` call.
- No string was associated with the pattern selected by `n` and
`select`; that is, the `string` parameter was not set in the
`.add()` call that added the pattern.
Examples:
# Match the request URL against a set of patterns, and generate
# a synthetic redirect response with a Location header derived
# from the string associated with the matching pattern.
# In the first example, exactly one pattern in the set matches.
sub vcl_init {
# With anchor=both, we specify exact matches.
new matcher = re2.set(anchor=both);
matcher.add("/foo/bar", "/baz/quux");
matcher.add("/baz/bar/foo", "/baz/quux/foo");
matcher.add("/quux/bar/baz/foo", "/baz/quux/foo/bar");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Divert to vcl_synth, sending the string associated
# with the matching pattern in the "reason" field.
return(synth(1301, matcher.string()));
}
}
sub vcl_synth {
# Construct a redirect response, using the path set in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location
= "http://otherdomain.org" + resp.reason;
set resp.status = 301;
set resp.reason = "Moved Permanently";
return(deliver);
}
}
# In the second example, the patterns that may match have
# common prefixes, and more than one pattern may match. We
# add patterns to the set in a "more specific" to "less
# specific" order, and we choose the most specific pattern
# that matches, by specifying the first matching pattern in
# the set.
sub vcl_init {
# With anchor=start, we specify matching prefixes.
new matcher = re2.set(anchor=start);
matcher.add("/foo/bar/baz/quux", "/baz/quux");
matcher.add("/foo/bar/baz", "/baz/quux/foo");
matcher.add("/foo/bar", "/baz/quux/foo/bar");
matcher.add("/foo", "/baz");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Select the first matching pattern
return(synth(1301, matcher.string(select=FIRST)));
}
}
# vcl_synth is implemented as shown above
#### BACKEND xset.backend(INT n, ENUM select)
BACKEND xset.backend(
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the backend associated with the nth pattern added to the set, or
with the pattern in the set that matched in the most recent call to
`.match()` in the same task scope (client or backend context).
The rules for selecting a pattern from the set and its associated
backend based on `n` and `select` are the same as described above for
`.string()`.
`.backend()` fails, returning NULL with a `VCL_Error` message in the
log, under the same conditions described for `.string()` above -- `n`
and `select` are invalid, or no backend was associated with the selected
pattern with the `.add()` method.
Example:
# Choose a backend based on the URL prefix.
# In this example, assume that backends b1 through b4
# have been defined.
sub vcl_init {
# Use anchor=start to match prefixes.
# The prefixes are unique, so exactly one will match.
new matcher = re2.set(anchor=start);
matcher.add("/foo", backend=b1);
matcher.add("/bar", backend=b2);
matcher.add("/baz", backend=b3);
matcher.add("/quux", backend=b4);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Set the backend hint to the backend associated
# with the matching pattern.
set req.backend_hint = matcher.backend();
}
}
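When prefixes overlap, more than one pattern may match, and `select=FIRST` can pick the most specific one if patterns are added from more specific to less specific. This is a sketch along the lines of the second `.string()` example; the set name `route` and the backends `b1` and `b2` are assumed to be defined elsewhere.

```vcl
sub vcl_init {
    new route = re2.set(anchor=start);
    # More specific prefix first, so that FIRST selects it.
    route.add("/api/v2", backend=b2);
    route.add("/api", backend=b1);
    route.compile();
}

sub vcl_recv {
    if (route.match(req.url)) {
        # For /api/v2/... both patterns match; FIRST yields b2.
        # For /api/other only the second pattern matches, giving b1.
        set req.backend_hint = route.backend(select=FIRST);
    }
}
```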
#### INT xset.integer(INT n, ENUM select)
INT xset.integer(
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the integer associated with the nth pattern added to the set, or
with the pattern in the set that matched in the most recent call to
`.match()` in the same task scope.
The rules for selecting a pattern from the set and its associated
integer based on `n` and `select` are the same as described above for
`.string()`.
`.integer()` invokes VCL failure under the same error conditions
described for `.string()` above -- `n` and `select` are invalid, or no
integer was associated with the selected pattern with the `.add()`
method.
Note that VCL failure differs from the failure mode for `.string()` and
`.backend()`, since there is no distinguished "error" value that could
be returned as the INT. VCL failure has the same effect as if
`return(fail)` were called from a VCL subroutine; usually, control
passes immediately to `vcl_synth`, with the response status set to 503,
and the response reason set to "VCL failed".
You can avoid that, for example, by testing if `.nmatches()==1` after
calling `.match()`, if you need to ensure that calling
`.integer(select=UNIQUE)` will not fail.
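A guard of this kind might look like the following sketch. The set name `codes`, its patterns and the associated integers are hypothetical; the additional `.saved(which=INT)` test covers the case that no integer was associated with the matching pattern.

```vcl
sub vcl_init {
    new codes = re2.set(anchor=start);
    codes.add("/gone", integer=410);
    codes.add("/moved", integer=301);
    # No integer associated with this pattern.
    codes.add("/plain");
    codes.compile();
}

sub vcl_recv {
    if (codes.match(req.url)) {
        # Call .integer() only when exactly one pattern matched and
        # an integer was stored for it, to avoid VCL failure.
        if (codes.nmatches() == 1 && codes.saved(which=INT)) {
            return (synth(codes.integer()));
        }
    }
}
```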
Example:
# Generate redirect responses based on the Host header. In the
# example, subdomains are removed in the new Location, and the
# associated integer is used to set the redirect status code.
sub vcl_init {
# No more than one pattern can match the same string. So it
# is safe to call .integer() with default select=UNIQUE in
# vcl_recv below (no risk of VCL failure).
new redir = re2.set(anchor=both);
redir.add("www\.[^.]+\.foo\.com", integer=301, string="www.foo.com");
redir.add("www\.[^.]+\.bar\.com", integer=302, string="www.bar.com");
redir.add("www\.[^.]+\.baz\.com", integer=303, string="www.baz.com");
redir.add("www\.[^.]+\.quux\.com", integer=307, string="www.quux.com");
redir.compile();
}
sub vcl_recv {
if (redir.match(req.http.Host)) {
# Construct a Location header that will be used in the
# synthetic redirect response.
set req.http.Location = "http://" + redir.string() + req.url;
# Set the response status from the associated integer.
return( synth(redir.integer()) );
}
}
sub vcl_synth {
if (resp.status >= 301 && resp.status <= 307) {
# We come here from the synth return for the redirect
# response. The status code was set from .integer().
set resp.http.Location = req.http.Location;
return(deliver);
}
}
#### STRING xset.sub(STRING text, STRING rewrite, STRING fallback, INT n, ENUM select)
STRING xset.sub(
STRING text,
STRING rewrite,
STRING fallback="**SUB METHOD FAILED**",
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the result of the method call `.sub(text, rewrite, fallback)`,
as documented above for the `regex` interface, invoked on the nth
pattern added to the set, or on the pattern in the set that matched in
the most recent call to `.match()` in the same task scope.
`.sub()` requires that the pattern it identifies was saved as an
internal `regex` object, by setting `save` to true when it was added
with the `.add()` method.
The associated pattern is determined by `n` and `select` according to
the rules given above. If an internal `regex` object was saved for that
pattern, then the result of the `.sub()` method invoked on that object
is returned.
`.sub()` fails, returning NULL with a `VCL_Error` message in the log,
if:
- The values of `n` and `select` are invalid, according to the rules
given above.
- `save` was false in the `.add()` method for the pattern identified
by `n` and `select`; that is, no internal `regex` object was saved
on which the `.sub()` method could have been invoked.
- The `.sub()` method invoked on the `regex` object fails for any of
the reasons described for `regex.sub()`.
Examples:
# Generate synthetic redirect responses on URLs that match a set of
# patterns, rewriting the URL according to the matched pattern.
# In this example, we set the new URL in the redirect location to
# the path that comes after the prefix of the original req.url.
sub vcl_init {
new matcher = re2.set(anchor=start);
matcher.add("/foo/(.*)", save=true);
matcher.add("/bar/(.*)", save=true);
matcher.add("/baz/(.*)", save=true);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
if (matcher.nmatches() != 1) {
return(fail);
}
return(synth(1301));
}
}
sub vcl_synth {
if (resp.status == 1301) {
# matcher.sub() rewrites the URL to the subpath after the
# original prefix.
set resp.http.Location
= "http://www.otherdomain.org" + matcher.sub(req.url, "/\1");
return(deliver);
}
}
#### STRING xset.suball(STRING text, STRING rewrite, STRING fallback, INT n, ENUM select)
STRING xset.suball(
STRING text,
STRING rewrite,
STRING fallback="**SUBALL METHOD FAILED**",
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Like the `.sub()` method, this returns the result of calling
`.suball(text, rewrite, fallback)` from the regex interface on the nth
pattern added to the set, or the pattern that most recently matched in a
`.match()` call.
`.suball()` is subject to the same conditions as the `.sub()` method:
- The pattern to which it is applied is identified by `n` and `select`
according to the rules given above.
- It fails if:
- The pattern that it identifies was not saved with
`.add(save=true)`.
- The values of `n` or `select` are invalid.
- The `.suball()` method invoked on the saved `regex` object
fails.
Example:
# In any URL that matches one of the words given below, replace all
# occurrences of the matching word with "quux" (for example to
# rewrite path components or elements of query strings).
sub vcl_init {
new matcher = re2.set();
matcher.add("\bfoo\b", save=true);
matcher.add("\bbar\b", save=true);
matcher.add("\bbaz\b", save=true);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
if (matcher.nmatches() != 1) {
return(fail);
}
set req.url = matcher.suball(req.url, "quux");
}
}
#### STRING xset.extract(STRING text, STRING rewrite, STRING fallback, INT n, ENUM select)
STRING xset.extract(
STRING text,
STRING rewrite,
STRING fallback="**EXTRACT METHOD FAILED**",
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Like the `.sub()` and `.suball()` methods, this method returns the
result of calling `.extract(text, rewrite, fallback)` from the regex
interface on the nth pattern added to the set, or the pattern that most
recently matched in a `.match()` call.
`.extract()` is subject to the same conditions as the other rewrite
methods:
- The pattern to which it is applied is identified by `n` and `select`
according to the rules given above.
- It fails if:
- The pattern that it identifies was not saved with
`.add(save=true)`.
- The values of `n` or `select` are invalid.
- The `.extract()` method invoked on the saved `regex` object
fails.
Example:
# Rewrite any URL that matches one of the patterns in the set
# by exchanging the path components.
sub vcl_init {
new matcher = re2.set(anchor=both);
matcher.add("/(foo)/(bar)/", save=true);
matcher.add("/(bar)/(baz)/", save=true);
matcher.add("/(baz)/(quux)/", save=true);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
if (matcher.nmatches() != 1) {
return(fail);
}
set req.url = matcher.extract(req.url, "/\2/\1/");
}
}
#### BOOL xset.saved(ENUM which, INT n, ENUM select)
BOOL xset.saved(
ENUM {REGEX, STR, BE, INT} which=REGEX,
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns true if and only if an object of the type indicated by `which`
was saved at initialization time for the `nth` pattern added to the set,
or for the pattern indicated by `select` after the most recent
`.match()` call.
In other words, `.saved()` returns true:
- for `which=REGEX` if the individual regex was saved with
`.add(save=true)` for the indicated pattern
- for `which=STR` if a string was stored with the `string` parameter
in `.add()`
- for `which=BE` if a backend was stored with the `backend` parameter
in `.add()`
- for `which=INT` if an integer was stored with the `integer`
parameter in `.add()`
The default value of `which` is `REGEX`.
The pattern in the set is identified by `n` and `select` according to
the rules given above. `.saved()` fails, returning false with a
`VCL_Error` message in the log, if the values of `n` or `select` are
invalid.
Example:
sub vcl_init {
new s = re2.set();
s.add("1", save=true, string="1", backend=b1);
s.add("2", save=true, string="2");
s.add("3", save=true, backend=b3);
s.add("4", save=true);
s.add("5", string="5", backend=b5);
s.add("6", string="6");
s.add("7", backend=b7);
s.add("8");
s.compile();
}
# Then the following holds for this set:
# s.saved(n=1) == true # for which=REGEX, STR or BE
# s.saved(which=REGEX, n=2) == true
# s.saved(which=STR, n=2) == true
# s.saved(which=BE, n=2) == false
# s.saved(which=REGEX, n=3) == true
# s.saved(which=STR, n=3) == false
# s.saved(which=BE, n=3) == true
# s.saved(which=REGEX, n=4) == true
# s.saved(which=STR, n=4) == false
# s.saved(which=BE, n=4) == false
# s.saved(which=REGEX, n=5) == false
# s.saved(which=STR, n=5) == true
# s.saved(which=BE, n=5) == true
# s.saved(which=REGEX, n=6) == false
# s.saved(which=STR, n=6) == true
# s.saved(which=BE, n=6) == false
# s.saved(which=REGEX, n=7) == false
# s.saved(which=STR, n=7) == false
# s.saved(which=BE, n=7) == true
# s.saved(n=8) == false # for any value of which
if (s.match("4")) {
# The fourth pattern has been uniquely matched.
# So in this context: s.saved() == true
# Since save=true was used in .add() for the 4th pattern,
# and which=REGEX by default.
}
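In practice, `.saved()` can serve as a guard before calling a method that would otherwise fail. The following sketch reuses the set `s` from the example above; the header name `X-Assoc` is hypothetical.

```vcl
sub vcl_recv {
    if (s.match(req.url) && s.nmatches() == 1) {
        # Retrieve the string only if one was stored for the
        # uniquely matching pattern; otherwise .string() would fail.
        if (s.saved(which=STR)) {
            set req.http.X-Assoc = s.string();
        }
        # Likewise for the backend.
        if (s.saved(which=BE)) {
            set req.backend_hint = s.backend();
        }
    }
}
```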
#### VOID xset.hdr\_filter(HTTP, BOOL whitelist=1)
Filters the headers in the HTTP object, which may be one of `req`,
`resp`, `bereq`, or `beresp`. In other words, filter the headers in the
client or backend request or response.
If `whitelist` is `true`, then headers that match one of the patterns in
the set are retained, and all other headers are removed. Otherwise,
headers that match a pattern in the set are removed, and all others are
retained. By default, `whitelist` is `true`.
Example:
sub vcl_init {
# Header whitelist
new white = re2.set(anchor=start);
white.add("Foo:");
white.add("Bar:");
white.add("Baz: baz$");
white.compile();
# Header blacklist
new black = re2.set(anchor=start);
black.add("Chaotic:");
black.add("Evil:");
black.add("Wicked: wicked$");
black.compile();
}
sub vcl_recv {
# Filter the client request header with the whitelist.
# Headers that do not match any pattern in the set are removed.
white.hdr_filter(req);
}
sub vcl_deliver {
# Filter the client response header with the blacklist.
# Headers that match any pattern in the set are removed.
black.hdr_filter(resp, false);
}
#### STRING quotemeta(STRING, STRING fallback)
STRING quotemeta(
STRING,
STRING fallback="**QUOTEMETA FUNCTION FAILED**"
)
Returns a copy of the argument string with all regex metacharacters
escaped via backslash. When the returned string is used as a regular
expression, it will exactly match the original string, regardless of any
special characters. This function has a purpose similar to a `\Q..\E`
sequence within a regex, or the `literal=true` setting in a regex
constructor.
The function fails and returns `fallback` if there is insufficient
workspace for the return string.
Example:
# The following are always true:
re2.quotemeta("1.5-2.0?") == "1\.5\-2\.0\?"
re2.match(re2.quotemeta("1.5-2.0?"), "1.5-2.0?")
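One possible use is building a pattern at runtime from untrusted input, so that any metacharacters in the input are matched literally. The sketch below uses the functional interface, since the pattern is not known until runtime; the headers `X-Prefix` and `X-Prefix-Matched` are hypothetical names.

```vcl
sub vcl_recv {
    # Test whether the URL begins with the literal value of the
    # (hypothetical) X-Prefix header, whatever characters it holds.
    if (req.http.X-Prefix
        && re2.match("^" + re2.quotemeta(req.http.X-Prefix), req.url)) {
        set req.http.X-Prefix-Matched = "true";
    }
}
```

Because the pattern is compiled on each invocation, this is more costly than matching with a `regex` object compiled in `vcl_init`.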
#### STRING version()
Return the version string for this VMOD.
Example:
std.log("Using VMOD re2 version: " + re2.version());
### REQUIREMENTS
The VMOD requires Varnish version 6.2 or later, or the master branch.
See the source repository for versions of the VMOD that are compatible
with other Varnish versions.
It requires the RE2 library, and has been tested against RE2 versions
since 2015-06-01 (through 2019-08-01 at the time of writing).
If the VMOD is built against versions of RE2 since 2017-12-01, it uses a
version of the set match operation that reports out-of-memory conditions
during a match. (Versions of RE2 since June 2019 no longer have this
error, but nevertheless the different internal call is used for set
matches.) In that case, the VMOD is not compatible with earlier versions
of RE2. This is only a problem if the runtime version of the library
differs from the version against which the VMOD was built. If you
encounter this error, consider re-building the VMOD against the runtime
version of RE2, or installing a newer version of RE2.
### INSTALLATION
See [INSTALL.rst](INSTALL.rst) in the source repository.
### LIMITATIONS
The VMOD allocates Varnish workspace for captured groups and rewritten
strings. If operations fail with "insufficient workspace" error messages
in the Varnish log (with the `VCL_Error` tag), increase the varnishd
runtime parameters `workspace_client` and/or `workspace_backend`.
The RE2 documentation states that successful matches are slowed quite a
bit when they also capture substrings. There is also additional overhead
from the VMOD, unless the `never_capture` flag is true, to manage data
about captured groups in the workspace. This overhead is incurred even
if there are no capturing expressions in a pattern, since it is always
possible to call `backref(0)` to obtain the matched portion of a string.
So if you are using a pattern only to match against strings, and never
to capture subexpressions, consider setting the `never_capture` option
to true, to eliminate the extra work for both RE2 and the VMOD.
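For a pattern that is only ever used for a yes/no match, this could look like the following sketch. The object name `api` and the header `X-Api` are hypothetical.

```vcl
sub vcl_init {
    # never_capture=true: no data about captured groups is kept,
    # so .backref() and .namedref() cannot be used with this object.
    new api = re2.regex("^/api/", never_capture=true);
}

sub vcl_recv {
    if (api.match(req.url)) {
        set req.http.X-Api = "true";
    }
}
```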
### AUTHOR
- Geoffrey Simmons \<<geoff@uplex.de>\>
UPLEX Nils Goroll Systemoptimierung
### SEE ALSO
- varnishd(1)
- vcl(7)
- VMOD source repository:
<https://code.uplex.de/uplex-varnish/libvmod-re2>
- Gitlab mirror: <https://gitlab.com/uplex/varnish/libvmod-re2>
- RE2 git repo: <https://github.com/google/re2>
- RE2 syntax: <https://github.com/google/re2/wiki/Syntax>
- "Implementing Regular Expressions": <https://swtch.com/~rsc/regexp/>
- Series of articles motivating the design of RE2, with discussion
of how RE2 compares with PCRE
### COPYRIGHT
Copyright (c) 2016-2018 UPLEX Nils Goroll Systemoptimierung
All rights reserved
Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>
See LICENSE
..
.. NB: This file is machine generated, DO NOT EDIT!
..
.. Edit vmod.vcc and run make instead
..
.. role:: ref(emphasis)
========
VMOD re2
========
---------------------------------------------------------------------
Varnish Module for access to the Google RE2 regular expression engine
---------------------------------------------------------------------
:Manual section: 3
SYNOPSIS
========
::
import re2;
# regex object interface
new OBJECT = re2.regex(STRING pattern [, <regex options>])
BOOL <obj>.match(STRING)
STRING <obj>.backref(INT ref)
STRING <obj>.namedref(STRING name)
STRING <obj>.sub(STRING text, STRING rewrite)
STRING <obj>.suball(STRING text, STRING rewrite)
STRING <obj>.extract(STRING text, STRING rewrite)
INT <obj>.cost()
# regex function interface
BOOL re2.match(STRING pattern, STRING subject [, <regex options>])
STRING re2.backref(INT ref)
STRING re2.namedref(STRING name)
STRING re2.sub(STRING pattern, STRING text, STRING rewrite
[, <regex options>])
STRING re2.suball(STRING pattern, STRING text, STRING rewrite
[, <regex options>])
STRING re2.extract(STRING pattern, STRING text, STRING rewrite
[, <regex options>])
INT re2.cost(STRING pattern [, <regex options>])
# set object interface
new OBJECT = re2.set([ENUM anchor] [, <regex options>])
VOID <obj>.add(STRING [, BOOL save] [, BOOL never_capture] [, STRING string]
[, BACKEND backend] [, INT integer])
VOID <obj>.compile()
BOOL <obj>.match(STRING)
INT <obj>.nmatches()
BOOL <obj>.matched(INT)
INT <obj>.which([ENUM select])
STRING <obj>.string([INT n,] [ENUM select])
BACKEND <obj>.backend([INT n,] [ENUM select])
INT <obj>.integer([INT n] [, ENUM select])
STRING <obj>.sub(STRING text, STRING rewrite [, INT n]
[, ENUM select])
STRING <obj>.suball(STRING text, STRING rewrite [, INT n]
[, ENUM select])
STRING <obj>.extract(STRING text, STRING rewrite [, INT n]
[, ENUM select])
BOOL <obj>.saved([ENUM {REGEX, STR, BE, INT} which] [, INT n]
[, ENUM select])
VOID <obj>.hdr_filter(HTTP [, BOOL])
# utility function
STRING re2.quotemeta(STRING)
# VMOD version
STRING re2.version()
DESCRIPTION
===========
Varnish Module (VMOD) for access to the Google RE2 regular expression engine.
Varnish VCL uses the PCRE library (Perl Compatible Regular Expressions) for
its native regular expressions, which runs very efficiently for many common
uses of pattern matching in VCL, as attested by years of successful use of
PCRE with Varnish.
But for certain kinds of patterns, the worst-case running time of the PCRE
matcher is exponential in the length of the string to be matched. The
matcher uses backtracking, implemented with recursive calls to the internal
``match()`` function. In principle there is no upper bound to the possible
depth of backtracking and recursion, except as imposed by the ``varnishd``
runtime parameters ``pcre_match_limit`` and ``pcre_match_limit_recursion``;
matches fail if either of these limits is met. Stack overflow caused by
deep backtracking has occasionally been the subject of ``varnishd`` issues.
RE2 differs from PCRE in that it limits the syntax of patterns so that they
always specify a regular language in the formally strict sense. Most notably,
backreferences within a pattern are not permitted, for example ``(foo|bar)\1``
to match ``foofoo`` and ``barbar``, but not ``foobar`` or ``barfoo``. See the
link in ``SEE ALSO`` for the specification of RE2 syntax.
This means that an RE2 matcher runs as a finite automaton, which guarantees
linear running time in the length of the matched string. There is no
backtracking, and hence no risk of deep recursion or stack overflow.
The relative advantages and disadvantages of RE2 and PCRE are a broad subject,
beyond the scope of this manual. See the references in ``SEE ALSO`` for more
in-depth discussion.
regex object and function interfaces
------------------------------------
The VMOD provides regular expression operations by way of the ``regex`` object
interface and a functional interface. For ``regex`` objects, the pattern is
compiled at VCL initialization time, and the compiled pattern is re-used for
each invocation of its methods. Compilation failures (due to errors in the
pattern) cause failure at initialization time, and the VCL fails to load. The
``.backref()`` and ``.namedref()`` methods refer back to the last invocation
of the ``.match()`` method for the same object.
The functional interface provides the same set of operations, but the pattern
is compiled at runtime on each invocation (and then discarded). Compilation
failures are reported as errors in the Varnish log. The ``backref()`` and
``namedref()`` functions refer back to the last invocation of the ``match()``
function, for any pattern.
Compiling a pattern at runtime on each invocation is considerably more costly
than re-using a compiled pattern. So for patterns that are fixed and known
at VCL initialization, the object interface should be used. The functional
interface should only be used for patterns whose contents are not known until
runtime.
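As a minimal sketch of this guidance, the following contrasts the two
interfaces. The host name and the headers ``X-Known-Host``,
``X-Pattern`` and ``X-Pattern-Matched`` are hypothetical, chosen only
for illustration::

    import re2;

    sub vcl_init {
        # Fixed pattern: compiled once, when the VCL is loaded.
        new hostre = re2.regex("^www\.example\.com$");
    }

    sub vcl_recv {
        # Re-uses the pattern compiled in vcl_init.
        if (hostre.match(req.http.Host)) {
            set req.http.X-Known-Host = "true";
        }
    }

    sub vcl_backend_response {
        # Pattern only known at runtime (hypothetical response header):
        # it is compiled on every invocation and then discarded.
        if (beresp.http.X-Pattern
            && re2.match(beresp.http.X-Pattern, bereq.url)) {
            set beresp.http.X-Pattern-Matched = "true";
        }
    }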
set object interface
--------------------
``set`` objects provide a shorthand for constructing patterns that consist of
an alternation -- a group of patterns combined with ``|`` for "or". For
example::
import re2;
sub vcl_init {
new myset = re2.set();
myset.add("foo"); # Pattern 1
myset.add("bar"); # Pattern 2
myset.add("baz"); # Pattern 3
myset.compile();
}
``myset.match(<string>)`` can now be used to match a string against
the pattern ``foo|bar|baz``. When a match is successful, the matcher
has determined all of the patterns that matched. These can then be
retrieved with the method ``.nmatches()`` for the number of matched
patterns, and with ``.matched(n)``, which returns ``true`` if the
``nth`` pattern matched, where the patterns are numbered in the order
in which they were added::
if (myset.match("foobar")) {
std.log("Matched " + myset.nmatches() + " patterns");
if (myset.matched(1)) {
# Pattern /foo/ matched
call do_foo;
}
if (myset.matched(2)) {
# Pattern /bar/ matched
call do_bar;
}
if (myset.matched(3)) {
# Pattern /baz/ matched
call do_baz;
}
}
An advantage of alternations and sets with RE2, as opposed to an
alternation in PCRE or a series of separate matches in an
if-elsif-elsif sequence, comes from the fact that the matcher is
implemented as a state machine. That means that the matcher progresses
through the string to be matched just once, following patterns in the
set that match through the state machine, or determining that there is
no match as soon as there are no more possible paths in the state
machine. So a string can be matched against a large set of patterns in
time that is proportional to the length of the string to be
matched. In contrast, PCRE matches patterns in an alternation one
after another, stopping after the first matching pattern, or
attempting matches against all of them if there is no match. Thus a
match against an alternation in PCRE is not unlike an if-elsif-elsif
sequence of individual matches, and requires the time needed for each
individual match, overall in proportion with the number of patterns to
be matched.
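The contrast can be sketched in VCL as follows. The URL prefixes, the
subroutine names and the ``X-Known-App`` header are made up for
illustration, and both variants are called from ``vcl_recv`` only so
that the sketch is complete; a real VCL would use one of them::

    import re2;

    sub vcl_init {
        new apps = re2.set();
        apps.add("^/foo");   # Pattern 1
        apps.add("^/bar");   # Pattern 2
        apps.add("^/baz");   # Pattern 3
        apps.compile();
    }

    # Variant 1: native regexes, up to one match attempt per pattern.
    sub route_native {
        if (req.url ~ "^/foo") {
            set req.http.X-Known-App = "true";
        } elsif (req.url ~ "^/bar") {
            set req.http.X-Known-App = "true";
        } elsif (req.url ~ "^/baz") {
            set req.http.X-Known-App = "true";
        }
    }

    # Variant 2: a single pass over req.url for all patterns in the set.
    sub route_set {
        if (apps.match(req.url)) {
            set req.http.X-Known-App = "true";
        }
    }

    sub vcl_recv {
        call route_native;
        call route_set;
    }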
Another advantage of the VMOD's set object is the ability to associate
strings or backends with the patterns added to the set with the
``.add()`` method::
sub vcl_init {
new prefix = re2.set(anchor=start);
prefix.add("/foo", string="www.domain1.com");
prefix.add("/bar", string="www.domain2.com");
prefix.add("/baz", string="www.domain3.com");
prefix.add("/quux", string="www.domain4.com");
prefix.compile();
new appmatcher = re2.set(anchor=start);
appmatcher.add("/foo", backend=app1);
appmatcher.add("/bar", backend=app2);
appmatcher.add("/baz", backend=app3);
appmatcher.add("/quux", backend=app4);
appmatcher.compile();
}
After a successful match, the string or backend associated with the
matching pattern can be retrieved with the ``.string()`` and
``.backend()`` methods. This makes it possible, for example, to
construct a redirect response or choose the backend with code that is
both efficient and compact, even with a large set of patterns to be
matched::
# Use the prefix object to construct a redirect response from
# a matching request URL.
sub vcl_recv {
if (prefix.match(req.url)) {
# Pass the string associated with the matching pattern
# to vcl_synth.
return(synth(1301, prefix.string()));
}
}
sub vcl_synth {
# The string associated with the matching pattern is in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location = "http://" + resp.reason + req.url;
set resp.status = 301;
set resp.reason = "Moved Permanently";
}
}
# Use the appmatcher object to choose a backend based on the
# request URL prefix.
sub vcl_recv {
if (appmatcher.match(req.url)) {
set req.backend_hint = appmatcher.backend();
}
}
regex options
-------------
Where a pattern is compiled -- in the ``regex`` and ``set`` constructors, and
in functions that require compilation -- options may be specified that can
affect the interpretation of the pattern or the operation of the matcher. There
are default values for each option, and it is only necessary to specify options
in VCL that differ from the defaults. Options specified in a ``set``
constructor apply to all of the patterns in the resulting alternation.
``utf8``
If true, characters in a pattern match Unicode code points, and hence may
match more than one byte. If false, the pattern and strings to be matched
are interpreted as Latin-1 (ISO 8859-1), and a pattern character matches
exactly one byte. Default is **false**. Note that this differs from the
RE2 default.
``posix_syntax``
If true, patterns are restricted to POSIX (egrep) syntax. Otherwise,
the pattern syntax resembles that of PCRE, with some deviations. See the
link in ``SEE ALSO`` for the syntax specification. Default is **false**.
The options ``perl_classes``, ``word_boundary`` and ``one_line`` are
only consulted when this option is true.
``longest_match``
If true, the matcher searches for the longest possible match where
alternatives are possible. Otherwise, search for the first match. For
example with the pattern ``a(b|bb)`` and the string ``abb``, ``abb``
matches when ``longest_match`` is true, and backref 1 is ``bb``. Otherwise,
``ab`` matches, and backref 1 is ``b``. Default is **false**.
``max_mem``
An upper bound (in bytes) for the size of the compiled pattern. If ``max_mem``
is too small, the matcher may fall back to less efficient algorithms, or the
pattern may fail to compile. Default is the RE2 default (8MB), which should
suffice for typical patterns.
``literal``
If true, the pattern is interpreted as a literal string, and no regex
metacharacters (such as ``*``, ``+``, ``^`` and so forth) have their special
meaning. Default is **false**.
``never_nl``
If true, the newline character ``\n`` in a string is never matched, even if it
appears in the pattern. Default is **false**.
``dot_nl``
If true, then the dot character ``.`` in a pattern matches everything,
including newline. Otherwise, ``.`` never matches newline. Default is
**false**.
``never_capture``
If true, parentheses in a pattern are interpreted as non-capturing, and all
invocations of the ``backref`` and ``namedref`` methods or functions will
fail, including ``backref(0)`` after a successful match. Default is **false**,
except for set objects, for which ``never_capture`` is always true (and cannot
be changed), since back references are not possible with sets.
``case_sensitive``
If true, matches are case-sensitive. A pattern can override this option with
the ``(?i)`` flag, unless ``posix_syntax`` is true. Default is **true**.
The following options are only consulted when ``posix_syntax`` is true. If
``posix_syntax`` is false, then these features are always enabled and cannot be
turned off.
``perl_classes``
If true, then the perl character classes ``\d``, ``\s``, ``\w``, ``\D``,
``\S`` and ``\W`` are permitted in a pattern. Default is **false**.
``word_boundary``
If true, the perl assertions ``\b`` and ``\B`` (word boundary and not a word
boundary) are permitted. Default is **false**.
``one_line``
If true, then ``^`` and ``$`` only match at the beginning and end of the
string to be matched, regardless of newlines. Otherwise, ``^`` also matches
just after a newline, and ``$`` also matches just before a newline. Default is
**false**.
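A short sketch of how some of these options might be passed to the
``regex`` constructor as named arguments. The object names and the
host name are hypothetical; the comments restate the behavior
described for each option above::

    import re2;

    sub vcl_init {
        # literal=true: the dots match only literal dots.
        new lit = re2.regex("www.example.com", literal=true);

        # case_sensitive=false: matching ignores case.
        new nocase = re2.regex("^www\.example\.com$",
                               case_sensitive=false);

        # longest_match=true: matched against "abb", the whole of
        # "abb" matches and backref 1 is "bb".
        new longest = re2.regex("a(b|bb)", longest_match=true);

        # posix_syntax=true with perl_classes=true: POSIX syntax, but
        # \d and the other Perl character classes are permitted.
        new digits = re2.regex("^\d+$", posix_syntax=true,
                               perl_classes=true);
    }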
.. _re2.regex():
new xregex = re2.regex(STRING pattern, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
new xregex = re2.regex(
STRING pattern,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Create a regex object from ``pattern`` and the given options (or
option defaults). If the pattern is invalid, then VCL will fail to
load and the VCC compiler will emit an error message.
Example::
sub vcl_init {
new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
new maxagematcher = re2.regex("max-age\s*=\s*(\d+)");
# Group possible subdomains without capturing
new submatcher = re2.regex("^www\.(domain1|domain2)\.com$",
never_capture=true);
}
.. _xregex.match():
BOOL xregex.match(STRING)
-------------------------
Returns ``true`` if and only if the compiled regex matches the given
string; corresponds to VCL's infix operator ``~``.
Example::
if (myregex.match(req.http.Host)) {
call do_on_match;
}
.. _xregex.backref():
STRING xregex.backref(INT ref, STRING fallback)
-----------------------------------------------
::
STRING xregex.backref(
INT ref,
STRING fallback="**BACKREF METHOD FAILED**"
)
Returns the ``nth`` captured subexpression from the most recent
successful call of the ``.match()`` method for this object in the same
client or backend context, or a fallback string in case the capture
fails. Backref 0 indicates the entire matched string. Thus this
function behaves like the ``\n`` in the native VCL functions
``regsub`` and ``regsuball``, and the ``$1``, ``$2`` ... variables in
Perl.
Since Varnish client and backend operations run in different threads,
``.backref()`` can only refer back to a ``.match()`` call in the same
thread. Thus a ``.backref()`` call in any of the ``vcl_backend_*``
subroutines -- the backend context -- refers back to a previous
``.match()`` in any of those same subroutines; and a call in any of
the other VCL subroutines -- the client context -- refers back to a
``.match()`` in the same client context.
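A sketch of what this means in practice, assuming a hypothetical URL
scheme with a language prefix and illustrative header names: a match
performed in ``vcl_recv`` is not visible to ``.backref()`` in the
backend context, so the backend subroutine performs its own match::

    import re2;

    sub vcl_init {
        new langre = re2.regex("^/(en|de|fr)/");
    }

    sub vcl_recv {
        # Client context: .backref() refers to this .match().
        if (langre.match(req.url)) {
            set req.http.X-Lang = langre.backref(1);
        }
    }

    sub vcl_backend_fetch {
        # Backend context: match again here, rather than relying on
        # the match that was performed in vcl_recv.
        if (langre.match(bereq.url)) {
            set bereq.http.X-Backend-Lang = langre.backref(1);
        }
    }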
After unsuccessful matches, the ``fallback`` string is returned for
any call to ``.backref()``. The default value of ``fallback`` is
``"**BACKREF METHOD FAILED**"``. ``.backref()`` always fails after a
failed match, even if ``.match()`` had been called successfully before
the failure.
``.backref()`` may also return ``fallback`` after a successful match,
if no captured group in the matching string corresponds to the backref
number. For example, when the pattern ``(a|(b))c`` matches the string
``ac``, there is no backref 2, since nothing matches ``b`` in the
string.
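The case just described can be sketched as follows, using the pattern
and string from the text. The header names and the fallback value
``"none"`` are illustrative assumptions::

    import re2;

    sub vcl_init {
        new ab = re2.regex("(a|(b))c");
    }

    sub vcl_recv {
        if (ab.match("ac")) {
            # Group 1 captured "a".
            set req.http.X-Group1 = ab.backref(1);
            # Group 2 took no part in the match, so the fallback
            # "none" is returned, although the match succeeded.
            set req.http.X-Group2 = ab.backref(2, fallback="none");
        }
    }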
The VCL infix operators ``~`` and ``!~`` do not affect this method,
nor do the functions ``regsub`` or ``regsuball``. Nor is it affected
by the matches performed by any other method or function in this VMOD
(such as the ``sub()``, ``suball()`` or ``extract()`` methods or
functions, or the ``set`` object's ``.match()`` method).
``.backref()`` fails, returning ``fallback`` and writing an error
message to the Varnish log with the ``VCL_Error`` tag, under the
following conditions (even if a previous match was successful and a
substring could have been captured):
* The ``fallback`` string is undefined, for example if set from an unset
header variable.
* The ``never_capture`` option was set to ``true`` for this object. In this
case, even ``.backref(0)`` fails after a successful match (otherwise, backref
0 always returns the full matched string).
* ``ref`` (the backref number) is out of range, i.e. it is larger than the
highest number for a capturing group in the pattern.
* ``.match()`` was never called for this object prior to calling ``.backref()``.
* There is insufficient workspace for the string to be returned.
Example::
if (domainmatcher.match(req.http.Host)) {
set req.http.X-Domain = domainmatcher.backref(1);
}
.. _xregex.namedref():
STRING xregex.namedref(STRING name, STRING fallback)
----------------------------------------------------
::
STRING xregex.namedref(
STRING name,
STRING fallback="**NAMEDREF METHOD FAILED**"
)
Returns the captured subexpression designated by ``name`` from the
most recent successful call to ``.match()`` in the current context
(client or backend), or ``fallback`` in case of failure.
Named capturing groups are written in RE2 as: ``(?P<name>re)``. (Note
that this syntax with ``P``, inspired by Python, differs from the
notation for named capturing groups in PCRE.) Thus when
``(?P<foo>.+)bar$`` matches ``bazbar``, then ``.namedref("foo")``
returns ``baz``.
Note that a named capturing group can also be referenced as a numbered
group. So in the previous example, ``.backref(1)`` also returns
``baz``.
``fallback`` is returned when ``.namedref()`` is called after an
unsuccessful match. The default fallback is ``"**NAMEDREF METHOD
FAILED**"``.
Like ``.backref()``, ``.namedref()`` is not affected by native VCL
regex operations, nor by any other matches performed by methods or
functions of the VMOD, except for a prior ``.match()`` for the same
object.
``.namedref()`` fails, returning ``fallback`` and logging a
``VCL_Error`` message, if:
* The ``fallback`` string is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``.match()`` was not called for this object.
* There is insufficient workspace for the string to be returned.
Example::
sub vcl_init {
new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
}
sub vcl_recv {
if (domainmatcher.match(req.http.Host)) {
set req.http.X-Domain = domainmatcher.namedref("domain");
}
}
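The equivalence of named and numbered references noted above can be
sketched with the pattern from the text; the object and header names
are hypothetical::

    import re2;

    sub vcl_init {
        new barre = re2.regex("(?P<foo>.+)bar$");
    }

    sub vcl_recv {
        if (barre.match("bazbar")) {
            # Both calls return "baz": the named group "foo" is also
            # capturing group number 1.
            set req.http.X-Named = barre.namedref("foo");
            set req.http.X-Numbered = barre.backref(1);
        }
    }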
.. _xregex.sub():
STRING xregex.sub(STRING text, STRING rewrite, STRING fallback)
---------------------------------------------------------------
::
STRING xregex.sub(
STRING text,
STRING rewrite,
STRING fallback="**SUB METHOD FAILED**"
)
If the compiled pattern for this regex object matches ``text``, then
return the result of replacing the first match in ``text`` with
``rewrite``. Within ``rewrite``, ``\1`` through ``\9`` can be used to
insert the numbered capturing group from the pattern, and ``\0``
to insert the entire matching text. This method corresponds to the VCL
native function ``regsub()``.
``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB METHOD FAILED**"``.
``.sub()`` fails, returning ``fallback`` and logging a ``VCL_Error``
message, if:
* Any of ``text``, ``rewrite`` or ``fallback`` is undefined.
* There is insufficient workspace for the rewritten string.
Example::
sub vcl_init {
new bmatcher = re2.regex("b+");
}
sub vcl_recv {
# If Host contains "www.yabba.dabba.doo.com", then this will
# set X-Yada to "www.yada.dabba.doo.com".
set req.http.X-Yada = bmatcher.sub(req.http.Host, "d");
}
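A further sketch, showing a capturing group in ``rewrite`` and an
explicit ``fallback``. The pattern and object name are assumptions
made for illustration; passing the original header as the fallback
leaves it unchanged when the pattern does not match::

    import re2;

    sub vcl_init {
        new wwwre = re2.regex("^www\.(.+)$");
    }

    sub vcl_recv {
        # Guard against an unset Host header, since .sub() fails if
        # text or fallback is undefined.
        if (req.http.Host) {
            # "www.example.com" becomes "example.com"; a Host without
            # the "www." prefix is returned unchanged via fallback.
            set req.http.Host = wwwre.sub(req.http.Host, "\1",
                                          fallback=req.http.Host);
        }
    }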
.. _xregex.suball():
STRING xregex.suball(STRING text, STRING rewrite, STRING fallback)
------------------------------------------------------------------
::
STRING xregex.suball(
STRING text,
STRING rewrite,
STRING fallback="**SUBALL METHOD FAILED**"
)
Like ``.sub()``, except that all successive non-overlapping matches in
``text`` are replaced with ``rewrite``. This method corresponds to VCL
native ``regsuball()``.
The default fallback is ``"**SUBALL METHOD FAILED**"``. ``.suball()``
fails under the same conditions as ``.sub()``.
Since only non-overlapping matches are substituted, replacing
``"ana"`` within ``"banana"`` only results in one substitution, not
two.
Example::
sub vcl_init {
new bmatcher = re2.regex("b+");
}
sub vcl_recv {
# If Host contains "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dada.doo.com".
set req.http.X-Yada = bmatcher.suball(req.http.Host, "d");
}
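The remark about non-overlapping matches can be sketched with the
``"banana"`` example from the text; the object name and the ``X-Result``
header are hypothetical::

    import re2;

    sub vcl_init {
        new ana = re2.regex("ana");
    }

    sub vcl_recv {
        # The first "ana" (b-ana-na) is replaced. The search resumes
        # after it, where only "na" remains, so the overlapping second
        # "ana" is not substituted. The result is "bXna".
        set req.http.X-Result = ana.suball("banana", "X");
    }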
.. _xregex.extract():
STRING xregex.extract(STRING text, STRING rewrite, STRING fallback)
-------------------------------------------------------------------
::
STRING xregex.extract(
STRING text,
STRING rewrite,
STRING fallback="**EXTRACT METHOD FAILED**"
)
If the compiled pattern for this regex object matches ``text``, then
return ``rewrite`` with substitutions from the matching portions of
``text``. Non-matching substrings of ``text`` are ignored.
The default fallback is ``"**EXTRACT METHOD FAILED**"``. Like
``.sub()`` and ``.suball()``, ``.extract()`` fails if:
* Any of ``text``, ``rewrite`` or ``fallback`` is undefined.
* There is insufficient workspace for the rewritten string.
Example::
sub vcl_init {
new email = re2.regex("(.*)@([^.]*)");
}
sub vcl_deliver {
# Sets X-UUCP to "kremvax!boris"
set resp.http.X-UUCP = email.extract("boris@kremvax.ru", "\2!\1");
}
.. _xregex.cost():
INT xregex.cost()
-----------------
Return a numeric measurement > 0 for this regex object from the RE2
library. According to the RE2 documentation:
... a very approximate measure of a regexp's "cost". Larger numbers
are more expensive than smaller numbers.
The absolute numeric values are opaque and not relevant, but they are
meaningful relative to one another -- more complex regexen have a
higher cost than less complex regexen. This may be useful during
development and optimization of regular expressions.
Example::
std.log("r1 cost=" + r1.cost() + " r_alt cost=" + r_alt.cost());
regex functional interface
==========================
.. _re2.match():
BOOL match(STRING pattern, STRING subject, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
BOOL match(
STRING pattern,
STRING subject,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Like the ``regex.match()`` method, return ``true`` if ``pattern``
matches ``subject``, where ``pattern`` is compiled with the given
options (or default options) on each invocation.
If ``pattern`` fails to compile, then an error message is logged with
the ``VCL_Error`` tag, and ``false`` is returned.
Example::
# Match the bereq Host header against a backend response header
if (re2.match(pattern=bereq.http.Host, subject=beresp.http.X-Host)) {
call do_on_match;
}
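A sketch of the function with a regex option, which also shows where
a compile failure ends up. The headers ``X-Pattern`` and ``X-Matched``
are hypothetical::

    import re2;

    sub vcl_backend_response {
        if (beresp.http.X-Pattern
            && re2.match(beresp.http.X-Pattern, bereq.url,
                         case_sensitive=false)) {
            set beresp.http.X-Matched = "true";
        } else {
            # Also reached when the pattern fails to compile: match()
            # then logs a VCL_Error message and returns false.
            set beresp.http.X-Matched = "false";
        }
    }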
.. _re2.backref():
STRING backref(INT ref, STRING fallback)
----------------------------------------
::
STRING backref(
INT ref,
STRING fallback="**BACKREF FUNCTION FAILED**"
)
Returns the ``nth`` captured subexpression from the most recent
successful call of the ``match()`` function in the current client or
backend context, or a fallback string if the capture fails. The
default ``fallback`` is ``"**BACKREF FUNCTION FAILED**"``.
Similarly to the ``regex.backref()`` method, ``fallback`` is returned
after any failed invocation of the ``match()`` function, or if there
is no captured group corresponding to the backref number. The function
is not affected by native VCL regex operations, or any other method or
function of the VMOD except for the ``match()`` function.
The function fails, returning ``fallback`` and logging a ``VCL_Error``
message, under the same conditions as the corresponding method:
* ``fallback`` is undefined.
* ``never_capture`` was true in the previous invocation of the ``match()``
function.
* ``ref`` is out of range.
* The ``match()`` function was never called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured subexpression.
Example::
# Match against a pattern provided in a beresp header, and capture
# subexpression 1.
if (re2.match(pattern=beresp.http.X-Pattern, subject=bereq.http.X-Foo)) {
set beresp.http.X-Capture = re2.backref(1);
}
.. _re2.namedref():
STRING namedref(STRING name, STRING fallback)
---------------------------------------------
::
STRING namedref(
STRING name,
STRING fallback="**NAMEDREF FUNCTION FAILED**"
)
Returns the captured subexpression designated by ``name`` from the
most recent successful call to the ``match()`` function in the current
context, or ``fallback`` in case of failure. The default fallback is
``"**NAMEDREF FUNCTION FAILED**"``.
The function returns ``fallback`` when the previous invocation of the
``match()`` function failed, and is only affected by use of the
``match()`` function. The function fails, returning ``fallback`` and
logging a ``VCL_Error`` message, under the same conditions as the
corresponding method:
* ``fallback`` is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``match()`` was not called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured expression.
Example::
if (re2.match(beresp.http.X-Pattern-With-Names, bereq.http.X-Foo)) {
set beresp.http.X-Capture = re2.namedref("foo");
}
.. _re2.sub():
STRING sub(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
STRING sub(
STRING pattern,
STRING text,
STRING rewrite,
STRING fallback="**SUB FUNCTION FAILED**",
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Compiles ``pattern`` with the given options, and if it matches
``text``, then return the result of replacing the first match in
``text`` with ``rewrite``. As with the ``regex.sub()`` method, ``\0``
through ``\9`` may be used in ``rewrite`` to substitute captured
groups from the pattern.
``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB FUNCTION FAILED**"``.
``sub()`` fails, returning ``fallback`` and logging a ``VCL_Error``
message, if:
* ``pattern`` cannot be compiled.
* Any of ``text``, ``rewrite`` or ``fallback`` is undefined.
* There is insufficient workspace for the rewritten string.
Example::
# If the beresp header X-Sub-Letters contains "b+", and Host contains
# "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dabba.doo.com".
set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
bereq.http.Host, "d");
.. _re2.suball():
STRING suball(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
STRING suball(
STRING pattern,
STRING text,
STRING rewrite,
STRING fallback="**SUBALL FUNCTION FAILED**",
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Like the ``sub()`` function, except that all successive
non-overlapping matches in ``text`` are replaced with ``rewrite``.
The default fallback is ``"**SUBALL FUNCTION FAILED**"``. The
``suball()`` function fails under the same conditions as ``sub()``.
Example::
# If the beresp header X-Sub-Letters contains "b+", and Host contains
# "www.yabba.dabba.doo.com", then set X-Yada to
# "www.yada.dada.doo.com".
set beresp.http.X-Yada = re2.suball(beresp.http.X-Sub-Letters,
bereq.http.Host, "d");
.. _re2.extract():
STRING extract(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
STRING extract(
STRING pattern,
STRING text,
STRING rewrite,
STRING fallback="**EXTRACT FUNCTION FAILED**",
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Compiles ``pattern`` with the given options, and if it matches
``text``, then return ``rewrite`` with substitutions from the matching
portions of ``text``, ignoring the non-matching portions.
The default fallback is ``"**EXTRACT FUNCTION FAILED**"``. The
``extract()`` function fails under the same conditions as ``sub()``
and ``suball()``.
Example::
# If beresp header X-Params contains "(foo|bar)=(baz|quux)", and the
# URL contains "bar=quux", then set X-Query to "bar:quux".
set beresp.http.X-Query = re2.extract(beresp.http.X-Params, bereq.url,
"\1:\2");
.. _re2.cost():
INT cost(STRING pattern, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
INT cost(
STRING pattern,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL never_capture=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Like the ``.cost()`` method above, return a numeric measurement > 0
from the RE2 library for ``pattern`` with the given options. More
complex regexen have a higher cost than less complex regexen.
Fails and returns -1 if ``pattern`` cannot be compiled.
Example::
std.log("simple cost=" + re2.cost("simple")
+ " complex cost=" + re2.cost("complex{1,128}"));
.. _re2.set():
new xset = re2.set(ENUM anchor, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
new xset = re2.set(
ENUM {none, start, both} anchor=none,
BOOL utf8=0,
BOOL posix_syntax=0,
BOOL longest_match=0,
INT max_mem=8388608,
BOOL literal=0,
BOOL never_nl=0,
BOOL dot_nl=0,
BOOL case_sensitive=1,
BOOL perl_classes=0,
BOOL word_boundary=0,
BOOL one_line=0
)
Initialize a set object that represents several patterns combined by
alternation -- ``|`` for "or".
Optional parameters control the interpretation of the resulting
composed pattern. The ``anchor`` parameter is an enum that can have
the values ``none``, ``start`` or ``both``, where ``none`` is the
default. ``start`` means that each pattern is matched as if it begins
with ``^`` for start-of-text, and ``both`` means that each pattern is
anchored with both ``^`` at the beginning and ``$`` for end-of-text at
the end. ``none`` means that each pattern is interpreted as a partial
match (although individual patterns within the set may have either of
``^`` or ``$``).
For example, if a set is initialized with ``anchor=both``, and the
patterns ``foo`` and ``bar`` are added, then matches against the set
match a string against ``^foo$|^bar$``, or equivalently
``^(foo|bar)$``.
The usual regex options can be set, which then control matching
against the resulting composed pattern. However, the ``never_capture``
option cannot be set, and is always implicitly true, since backrefs
and namedrefs are not possible with sets.
Example::
sub vcl_init {
# Initialize a regex set for partial matches
# with default options
new foo = re2.set();
# Initialize a regex set for case insensitive matches
# with anchors on both ends (^ and $).
new bar = re2.set(anchor=both, case_sensitive=false);
# Initialize a regex set using POSIX syntax, but allowing
# Perl character classes, and anchoring at the left (^).
new baz = re2.set(anchor=start, posix_syntax=true,
perl_classes=true);
}
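The effect of ``anchor=both`` described above can be sketched as
follows; the paths and the ``X-Exact`` header are made up for
illustration::

    import re2;

    sub vcl_init {
        new exact = re2.set(anchor=both);
        exact.add("/foo");
        exact.add("/bar");
        exact.compile();
    }

    sub vcl_recv {
        # Equivalent to matching against ^(/foo|/bar)$: the URLs
        # "/foo" and "/bar" match, but "/foo/index.html" does not.
        if (exact.match(req.url)) {
            set req.http.X-Exact = "true";
        }
    }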
.. _xset.add():
VOID xset.add(STRING, [STRING string], [BACKEND backend], [BOOL save], [BOOL never_capture], [INT integer])
-----------------------------------------------------------------------------------------------------------
::
VOID xset.add(
STRING,
[STRING string],
[BACKEND backend],
[BOOL save],
[BOOL never_capture],
[INT integer]
)
Add the given pattern to the set. If the pattern is invalid,
``.add()`` fails, and the VCL will fail to load, with an error message
describing the problem.
If values for the ``string``, ``backend`` and/or ``integer``
parameters are provided, then these values can be retrieved with the
``.string()``, ``.backend()`` and ``.integer()`` methods,
respectively, as described below. This makes it possible to associate
data with the added pattern after it matches successfully. By default
the pattern is not associated with any such value.
If ``save`` is true, then the given pattern is compiled and saved as a
``regex`` object, just as if the ``regex`` constructor described above
is invoked. This object is stored internally in the ``set`` object as
an independent matcher, separate from the "compound" pattern formed by the
set as an alternation of the patterns added to it. By default,
``save`` is **false**.
When the ``.match()`` method on the set is successful, and one of the
patterns that matched is associated with a saved internal ``regex``
object, then that object may be used for subsequent method invocations
such as ``.sub()`` on the set object, whose meanings are the same as
documented above for ``regex`` objects. Details are described below.
When an internal ``regex`` object is saved (i.e. when ``save`` is
true), it is compiled with the same options that were provided to the
set object in the constructor. The ``never_capture`` option can be set
separately for the individual regex, where it defaults to false, even
though it is implicitly true for the full set object.
``.add()`` MUST be called in ``vcl_init``, and MAY NOT be called after
``.compile()``. If ``.add()`` is called in any other subroutine, an
error message with ``VCL_Error`` is logged, and the call has no
effect. If it is called in ``vcl_init`` after ``.compile()``, then the
VCL load will fail with an error message.
In other words, add all patterns to the set in ``vcl_init``, and
finally call ``.compile()`` when you're done.
When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.
The same is true of the INT arguments that may be given for methods
such as ``.string()``, ``.backend()`` or ``.sub()``, as described
below.
Example::
sub vcl_init {
# literal=true means that the dots are interpreted as literal
# dots, not "match any character".
new hostmatcher = re2.set(anchor=both, case_sensitive=false,
literal=true);
hostmatcher.add("www.domain1.com");
hostmatcher.add("www.domain2.com");
hostmatcher.add("www.domain3.com");
hostmatcher.compile();
}
# See the documentation of the .string() and .backend() methods
# below for uses of the parameters string and backend for .add().
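
As a rough sketch of the optional parameters described above, a pattern
can be added together with associated data, or saved as an internal
``regex`` object. The set name ``paths``, the patterns and the
associated values are hypothetical::

    sub vcl_init {
        new paths = re2.set(anchor=start);
        # Associate a string and an integer with the first pattern.
        paths.add("/old/", string="/new/", integer=301);
        # Save the second pattern as an internal regex object, so that
        # methods such as .sub() can be applied to it after a match.
        paths.add("/tmp/(.*)", save=true);
        # All patterns are added; compile as the last step.
        paths.compile();
    }
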
.. _xset.compile():
VOID xset.compile()
-------------------
Compile the compound pattern represented by the set -- an alternation
of all patterns added by ``.add()``.
``.compile()`` fails if no patterns were added to the set. It may also
fail if the ``max_mem`` setting is not large enough for the composed
pattern. In that case, the VCL load will fail with an error message
(then consider a larger value for ``max_mem`` in the set constructor).
``.compile()`` MUST be called in ``vcl_init``, and MAY NOT be called
more than once for a set object. If it is called in any other
subroutine, a ``VCL_Error`` message is logged, and the call has no
effect. If it is called a second time in ``vcl_init``, the VCL load
will fail.
See above for examples.
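
If ``.compile()`` fails because of the memory limit, the constructor
could be given a larger ``max_mem``, roughly as in the following
sketch. It assumes that ``max_mem`` is given as an integer number of
bytes; the value shown is an arbitrary illustration, not a
recommendation, and the set name and patterns are hypothetical::

    sub vcl_init {
        # Raise the memory limit for a large compound pattern.
        new bigset = re2.set(max_mem=16777216);
        bigset.add("foo.*bar");
        bigset.add("baz.*quux");
        # Call .compile() exactly once, after the last .add().
        bigset.compile();
    }
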
.. _xset.match():
BOOL xset.match(STRING)
-----------------------
Returns ``true`` if the given string matches the compound pattern
represented by the set, i.e. if it matches any of the patterns that
were added to the set.
The matcher identifies all of the patterns that were added to the set
and match the given string. These can be determined after a successful
match using the ``.matched(INT)`` and ``.nmatches()`` methods
described below.
``.match()`` MUST be called after ``.compile()``; otherwise the match
always fails.
A match may also fail (returning ``false``) if the internal memory
limit imposed by the ``max_mem`` parameter in the constructor is
exceeded. (With the default value of ``max_mem``, this ordinarily
requires very large patterns and/or a very large string to be
matched.) Since about version 2017-12-01, the RE2 library reports
this condition. When it does, the VMOD writes a ``VCL_Error`` message
to the log, except during ``vcl_init``, in which case the VCL load
fails with the error message. If matches fail due to the
out-of-memory condition, increase the ``max_mem`` parameter in the
constructor.
Example::
if (hostmatcher.match(req.http.Host)) {
call do_when_a_host_matched;
}
.. _xset.matched():
BOOL xset.matched(INT)
----------------------
Returns ``true`` after a successful match if the ``nth`` pattern that
was added to the set is among the patterns that matched, ``false``
otherwise. The numbering of the patterns corresponds to the order in
which patterns were added in ``vcl_init``, counting from 1.
The method refers back to the most recent invocation of ``.match()``
for the same object in the same client or backend context. It always
returns ``false``, for every value of the parameter, if it is called
after an unsuccessful match (``.match()`` returned ``false``).
``.matched()`` fails and returns ``false`` if:
* The ``.match()`` method was not called for this object in the same
client or backend scope.
* The integer parameter is out of range; that is, if it is less than 1
or greater than the number of patterns added to the set.
On failure, the method writes an error message to the log with the tag
``VCL_Error``; if it fails during ``vcl_init``, then the VCL load
fails with the error message. In any other VCL subroutine, the method
returns ``false`` on failure and processing continues; since ``false``
is a legitimate return value, you should consider monitoring the log
for the error messages.
Example::
if (hostmatcher.match(req.http.Host)) {
if (hostmatcher.matched(1)) {
call do_domain1;
}
if (hostmatcher.matched(2)) {
call do_domain2;
}
if (hostmatcher.matched(3)) {
call do_domain3;
}
}
.. _xset.nmatches():
INT xset.nmatches()
-------------------
Returns the number of patterns that were matched by the most recent
invocation of ``.match()`` for the same object in the same client or
backend context. The method always returns 0 after an unsuccessful
match (``.match()`` returned ``false``).
If ``.match()`` was not called for this object in the same client or
backend scope, ``.nmatches()`` fails and returns 0, writing an error
message with ``VCL_Error`` to the log. If this happens in
``vcl_init``, the VCL load fails with the error message. As with
``.matched()``, ``.nmatches()`` returns a legitimate value and VCL
processing continues when it fails in any other subroutine, so you
should monitor the log for the error messages.
Example::
if (myset.match(req.url)) {
std.log("URL matched " + myset.nmatches()
+ " patterns from the set");
}
.. _xset.which():
INT xset.which(ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)
--------------------------------------------------------
Returns a number indicating which pattern in a set matched in the most
recent invocation of ``.match()`` in the client or backend
context. The number corresponds to the order in which patterns were
added to the set in ``vcl_init``, counting from 1.
If exactly one pattern matched in the most recent ``.match()`` call
(so that ``.nmatches()`` returns 1), and the ``select`` ENUM is set to
``UNIQUE``, then the number for that pattern is returned. ``select``
defaults to ``UNIQUE``, so it can be left out in this case.
If more than one pattern matched in the most recent ``.match()`` call
(``.nmatches()`` > 1), then the ``select`` ENUM determines the integer
that is returned. The values ``FIRST`` and ``LAST`` specify that, of
the patterns that matched, the first or last one added via the
``.add()`` method is chosen, and the number for that pattern is
returned.
``.which()`` fails, returning 0 with a ``VCL_Error`` message in the log,
if:
* ``.match()`` was not called for the set in the current client or
backend transaction, or if the previous call returned ``false``.
* More than one pattern in the set matched in the previous
``.match()`` call, but the ``select`` parameter is set to ``UNIQUE``
(or left out, since ``select`` defaults to ``UNIQUE``).
Examples::
sub vcl_init {
new myset = re2.set();
myset.add("foo"); # Pattern 1
myset.add("bar"); # Pattern 2
myset.add("baz"); # Pattern 3
myset.compile();
}
sub vcl_recv {
if (myset.match("bar")) {
# myset.which() returns 2.
}
if (myset.match("foobaz")) {
# myset.which() fails and returns 0, with a log
# message indicating that 2 patterns
# matched.
# myset.which(FIRST) returns 1.
# myset.which(LAST) returns 3.
}
if (myset.match("quux")) {
# ...
}
else {
# myset.which() fails and returns 0 for any value of the
# select ENUM (or none), with a log message indicating
# that the previous .match() call was unsuccessful.
}
}
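
The returned number can also drive a simple dispatch. This is a minimal
sketch that reuses ``myset`` from the example above; the subroutines
``handle_foo`` and ``handle_other`` are hypothetical and would have to
be defined elsewhere::

    sub vcl_recv {
        if (myset.match(req.url)) {
            # select=FIRST avoids the failure that UNIQUE causes
            # when more than one pattern matched.
            if (myset.which(select=FIRST) == 1) {
                call handle_foo;
            }
            else {
                call handle_other;
            }
        }
    }
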
.. _xset.string():
STRING xset.string(INT n, ENUM select)
--------------------------------------
::
STRING xset.string(
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the string associated with the `nth` pattern added to the set,
or with the pattern in the set that matched in the most recent call to
``.match()`` in the same task scope (client or backend context). The
string set with the ``string`` parameter of the ``.add()`` method
during ``vcl_init`` is returned.
The pattern is identified with the parameters ``n`` and ``select``
according to these rules, which also hold for all further ``set``
methods documented in the following.
* If ``n`` > 0, then select the `nth` pattern added to the set with
the ``.add()`` method, counting from 1. This identifies the `nth`
pattern in any context, regardless of whether ``.match()`` was
called previously, or whether a previous call returned ``true`` or
``false``. The ``select`` parameter is ignored in this case.
* If ``n`` <= 0, then select a pattern in the set that matched
successfully in the most recent call to ``.match()`` in the same
task scope. Since ``n`` is 0 by default, ``n`` can be left out for
this purpose.
* If ``n`` <= 0 and exactly one pattern in the set matched in the most
recent invocation of ``.match()`` (and hence ``.nmatches()`` returns
1), and ``select`` is set to ``UNIQUE``, then select that
pattern. ``select`` defaults to ``UNIQUE``, so when exactly one
pattern in the set matched, both ``n`` and ``select`` can be left
out.
* If ``n`` <= 0 and more than one pattern matched in the most recent
``.match()`` call (``.nmatches()`` > 1), then the selection of a
pattern is determined by the ``select`` parameter. As with
``.which()``, ``FIRST`` and ``LAST`` specify the first or last
matching pattern added via the ``.add()`` method.
For the pattern selected by these rules, return the string that was
set with the ``string`` parameter in the ``.add()`` method that added
the pattern to the set.
``.string()`` fails, returning NULL with a ``VCL_Error`` message in
the log, if:
* The values of ``n`` and ``select`` are invalid:
* ``n`` is greater than the number of patterns in the set.
* ``n`` <= 0 (or left to the default), but ``.match()`` was not
called earlier in the same task scope (client or backend context).
* ``n`` <= 0, but the previous ``.match()`` call returned ``false``.
* ``n`` <= 0 and the ``select`` ENUM is ``UNIQUE`` (or default), but
more than one pattern matched in the previous ``.match()`` call.
* No string was associated with the pattern selected by ``n`` and
``select``; that is, the ``string`` parameter was not set in the
``.add()`` call that added the pattern.
Examples::
# Match the request URL against a set of patterns, and generate
# a synthetic redirect response with a Location header derived
# from the string associated with the matching pattern.
# In the first example, exactly one pattern in the set matches.
sub vcl_init {
# With anchor=both, we specify exact matches.
new matcher = re2.set(anchor=both);
matcher.add("/foo/bar", "/baz/quux");
matcher.add("/baz/bar/foo", "/baz/quux/foo");
matcher.add("/quux/bar/baz/foo", "/baz/quux/foo/bar");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Divert to vcl_synth, sending the string associated
# with the matching pattern in the "reason" field.
return(synth(1301, matcher.string()));
}
}
sub vcl_synth {
# Construct a redirect response, using the path set in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location
= "http://otherdomain.org" + resp.reason;
set resp.status = 301;
set resp.reason = "Moved Permanently";
return(deliver);
}
}
# In the second example, the patterns that may match have
# common prefixes, and more than one pattern may match. We
# add patterns to the set in a "more specific" to "less
# specific" order, and we choose the most specific pattern
# that matches, by specifying the first matching pattern in
# the set.
sub vcl_init {
# With anchor=start, we specify matching prefixes.
new matcher = re2.set(anchor=start);
matcher.add("/foo/bar/baz/quux", "/baz/quux");
matcher.add("/foo/bar/baz", "/baz/quux/foo");
matcher.add("/foo/bar", "/baz/quux/foo/bar");
matcher.add("/foo", "/baz");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Select the first matching pattern
return(synth(1301, matcher.string(select=FIRST)));
}
}
# vcl_synth is implemented as shown above
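
The rules for ``n`` > 0 and for ``select=LAST`` can be sketched in the
same way, reusing the ``matcher`` set from the second example above.
The request header names are hypothetical, chosen only to hold the
results::

    sub vcl_recv {
        # n > 0 selects by position, whether or not .match() was
        # called. The second pattern was added with the string
        # "/baz/quux/foo".
        set req.http.X-Second = matcher.string(n=2);

        if (matcher.match(req.url)) {
            # Of the patterns that matched, take the one added last,
            # which is the least specific prefix in this set.
            set req.http.X-Least-Specific = matcher.string(select=LAST);
        }
    }
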
.. _xset.backend():
BACKEND xset.backend(INT n, ENUM select)
----------------------------------------
::
BACKEND xset.backend(
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the backend associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope (client or backend
context).
The rules for selecting a pattern from the set and its associated
backend based on ``n`` and ``select`` are the same as described above
for ``.string()``.
``.backend()`` fails, returning NULL with a ``VCL_Error`` message
in the log, under the same conditions described for ``.string()``
above -- ``n`` and ``select`` are invalid, or no backend was
associated with the selected pattern with the ``.add()`` method.
Example::
# Choose a backend based on the URL prefix.
# In this example, assume that backends b1 through b4
# have been defined.
sub vcl_init {
# Use anchor=start to match prefixes.
# The prefixes are unique, so exactly one will match.
new matcher = re2.set(anchor=start);
matcher.add("/foo", backend=b1);
matcher.add("/bar", backend=b2);
matcher.add("/baz", backend=b3);
matcher.add("/quux", backend=b4);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Set the backend hint to the backend associated
# with the matching pattern.
set req.backend_hint = matcher.backend();
}
}
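
When prefixes overlap, more than one pattern may match, and the
``select`` parameter decides which backend is returned. The following
sketch assumes that backends ``b1`` and ``b2`` have been defined; the
set name ``router`` and the URL prefixes are hypothetical::

    sub vcl_init {
        new router = re2.set(anchor=start);
        # Add the more specific prefix first.
        router.add("/api/v2/", backend=b2);
        router.add("/api/", backend=b1);
        router.compile();
    }

    sub vcl_recv {
        if (router.match(req.url)) {
            # A URL such as "/api/v2/users" matches both patterns;
            # FIRST selects the first one added, and hence b2.
            set req.backend_hint = router.backend(select=FIRST);
        }
    }
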
.. _xset.integer():
INT xset.integer(INT n, ENUM select)
------------------------------------
::
INT xset.integer(
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the integer associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope.
The rules for selecting a pattern from the set and its associated
integer based on ``n`` and ``select`` are the same as described above
for ``.string()``.
``.integer()`` invokes VCL failure under the same error conditions
described for ``.string()`` above -- ``n`` and ``select`` are invalid,
or no integer was associated with the selected pattern with the
``.add()`` method.
Note that VCL failure differs from the failure mode for ``.string()``
and ``.backend()``, since there is no distinguished "error" value that
could be returned as the INT. VCL failure has the same effect as if
``return(fail)`` were called from a VCL subroutine; usually, control
passes immediately to ``vcl_synth``, with the response status set to
503, and the response reason set to "VCL failed".
You can avoid that, for example, by testing if ``.nmatches()==1``
after calling ``.match()``, if you need to ensure that calling
``.integer(select=UNIQUE)`` will not fail.
Example::
# Generate redirect responses based on the Host header. In the
# example, subdomains are removed in the new Location, and the
# associated integer is used to set the redirect status code.
sub vcl_init {
# No more than one pattern can match the same string. So it
# is safe to call .integer() with default select=UNIQUE in
# vcl_recv below (no risk of VCL failure).
new redir = re2.set(anchor=both);
redir.add("www\.[^.]+\.foo\.com", integer=301, string="www.foo.com");
redir.add("www\.[^.]+\.bar\.com", integer=302, string="www.bar.com");
redir.add("www\.[^.]+\.baz\.com", integer=303, string="www.baz.com");
redir.add("www\.[^.]+\.quux\.com", integer=307, string="www.quux.com");
redir.compile();
}
sub vcl_recv {
if (redir.match(req.http.Host)) {
# Construct a Location header that will be used in the
# synthetic redirect response.
set req.http.Location = "http://" + redir.string() + req.url;
# Set the response status from the associated integer.
return( synth(redir.integer()) );
}
}
sub vcl_synth {
if (resp.status >= 301 && resp.status <= 307) {
# We come here from the synth return for the redirect
# response. The status code was set from .integer().
set resp.http.Location = req.http.Location;
return(deliver);
}
}
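
The ``.nmatches()`` guard mentioned above might look like the following
sketch, which reuses the ``redir`` set from the example. It covers only
the case of multiple matches, and relies on every pattern in ``redir``
having been added with an ``integer``::

    sub vcl_recv {
        if (redir.match(req.http.Host)) {
            if (redir.nmatches() == 1) {
                # Exactly one match, so UNIQUE selection cannot fail.
                return (synth(redir.integer()));
            }
            # More than one pattern matched: choose one explicitly
            # rather than risk VCL failure.
            return (synth(redir.integer(select=FIRST)));
        }
    }
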
.. _xset.sub():
STRING xset.sub(STRING text, STRING rewrite, STRING fallback, INT n, ENUM select)
---------------------------------------------------------------------------------
::
STRING xset.sub(
STRING text,
STRING rewrite,
STRING fallback="**SUB METHOD FAILED**",
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns the result of the method call ``.sub(text, rewrite, fallback)``,
as documented above for the ``regex`` interface, invoked on the `nth`
pattern added to the set, or on the pattern in the set that matched in
the most recent call to ``.match()`` in the same task scope.
``.sub()`` requires that the pattern it identifies was saved as an
internal ``regex`` object, by setting ``save`` to true when it was
added with the ``.add()`` method.
The associated pattern is determined by ``n`` and ``select`` according
to the rules given above. If an internal ``regex`` object was saved
for that pattern, then the result of the ``.sub()`` method invoked on
that object is returned.
``.sub()`` fails, returning NULL with a ``VCL_Error`` message in the
log, if:
* The values of ``n`` and ``select`` are invalid, according to the
rules given above.
* ``save`` was false in the ``.add()`` method for the pattern
identified by ``n`` and ``select``; that is, no internal ``regex``
object was saved on which the ``.sub()`` method could have been
invoked.
* The ``.sub()`` method invoked on the ``regex`` object fails for any
of the reasons described for ``regex.sub()``.
Examples::
# Generate synthetic redirect responses on URLs that match a set of
# patterns, rewriting the URL according to the matched pattern.
# In this example, we set the new URL in the redirect location to
# the path that comes after the prefix of the original req.url.
sub vcl_init {
new matcher = re2.set(anchor=start);
matcher.add("/foo/(.*)", save=true);
matcher.add("/bar/(.*)", save=true);
matcher.add("/baz/(.*)", save=true);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
if (matcher.nmatches() != 1) {
return(fail);
}
return(synth(1301));
}
}
sub vcl_synth {
if (resp.status == 1301) {
# matcher.sub() rewrites the URL to the subpath after the
# original prefix.
set resp.http.Location
= "http://www.otherdomain.org" + matcher.sub(req.url, "/\1");
return(deliver);
}
}
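
If the saved patterns can overlap, ``select`` picks the one whose
``regex`` object performs the substitution. A minimal sketch, with a
hypothetical set ``rewriter`` and hypothetical paths::

    sub vcl_init {
        new rewriter = re2.set(anchor=start);
        # Add the more specific pattern first.
        rewriter.add("/foo/bar/(.*)", save=true);
        rewriter.add("/foo/(.*)", save=true);
        rewriter.compile();
    }

    sub vcl_recv {
        if (rewriter.match(req.url)) {
            # For "/foo/bar/baz" both patterns match; FIRST applies
            # the more specific one, so the URL becomes "/baz".
            set req.url = rewriter.sub(req.url, "/\1", select=FIRST);
        }
    }
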
.. _xset.suball():
STRING xset.suball(STRING text, STRING rewrite, STRING fallback, INT n, ENUM select)
------------------------------------------------------------------------------------
::
STRING xset.suball(
STRING text,
STRING rewrite,
STRING fallback="**SUBALL METHOD FAILED**",
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Like the ``.sub()`` method, this returns the result of calling
``.suball(text, rewrite, fallback)`` from the regex interface on the
`nth` pattern added to the set, or the pattern that most recently
matched in a ``.match()`` call.
``.suball()`` is subject to the same conditions as the ``.sub()`` method:
* The pattern to which it is applied is identified by ``n`` and
``select`` according to the rules given above.
* It fails if:
* The pattern that it identifies was not saved with ``.add(save=true)``.
* The values of ``n`` or ``select`` are invalid.
* The ``.suball()`` method invoked on the saved ``regex`` object
fails.
Example::
# In any URL that matches one of the words given below, replace all
# occurrences of the matching word with "quux" (for example to
# rewrite path components or elements of query strings).
sub vcl_init {
new matcher = re2.set();
matcher.add("\bfoo\b", save=true);
matcher.add("\bbar\b", save=true);
matcher.add("\bbaz\b", save=true);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
if (matcher.nmatches() != 1) {
return(fail);
}
set req.url = matcher.suball(req.url, "quux");
}
}
.. _xset.extract():
STRING xset.extract(STRING text, STRING rewrite, STRING fallback, INT n, ENUM select)
-------------------------------------------------------------------------------------
::
STRING xset.extract(
STRING text,
STRING rewrite,
STRING fallback="**EXTRACT METHOD FAILED**",
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Like the ``.sub()`` and ``.suball()`` methods, this method returns the
result of calling ``.extract(text, rewrite, fallback)`` from the regex
interface on the `nth` pattern added to the set, or the pattern that most
recently matched in a ``.match()`` call.
``.extract()`` is subject to the same conditions as the other rewrite
methods:
* The pattern to which it is applied is identified by ``n`` and
``select`` according to the rules given above.
* It fails if:
* The pattern that it identifies was not saved with ``.add(save=true)``.
* The values of ``n`` or ``select`` are invalid.
* The ``.extract()`` method invoked on the saved ``regex`` object
fails.
Example::
# Rewrite any URL that matches one of the patterns in the set
# by exchanging the path components.
sub vcl_init {
new matcher = re2.set(anchor=both);
matcher.add("/(foo)/(bar)/", save=true);
matcher.add("/(bar)/(baz)/", save=true);
matcher.add("/(baz)/(quux)/", save=true);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
if (matcher.nmatches() != 1) {
return(fail);
}
set req.url = matcher.extract(req.url, "/\2/\1/");
}
}
.. _xset.saved():
BOOL xset.saved(ENUM which, INT n, ENUM select)
-----------------------------------------------
::
BOOL xset.saved(
ENUM {REGEX, STR, BE, INT} which=REGEX,
INT n=0,
ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
)
Returns true if and only if an object of the type indicated by
``which`` was saved at initialization time for the ``nth`` pattern
added to the set, or for the pattern indicated by ``select`` after the
most recent ``.match()`` call.
In other words, ``.saved()`` returns true:
* for ``which=REGEX`` if the individual regex was saved with
``.add(save=true)`` for the indicated pattern.
* for ``which=STR`` if a string was stored with the ``string``
parameter in ``.add()``.
* for ``which=BE`` if a backend was stored with the ``backend``
parameter in ``.add()``.
* for ``which=INT`` if an integer was stored with the ``integer``
parameter in ``.add()``.
The default value of ``which`` is ``REGEX``.
The pattern in the set is identified by ``n`` and ``select`` according
to the rules given above. ``.saved()`` fails, returning false with a
``VCL_Error`` message in the log, if the values of ``n`` or ``select``
are invalid.
Example::
sub vcl_init {
new s = re2.set();
s.add("1", save=true, string="1", backend=b1);
s.add("2", save=true, string="2");
s.add("3", save=true, backend=b3);
s.add("4", save=true);
s.add("5", string="5", backend=b5);
s.add("6", string="6");
s.add("7", backend=b7);
s.add("8");
s.compile();
}
# Then the following holds for this set:
# s.saved(n=1) == true # for which = REGEX, STR or BE
# s.saved(which=REGEX, n=2) == true
# s.saved(which=STR, n=2) == true
# s.saved(which=BE, n=2) == false
# s.saved(which=REGEX, n=3) == true
# s.saved(which=STR, n=3) == false
# s.saved(which=BE, n=3) == true
# s.saved(which=REGEX, n=4) == true
# s.saved(which=STR, n=4) == false
# s.saved(which=BE, n=4) == false
# s.saved(which=REGEX, n=5) == false
# s.saved(which=STR, n=5) == true
# s.saved(which=BE, n=5) == true
# s.saved(which=REGEX, n=6) == false
# s.saved(which=STR, n=6) == true
# s.saved(which=BE, n=6) == false
# s.saved(which=REGEX, n=7) == false
# s.saved(which=STR, n=7) == false
# s.saved(which=BE, n=7) == true
# s.saved(n=8) == false # for any value of which
if (s.match("4")) {
# The fourth pattern has been uniquely matched.
# So in this context: s.saved() == true
# Since save=true was used in .add() for the 4th pattern,
# and which=REGEX by default.
}
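
Because methods such as ``.string()`` and ``.suball()`` fail when
nothing was stored for the selected pattern, ``.saved()`` can serve as
a guard before calling them. A rough sketch using the set ``s`` from
above; the header name and the rewrite string are hypothetical::

    sub vcl_recv {
        if (s.match(req.url) && s.nmatches() == 1) {
            if (s.saved(which=STR)) {
                # A string was stored for the matching pattern.
                set req.http.X-Assoc = s.string();
            }
            if (s.saved()) {
                # which=REGEX by default: a regex was saved, so a
                # rewrite method can be applied.
                set req.url = s.suball(req.url, "x");
            }
        }
    }
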
.. _xset.hdr_filter():
VOID xset.hdr_filter(HTTP, BOOL whitelist=1)
--------------------------------------------
Filters the headers in the HTTP object, which may be one of ``req``,
``resp``, ``bereq``, or ``beresp``. In other words, filter the headers
in the client or backend request or response.
If ``whitelist`` is ``true``, then headers that match one of the
patterns in the set are retained, and all other headers are removed.
Otherwise, headers that match a pattern in the set are removed, and
all others are retained. By default, ``whitelist`` is ``true``.
Example::
sub vcl_init {
# Header whitelist
new white = re2.set(anchor=start);
white.add("Foo:");
white.add("Bar:");
white.add("Baz: baz$");
white.compile();
# Header blacklist
new black = re2.set(anchor=start);
black.add("Chaotic:");
black.add("Evil:");
black.add("Wicked: wicked$");
black.compile();
}
sub vcl_recv {
# Filter the client request header with the whitelist.
# Headers that do not match any pattern in the set are removed.
white.hdr_filter(req);
}
sub vcl_deliver {
# Filter the client response header with the blacklist.
# Headers that match any pattern in the set are removed.
black.hdr_filter(resp, false);
}
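
The same method can be applied on the backend side, since the HTTP
object may also be ``bereq`` or ``beresp``. The sketch below removes
matching headers from a backend response; the set name ``strip`` and
the header names are hypothetical::

    sub vcl_init {
        # Case-insensitive matching at the start of each header.
        new strip = re2.set(anchor=start, case_sensitive=false);
        strip.add("X-Internal-");
        strip.add("Server:");
        strip.compile();
    }

    sub vcl_backend_response {
        # whitelist=false: headers that match a pattern are removed,
        # and all other headers are retained.
        strip.hdr_filter(beresp, false);
    }
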
.. _re2.quotemeta():
STRING quotemeta(STRING, STRING fallback)
-----------------------------------------
::
STRING quotemeta(
STRING,
STRING fallback="**QUOTEMETA FUNCTION FAILED**"
)
Returns a copy of the argument string with all regex metacharacters
escaped via backslash. When the returned string is used as a regular
expression, it will exactly match the original string, regardless of
any special characters. This function has a purpose similar to a
``\Q..\E`` sequence within a regex, or the ``literal=true`` setting in
a regex constructor.
The function fails and returns ``fallback`` if there is insufficient
workspace for the return string.
Example::
# The following are always true:
re2.quotemeta("1.5-2.0?") == "1\.5\-2\.0\?"
re2.match(re2.quotemeta("1.5-2.0?"), "1.5-2.0?")
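
One possible use is to build a pattern from a value that is only known
at runtime. This sketch combines ``re2.quotemeta()`` with the
``re2.match()`` function; the headers ``X-Prefix`` and
``X-Prefix-Matched`` are hypothetical::

    sub vcl_recv {
        if (req.http.X-Prefix) {
            # Treat the header value as a literal prefix, even if it
            # contains characters such as "." or "?".
            if (re2.match("^" + re2.quotemeta(req.http.X-Prefix),
                          req.url)) {
                set req.http.X-Prefix-Matched = "true";
            }
        }
    }
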
.. _re2.version():
STRING version()
----------------
Return the version string for this VMOD.
Example::
std.log("Using VMOD re2 version: " + re2.version());
REQUIREMENTS
============
The VMOD requires Varnish version 6.2 or later, or the master
branch. See the source repository for versions of the VMOD that are
compatible with other Varnish versions.
It requires the RE2 library, and has been tested against RE2 versions
since 2015-06-01 (through 2019-08-01 at the time of writing).
If the VMOD is built against versions of RE2 since 2017-12-01, it uses
a version of the set match operation that reports out-of-memory
conditions during a match. (Versions of RE2 since June 2019 no longer
have this error, but nevertheless the different internal call is used
for set matches.) In that case, the VMOD is not compatible with
earlier versions of RE2. This is only a problem if the runtime version
of the library differs from the version against which the VMOD was
built. If you encounter this error, consider re-building the VMOD
against the runtime version of RE2, or installing a newer version of
RE2.
INSTALLATION
============
See `INSTALL.rst <INSTALL.rst>`_ in the source repository.
LIMITATIONS
===========
The VMOD allocates Varnish workspace for captured groups and rewritten
strings. If operations fail with "insufficient workspace" error
messages in the Varnish log (with the ``VCL_Error`` tag), increase the
varnishd runtime parameters ``workspace_client`` and/or
``workspace_backend``.
The RE2 documentation states that successful matches are slowed quite
a bit when they also capture substrings. There is also additional
overhead from the VMOD, unless the ``never_capture`` flag is true, to
manage data about captured groups in the workspace. This overhead is
incurred even if there are no capturing expressions in a pattern,
since it is always possible to call ``backref(0)`` to obtain the
matched portion of a string.
So if you are using a pattern only to match against strings, and never
to capture subexpressions, consider setting the ``never_capture``
option to true, to eliminate the extra work for both RE2 and the VMOD.
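
A minimal sketch of a match-only regex with ``never_capture`` set; the
object name ``is_image`` and the pattern are hypothetical::

    sub vcl_init {
        # Used only for matching, never for backrefs or rewrites.
        new is_image = re2.regex("\.(?:png|jpe?g|gif)$",
                                 never_capture=true);
    }

    sub vcl_recv {
        if (is_image.match(req.url)) {
            unset req.http.Cookie;
        }
    }
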
AUTHOR
======
* Geoffrey Simmons <geoff@uplex.de>
UPLEX Nils Goroll Systemoptimierung
SEE ALSO
========
* varnishd(1)
* vcl(7)
* VMOD source repository: https://code.uplex.de/uplex-varnish/libvmod-re2
* Gitlab mirror: https://gitlab.com/uplex/varnish/libvmod-re2
* RE2 git repo: https://github.com/google/re2
* RE2 syntax: https://github.com/google/re2/wiki/Syntax
* "Implementing Regular Expressions": https://swtch.com/~rsc/regexp/
* Series of articles motivating the design of RE2, with discussion
of how RE2 compares with PCRE
COPYRIGHT
=========
::
Copyright (c) 2016-2018 UPLEX Nils Goroll Systemoptimierung
All rights reserved
Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>
See LICENSE
......@@ -44,6 +44,14 @@ AC_ARG_WITH([genhtml],
AC_CHECK_PROGS(GENHTML, [genhtml], []))
AM_CONDITIONAL(HAVE_GENHTML, [test -n "$GENHTML"])
AC_ARG_WITH([pandoc],
AS_HELP_STRING(
[--with-pandoc=PATH],
[Location of pandoc to generate README.md (auto)]),
[PANDOC="$withval"],
AC_CHECK_PROGS(PANDOC, [pandoc], []))
AM_CONDITIONAL(HAVE_PANDOC, [test -n "$PANDOC"])
m4_ifndef([VARNISH_PREREQ], AC_MSG_ERROR([Need varnish.m4 -- see INSTALL.rst]))
PKG_CHECK_MODULES([RE2], [re2])
......