Commit 2cc6a0a9 authored by Geoff Simmons

Document set.string() and set.backend().

parent 3d7fe675
Pipeline #345 skipped
@@ -45,11 +45,16 @@ import re2 [from "path"] ;
# set object interface
new OBJECT = re2.set([ENUM anchor] [, <regex options>])
VOID <obj>.add(STRING)
VOID <obj>.add(STRING [, STRING string] [, BACKEND backend])
VOID <obj>.compile()
BOOL <obj>.match(STRING)
INT <obj>.nmatches()
BOOL <obj>.matched(INT)
STRING <obj>.string([INT n,] [ENUM select])
BACKEND <obj>.backend([INT n,] [ENUM select])
# VMOD version
STRING re2.version()
DESCRIPTION
===========
@@ -148,6 +153,78 @@ in which they were added::
}
}
An advantage of alternations and sets with RE2, as opposed to an
alternation in PCRE or a series of separate matches in an
if-elsif-elsif sequence, comes from the fact that the matcher is
implemented as a state machine. That means that the matcher progresses
through the string to be matched just once, following all of the
patterns in the set that can still match through the state machine,
and determining that there is no match as soon as there are no more
possible paths in the state machine. So a string can be matched
against a large set of patterns in time that is proportional to the
length of the string to be matched. In contrast, PCRE matches the
patterns in an alternation one after another, stopping after the first
matching pattern, or attempting matches against all of them if there
is no match. Thus a match against an alternation in PCRE is not unlike
an if-elsif-elsif sequence of individual matches: it requires the time
needed for each individual match, so the overall time grows in
proportion to the number of patterns to be matched.
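For comparison, here is a minimal sketch of both approaches in VCL. The object ``prefixes`` and the ``X-Known-Prefix-*`` headers are hypothetical. Both tests accept the same URLs. Varnish's native ``~`` operator (PCRE) may try each branch of the alternation in turn, while the set is matched in a single pass over ``req.url``::

    sub vcl_init {
        # Hypothetical set of URL prefixes, matched with RE2.
        new prefixes = re2.set(anchor=start);
        prefixes.add("/foo");
        prefixes.add("/bar");
        prefixes.add("/baz");
        prefixes.compile();
    }

    sub vcl_recv {
        # PCRE alternation: branches may be tried one after another.
        if (req.url ~ "^(/foo|/bar|/baz)") {
            set req.http.X-Known-Prefix-PCRE = "true";
        }
        # RE2 set: all patterns are followed in one pass over the URL.
        if (prefixes.match(req.url)) {
            set req.http.X-Known-Prefix-RE2 = "true";
        }
    }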
Another advantage of the VMOD's set object is the ability to associate
strings or backends with the patterns added to the set with the
``.add()`` method::
sub vcl_init {
new prefix = re2.set(anchor=start);
prefix.add("/foo", string="www.domain1.com");
prefix.add("/bar", string="www.domain2.com");
prefix.add("/baz", string="www.domain3.com");
prefix.add("/quux", string="www.domain4.com");
prefix.compile();
new appmatcher = re2.set(anchor=start);
appmatcher.add("/foo", backend=app1);
appmatcher.add("/bar", backend=app2);
appmatcher.add("/baz", backend=app3);
appmatcher.add("/quux", backend=app4);
appmatcher.compile();
}
After a successful match, the string or backend associated with the
matching pattern can be retrieved with the ``.string()`` and
``.backend()`` methods. This makes it possible, for example, to
construct a redirect response or choose the backend with code that is
both efficient and compact, even with a large set of patterns to be
matched::
# Use the prefix object to construct a redirect response from
# a matching request URL.
sub vcl_recv {
if (prefix.match(req.url)) {
# Pass the string associated with the matching pattern
# to vcl_synth.
return(synth(1301, prefix.string()));
}
}
sub vcl_synth {
# The string associated with the matching pattern is in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location = "http://" + resp.reason + req.url;
set resp.status = 301;
set resp.reason = "Moved Permanently";
}
}
# Use the appmatcher object to choose a backend based on the
# request URL prefix.
sub vcl_recv {
if (appmatcher.match(req.url)) {
set req.backend_hint = appmatcher.backend();
}
}
regex options
-------------
@@ -384,7 +461,7 @@ Example::
sub vcl_init {
new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
}
sub vcl_recv {
if (domainmatcher.match(req.http.Host)) {
set req.http.X-Domain = domainmatcher.namedref("domain");
@@ -680,7 +757,7 @@ set
Initialize a set object that represents several patterns combined by
alternation -- ``|`` for "or".
Optional parameters control the interpretation of the resulting
composed pattern. The ``anchor`` parameter is an enum that can have
the values ``none``, ``start`` or ``both``, where ``none`` is the
@@ -695,7 +772,7 @@ For example, if a set is initialized with ``anchor=both``, and the
patterns ``foo`` and ``bar`` are added, then matches against the set
match a string against ``^foo$|^bar$``, or equivalently
``^(foo|bar)$``.
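Here is a minimal sketch of that example in VCL. The object ``exact`` and the headers ``X-Token`` and ``X-Token-Valid`` are hypothetical. For these simple patterns, the set match should accept the same strings as the native regex test mentioned in the comment::

    sub vcl_init {
        # Hypothetical object; anchor=both requires a full match.
        new exact = re2.set(anchor=both);
        exact.add("foo");
        exact.add("bar");
        exact.compile();
    }

    sub vcl_recv {
        # Accepts the same values as req.http.X-Token ~ "^(foo|bar)$":
        # "foo" and "bar" match, "foobar" and "xfoo" do not.
        # The header is tested for presence before matching.
        if (req.http.X-Token && exact.match(req.http.X-Token)) {
            set req.http.X-Token-Valid = "true";
        }
    }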
The usual regex options can be set, which then control matching
against the resulting composed pattern. However, the ``never_capture``
option cannot be set, and is always implicitly true, since backrefs
@@ -731,6 +808,14 @@ Add the given pattern to the set. If the pattern is invalid,
``.add()`` fails, and the VCL will fail to load, with an error message
describing the problem.
If values for the ``string`` and/or ``backend`` parameters are
provided, then these values can be retrieved with the ``.string()``
and ``.backend()`` methods, respectively, as described below. This
makes it possible to associate a string or a backend with the added
pattern, and to retrieve it after the pattern matches successfully.
``string`` and ``backend`` default to NULL; that is, by default the
pattern is not associated with any such value.
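The sketch below shows the possible combinations. It assumes that backends ``app1`` and ``app2`` have been defined; the object name ``router`` is hypothetical. Since the ``anchor=start`` prefixes below do not overlap, at most one pattern matches, so ``.string()`` and ``.backend()`` can be called without arguments::

    sub vcl_init {
        new router = re2.set(anchor=start);
        # Pattern 1: both a string and a backend.
        router.add("/foo", string="www.domain1.com", backend=app1);
        # Pattern 2: only a backend; .string() fails if this
        # pattern is selected.
        router.add("/bar", backend=app2);
        # Pattern 3: neither; only .match(), .nmatches() and
        # .matched() are useful for it.
        router.add("/baz");
        router.compile();
    }

    sub vcl_recv {
        if (router.match(req.url) && router.matched(1)) {
            # Pattern 1 matched: both values are available.
            set req.http.Host = router.string();
            set req.backend_hint = router.backend();
        }
    }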
``.add()`` MUST be called in ``vcl_init``, and MAY NOT be called after
``.compile()``. If ``.add()`` is called in any other subroutine, an
error message with ``VCL_Error`` is logged, and the call has no
@@ -742,6 +827,8 @@ finally call ``.compile()`` when you're done.
When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.
The same is true of the INT arguments that may be given for the
``.string()`` or ``.backend()`` methods.
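Here is a minimal sketch of the shared numbering. The object ``numbered`` and the header ``X-Second`` are hypothetical. The index 2 refers to the second pattern added, both for ``.matched()`` and for ``.string()``::

    sub vcl_init {
        new numbered = re2.set();
        numbered.add("foo", string="first");   # pattern 1
        numbered.add("bar", string="second");  # pattern 2
        numbered.compile();
    }

    sub vcl_recv {
        if (numbered.match(req.url) && numbered.matched(2)) {
            # "bar" occurs in the URL; numbered.string(2) is "second".
            set req.http.X-Second = numbered.string(2);
        }
    }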
Example::
@@ -756,6 +843,9 @@ Example::
hostmatcher.compile();
}
# See the documentation of the .string() and .backend() methods
# below for uses of the parameters string and backend for .add().
.. _func_set.compile:
set.compile
@@ -894,6 +984,118 @@ set.string
STRING set.string(INT n=0, ENUM {FIRST,LAST} select=0)
Returns the string associated with the `nth` pattern added to the set,
or with the pattern in the set that matched in the most recent call to
``.match()`` in the same task scope (client or backend context).
If ``n`` > 0, then return the string associated with the `nth` pattern
added to the set with the ``string`` parameter of the ``.add()``
method, counting from 1. This will return the `nth` string in any
context, regardless of whether ``.match()`` was called previously, or
whether a previous call returned ``true`` or ``false``.
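For instance, this sketch reads a string by index without any call to ``.match()``. The object ``targets`` and the header ``X-News-Path`` are hypothetical::

    sub vcl_init {
        # Hypothetical redirect targets for old URL prefixes.
        new targets = re2.set(anchor=start);
        targets.add("/docs", string="/documentation");  # pattern 1
        targets.add("/blog", string="/news");           # pattern 2
        targets.compile();
    }

    sub vcl_recv {
        # n > 0 does not depend on .match(): this always returns
        # the string added with pattern 2, "/news".
        set req.http.X-News-Path = targets.string(2);
    }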
If ``n`` <= 0, then return the string associated with a pattern in the
set that matched successfully in the most recent call to ``.match()``
in the task scope. Since ``n`` is 0 by default, ``n`` can be left out
for this purpose.
If ``n`` <= 0 and exactly one pattern in the set matched in the most
recent invocation of ``.match()`` (and hence ``.nmatches()`` returns
1), then the string associated with that pattern is returned. The
``select`` parameter is ignored in this case. Thus ``.string()``
can be used for this purpose with no arguments.
If ``n`` <= 0 and more than one pattern matched in the most recent
``.match()`` call (``.nmatches()`` > 1), then the string returned is
determined by the ``select`` parameter. The values ``FIRST`` and
``LAST`` specify that, of the patterns that matched, the first or last
one added via the ``.add()`` method is chosen, and the string
associated with that pattern is returned.
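For example, this sketch adds patterns from less specific to more specific and uses ``select=LAST`` to choose the most specific match. The object ``bylast`` and the header ``X-Section`` are hypothetical::

    sub vcl_init {
        new bylast = re2.set(anchor=start);
        bylast.add("/foo", string="general");       # pattern 1
        bylast.add("/foo/bar", string="specific");  # pattern 2
        bylast.compile();
    }

    sub vcl_recv {
        if (bylast.match(req.url)) {
            # For "/foo/bar/x" both patterns match (.nmatches() == 2),
            # and LAST selects pattern 2: X-Section is "specific".
            # For "/foo/x" only pattern 1 matches; select is ignored.
            set req.http.X-Section = bylast.string(select=LAST);
        }
    }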
``.string()`` fails, returning NULL with a ``VCL_Error`` message in
the log, if:
* ``n`` is greater than the number of patterns in the set.
* ``n`` <= 0 (or left to the default), but ``.match()`` was not called
earlier in the same task scope (client or backend context).
* ``n`` <= 0, but the previous ``.match()`` call returned ``false``.
* ``n`` <= 0 and no value is given for the ``select`` ENUM, but more
than one pattern matched in the previous ``.match()`` call.
* A pattern from the set is selected as described above, but no string
was associated with that pattern; that is, the ``string`` parameter
was not set in the ``.add()`` call that added the pattern.
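The task-scope condition matters when a set is used on both the client and the backend side. In this sketch, the object ``paths`` and the headers ``X-API`` and ``X-API-Backend`` are hypothetical. A ``.match()`` call in ``vcl_recv`` (client task) does not carry over to ``vcl_backend_fetch`` (backend task), so the backend side matches again before calling ``.string()``::

    sub vcl_init {
        new paths = re2.set(anchor=start);
        paths.add("/api/v2", string="v2");  # pattern 1, more specific
        paths.add("/api", string="v1");     # pattern 2
        paths.compile();
    }

    sub vcl_recv {
        if (paths.match(req.url)) {
            # Same client task as .match(): this call succeeds.
            set req.http.X-API = paths.string(select=FIRST);
        }
    }

    sub vcl_backend_fetch {
        # Without this .match() on bereq.url, paths.string() would
        # fail here, since no match was made in the backend task.
        if (paths.match(bereq.url)) {
            set bereq.http.X-API-Backend = paths.string(select=FIRST);
        }
    }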
Examples::
# Match the request URL against a set of patterns, and generate
# a synthetic redirect response with a Location header derived
# from the string associated with the matching pattern.
# In the first example, exactly one pattern in the set matches.
sub vcl_init {
# With anchor=both, we specify exact matches.
new matcher = re2.set(anchor=both);
matcher.add("/foo/bar", "/baz/quux");
matcher.add("/baz/bar/foo", "/baz/quux/foo");
matcher.add("/quux/bar/baz/foo", "/baz/quux/foo/bar");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Divert to vcl_synth, sending the string associated
# with the matching pattern in the "reason" field.
return(synth(1301, matcher.string()));
}
}
sub vcl_synth {
# Construct a redirect response, using the path set in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location
= "http://otherdomain.org" + resp.reason;
set resp.status = 301;
set resp.reason = "Moved Permanently";
return(deliver);
}
}
# In the second example, the patterns that may match have
# common prefixes, and more than one pattern may match. We
# add patterns to the set in a "more specific" to "less
# specific" order, and we choose the most specific pattern
# that matches, by specifying the first matching pattern in
# the set.
sub vcl_init {
# With anchor=start, we specify matching prefixes.
new matcher = re2.set(anchor=start);
matcher.add("/foo/bar/baz/quux", "/baz/quux");
matcher.add("/foo/bar/baz", "/baz/quux/foo");
matcher.add("/foo/bar", "/baz/quux/foo/bar");
matcher.add("/foo", "/baz");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Select the first matching pattern
return(synth(1301, matcher.string(select=FIRST)));
}
}
# vcl_synth is implemented as shown above
.. _func_set.backend:
set.backend
@@ -903,6 +1105,62 @@ set.backend
BACKEND set.backend(INT n=0, ENUM {FIRST,LAST} select=0)
Returns the backend associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope (client or backend
context).
The rules for selecting a pattern from the set and its associated
backend are the same as for ``.string()`` above:
* If ``n`` > 0, then return the backend associated with the `nth`
pattern added to the set with the ``backend`` parameter of the
``.add()`` method, counting from 1 (independent of any previous
``.match()`` call).
* If ``n`` <= 0 (or left to the default) and exactly one pattern in
the set matched in the most recent invocation of ``.match()``
(``.nmatches()`` == 1), then return the backend associated with that
pattern (ignoring the value of ``select``).
* If ``n`` <= 0 and ``.nmatches()`` > 1, then return the backend
associated with the first or last matching pattern in the set
as determined by the ``select`` parameter.
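The sketch below combines ``select`` with an ``n`` > 0 fallback. It assumes that backends ``shop`` and ``cart`` have been defined; the object name ``apps`` is hypothetical::

    sub vcl_init {
        new apps = re2.set(anchor=start);
        apps.add("/shop", backend=shop);       # pattern 1
        apps.add("/shop/cart", backend=cart);  # pattern 2
        apps.compile();
    }

    sub vcl_recv {
        if (apps.match(req.url)) {
            # "/shop/cart/x" matches both patterns; LAST chooses cart.
            set req.backend_hint = apps.backend(select=LAST);
        } else {
            # n > 0 ignores .match(): use the backend of pattern 1.
            set req.backend_hint = apps.backend(1);
        }
    }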
``.backend()`` fails, returning NULL with a ``VCL_Error`` message
in the log, under the same conditions described for ``.string()``
above.
Example::
# Choose a backend based on the URL prefix.
# In this example, assume that backends b1 through b4
# have been defined.
sub vcl_init {
# Use anchor=start to match prefixes.
# The prefixes are unique, so exactly one will match.
new matcher = re2.set(anchor=start);
matcher.add("/foo", backend=b1);
matcher.add("/bar", backend=b2);
matcher.add("/baz", backend=b3);
matcher.add("/quux", backend=b4);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Set the backend hint to the backend associated
# with the matching pattern.
set req.backend_hint = matcher.backend();
}
}
.. _func_version:
version
@@ -30,11 +30,16 @@ $Module re2 3 Varnish Module for access to the Google RE2 regular expression eng
# set object interface
new OBJECT = re2.set([ENUM anchor] [, <regex options>])
VOID <obj>.add(STRING)
VOID <obj>.add(STRING [, STRING string] [, BACKEND backend])
VOID <obj>.compile()
BOOL <obj>.match(STRING)
INT <obj>.nmatches()
BOOL <obj>.matched(INT)
STRING <obj>.string([INT n,] [ENUM select])
BACKEND <obj>.backend([INT n,] [ENUM select])
# VMOD version
STRING re2.version()
DESCRIPTION
===========
@@ -133,6 +138,78 @@ in which they were added::
}
}
An advantage of alternations and sets with RE2, as opposed to an
alternation in PCRE or a series of separate matches in an
if-elsif-elsif sequence, comes from the fact that the matcher is
implemented as a state machine. That means that the matcher progresses
through the string to be matched just once, following all of the
patterns in the set that can still match through the state machine,
and determining that there is no match as soon as there are no more
possible paths in the state machine. So a string can be matched
against a large set of patterns in time that is proportional to the
length of the string to be matched. In contrast, PCRE matches the
patterns in an alternation one after another, stopping after the first
matching pattern, or attempting matches against all of them if there
is no match. Thus a match against an alternation in PCRE is not unlike
an if-elsif-elsif sequence of individual matches: it requires the time
needed for each individual match, so the overall time grows in
proportion to the number of patterns to be matched.
Another advantage of the VMOD's set object is the ability to associate
strings or backends with the patterns added to the set with the
``.add()`` method::
sub vcl_init {
new prefix = re2.set(anchor=start);
prefix.add("/foo", string="www.domain1.com");
prefix.add("/bar", string="www.domain2.com");
prefix.add("/baz", string="www.domain3.com");
prefix.add("/quux", string="www.domain4.com");
prefix.compile();
new appmatcher = re2.set(anchor=start);
appmatcher.add("/foo", backend=app1);
appmatcher.add("/bar", backend=app2);
appmatcher.add("/baz", backend=app3);
appmatcher.add("/quux", backend=app4);
appmatcher.compile();
}
After a successful match, the string or backend associated with the
matching pattern can be retrieved with the ``.string()`` and
``.backend()`` methods. This makes it possible, for example, to
construct a redirect response or choose the backend with code that is
both efficient and compact, even with a large set of patterns to be
matched::
# Use the prefix object to construct a redirect response from
# a matching request URL.
sub vcl_recv {
if (prefix.match(req.url)) {
# Pass the string associated with the matching pattern
# to vcl_synth.
return(synth(1301, prefix.string()));
}
}
sub vcl_synth {
# The string associated with the matching pattern is in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location = "http://" + resp.reason + req.url;
set resp.status = 301;
set resp.reason = "Moved Permanently";
}
}
# Use the appmatcher object to choose a backend based on the
# request URL prefix.
sub vcl_recv {
if (appmatcher.match(req.url)) {
set req.backend_hint = appmatcher.backend();
}
}
regex options
-------------
@@ -333,7 +410,7 @@ Example::
sub vcl_init {
new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
}
sub vcl_recv {
if (domainmatcher.match(req.http.Host)) {
set req.http.X-Domain = domainmatcher.namedref("domain");
@@ -590,7 +667,7 @@ $Object set(ENUM { none, start, both } anchor="none", BOOL utf8=0,
Initialize a set object that represents several patterns combined by
alternation -- ``|`` for "or".
Optional parameters control the interpretation of the resulting
composed pattern. The ``anchor`` parameter is an enum that can have
the values ``none``, ``start`` or ``both``, where ``none`` is the
@@ -605,7 +682,7 @@ For example, if a set is initialized with ``anchor=both``, and the
patterns ``foo`` and ``bar`` are added, then matches against the set
match a string against ``^foo$|^bar$``, or equivalently
``^(foo|bar)$``.
The usual regex options can be set, which then control matching
against the resulting composed pattern. However, the ``never_capture``
option cannot be set, and is always implicitly true, since backrefs
@@ -634,6 +711,14 @@ Add the given pattern to the set. If the pattern is invalid,
``.add()`` fails, and the VCL will fail to load, with an error message
describing the problem.
If values for the ``string`` and/or ``backend`` parameters are
provided, then these values can be retrieved with the ``.string()``
and ``.backend()`` methods, respectively, as described below. This
makes it possible to associate a string or a backend with the added
pattern, and to retrieve it after the pattern matches successfully.
``string`` and ``backend`` default to NULL; that is, by default the
pattern is not associated with any such value.
``.add()`` MUST be called in ``vcl_init``, and MAY NOT be called after
``.compile()``. If ``.add()`` is called in any other subroutine, an
error message with ``VCL_Error`` is logged, and the call has no
@@ -645,6 +730,8 @@ finally call ``.compile()`` when you're done.
When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.
The same is true of the INT arguments that may be given for the
``.string()`` or ``.backend()`` methods.
Example::
@@ -659,6 +746,9 @@ Example::
hostmatcher.compile();
}
# See the documentation of the .string() and .backend() methods
# below for uses of the parameters string and backend for .add().
$Method VOID .compile()
Compile the compound pattern represented by the set -- an alternation
@@ -762,8 +852,176 @@ Example::
$Method STRING .string(INT n=0, ENUM {FIRST, LAST} select=0)
Returns the string associated with the `nth` pattern added to the set,
or with the pattern in the set that matched in the most recent call to
``.match()`` in the same task scope (client or backend context).
If ``n`` > 0, then return the string associated with the `nth` pattern
added to the set with the ``string`` parameter of the ``.add()``
method, counting from 1. This will return the `nth` string in any
context, regardless of whether ``.match()`` was called previously, or
whether a previous call returned ``true`` or ``false``.
If ``n`` <= 0, then return the string associated with a pattern in the
set that matched successfully in the most recent call to ``.match()``
in the task scope. Since ``n`` is 0 by default, ``n`` can be left out
for this purpose.
If ``n`` <= 0 and exactly one pattern in the set matched in the most
recent invocation of ``.match()`` (and hence ``.nmatches()`` returns
1), then the string associated with that pattern is returned. The
``select`` parameter is ignored in this case. Thus ``.string()``
can be used for this purpose with no arguments.
If ``n`` <= 0 and more than one pattern matched in the most recent
``.match()`` call (``.nmatches()`` > 1), then the string returned is
determined by the ``select`` parameter. The values ``FIRST`` and
``LAST`` specify that, of the patterns that matched, the first or last
one added via the ``.add()`` method is chosen, and the string
associated with that pattern is returned.
``.string()`` fails, returning NULL with a ``VCL_Error`` message in
the log, if:
* ``n`` is greater than the number of patterns in the set.
* ``n`` <= 0 (or left to the default), but ``.match()`` was not called
earlier in the same task scope (client or backend context).
* ``n`` <= 0, but the previous ``.match()`` call returned ``false``.
* ``n`` <= 0 and no value is given for the ``select`` ENUM, but more
than one pattern matched in the previous ``.match()`` call.
* A pattern from the set is selected as described above, but no string
was associated with that pattern; that is, the ``string`` parameter
was not set in the ``.add()`` call that added the pattern.
Examples::
# Match the request URL against a set of patterns, and generate
# a synthetic redirect response with a Location header derived
# from the string associated with the matching pattern.
# In the first example, exactly one pattern in the set matches.
sub vcl_init {
# With anchor=both, we specify exact matches.
new matcher = re2.set(anchor=both);
matcher.add("/foo/bar", "/baz/quux");
matcher.add("/baz/bar/foo", "/baz/quux/foo");
matcher.add("/quux/bar/baz/foo", "/baz/quux/foo/bar");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Divert to vcl_synth, sending the string associated
# with the matching pattern in the "reason" field.
return(synth(1301, matcher.string()));
}
}
sub vcl_synth {
# Construct a redirect response, using the path set in
# resp.reason.
if (resp.status == 1301) {
set resp.http.Location
= "http://otherdomain.org" + resp.reason;
set resp.status = 301;
set resp.reason = "Moved Permanently";
return(deliver);
}
}
# In the second example, the patterns that may match have
# common prefixes, and more than one pattern may match. We
# add patterns to the set in a "more specific" to "less
# specific" order, and we choose the most specific pattern
# that matches, by specifying the first matching pattern in
# the set.
sub vcl_init {
# With anchor=start, we specify matching prefixes.
new matcher = re2.set(anchor=start);
matcher.add("/foo/bar/baz/quux", "/baz/quux");
matcher.add("/foo/bar/baz", "/baz/quux/foo");
matcher.add("/foo/bar", "/baz/quux/foo/bar");
matcher.add("/foo", "/baz");
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Select the first matching pattern
return(synth(1301, matcher.string(select=FIRST)));
}
}
# vcl_synth is implemented as shown above
$Method BACKEND .backend(INT n=0, ENUM {FIRST, LAST} select=0)
Returns the backend associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope (client or backend
context).
The rules for selecting a pattern from the set and its associated
backend are the same as for ``.string()`` above:
* If ``n`` > 0, then return the backend associated with the `nth`
pattern added to the set with the ``backend`` parameter of the
``.add()`` method, counting from 1 (independent of any previous
``.match()`` call).
* If ``n`` <= 0 (or left to the default) and exactly one pattern in
the set matched in the most recent invocation of ``.match()``
(``.nmatches()`` == 1), then return the backend associated with that
pattern (ignoring the value of ``select``).
* If ``n`` <= 0 and ``.nmatches()`` > 1, then return the backend
associated with the first or last matching pattern in the set
as determined by the ``select`` parameter.
``.backend()`` fails, returning NULL with a ``VCL_Error`` message
in the log, under the same conditions described for ``.string()``
above.
Example::
# Choose a backend based on the URL prefix.
# In this example, assume that backends b1 through b4
# have been defined.
sub vcl_init {
# Use anchor=start to match prefixes.
# The prefixes are unique, so exactly one will match.
new matcher = re2.set(anchor=start);
matcher.add("/foo", backend=b1);
matcher.add("/bar", backend=b2);
matcher.add("/baz", backend=b3);
matcher.add("/quux", backend=b4);
matcher.compile();
}
sub vcl_recv {
if (matcher.match(req.url)) {
# Confirm that there was exactly one match
if (matcher.nmatches() != 1) {
return(fail);
}
# Set the backend hint to the backend associated
# with the matching pattern.
set req.backend_hint = matcher.backend();
}
}
$Function STRING version()
Return the version string for this VMOD.