README.rst 10.1 KB

vmod_re

Varnish Module for Regular Expression Matching with Subexpression Capture

Manual section: 3

SYNOPSIS

import re [from "path"] ;

DESCRIPTION

Varnish Module (VMOD) for matching strings against regular expressions, and for extracting captured substrings after matches.

Regular expression matching as implemented by the VMOD is equivalent to VCL's infix operator ~. The VMOD is motivated by the fact that backreference capture in standard VCL requires verbose and suboptimal use of the regsub or regsuball functions. For example, this common idiom in VCL captures a string of digits following the substring "bar" from one request header into another:

sub vcl_recv {
        if (req.http.Foo ~ "bar\d+")) {
           set req.http.Baz = regsub(req.http.Foo,
                                     "^.*bar(\d+).*$", "\1");

        }
}

It requires two regex executions when a match is found, the second one less efficient than the first (since it must match the entire string to be replaced while capturing a substring), and is just cumbersome.

The equivalent solution with the VMOD looks like this:

import re;

sub vcl_init {
        new myregex = re.regex("bar(\d+)");
}

sub vcl_recv {
        if (myregex.match(req.http.Foo)) {
           set req.http.Baz = myregex.backref(1, "");
        }
}

The object is created at VCL initialization with the regex containing the capture expression, only describing the substring to be matched. When a match with the match method succeeds, then a captured string can be obtained from the backref method.

The VMOD also supports dynamic regex matching with the match_dyn method:

import re;

sub vcl_init {
        new myregex = re.regex("");
}

sub vcl_backend_response {
        if (myregex.match_dyn(beresp.http.Bar + "(\d+)",
                              req.http.Foo)) {
           set beresp.http.Baz = myregex.backref(1, "");
        }
}

In match_dyn, the regex in the first argument is compiled when it is called, and matched against the string in the second argument; the regex provided in vcl_init is ignored. Subsequent calls to backref extract substrings from the matched string.

As with the constructor, the regex argument to match_dyn should contain any capturing expressions needed for calls to backref.

match_dyn makes it possible to construct regexen whose contents are not fully known until runtime, but match is more efficient, since it re-uses the compiled expression obtained at VCL initialization. So if you are matching against a fixed pattern that never changes during the lifetime of VCL, use match.

CONTENTS

  • regex(STRING)
  • STRING version()

regex

new OBJ = regex(STRING)
Prototype
new OBJ = re.regex(STRING)
Description
Create a regex object with the given regular expression. The expression is compiled when the constructor is called. It should include any capturing parentheses that will be needed for extracting backreferences.
Example
new myregex = re.regex("\bmax-age\s*=\s*(\d+)");

regex.failed

BOOL regex.failed()
Description
Returns true if regex compilation in the constructor failed. NOTE: This method only pertains to compilation in the constructor, not to compilation of a regex in match_dyn.
Example
if (myregex.failed()) { # ...

regex.error

STRING regex.error()
Description
Returns an error message if regex compilation in the constructor failed, or the empty string if compilation succedded. Like failed this only pertains to failures of compilation in the constructor, not in match_dyn.
Example
std.log("myregex failed to compile: " + myregex.error());

regex.match

BOOL regex.match(STRING)
Description
Determines whether the given string matches the regex compiled by the constructor; functionally equivalent to VCL's infix operator ~.
Example
if (myregex.match(beresp.http.Surrogate-Control)) { # ...

regex.match_dyn

BOOL regex.match_dyn(STRING, STRING)
Description
Compiles the regular expression given in the first argument, and determines whether it matches the string in the second argument. The regex supplied in the constructor is ignored.
Example
if (myregex.match_dyn(req.http.Foo + "(\d+)",
beresp.http.Bar)) { # ...

regex.backref

STRING regex.backref(INT, STRING)
Description

Extracts the nth subexpression of the most recent successful call of the match or match_dyn method for this object in the same VCL subroutine call, or a fallback string in case the extraction fails. Backref 0 indicates the entire matched string. Thus this function behaves like the \n symbols in regsub and regsuball, and the $1, $2 ... variables in Perl.

After unsuccessful matches, the fallback string is returned for any call to backref.

The VCL infix operators ~ and !~ do not affect this method, nor do the functions regsub or regsuball.

If backref is called without any prior call to match or match_dyn for this object in the same VCL context, then an error message is emitted to the Varnish log using the VCL_Error tag, and the fallback string is returned.

Example
set beresp.ttl = std.duration(myregex.backref(1, "120"), 120s);

version

STRING version()
Description
Returns the version string for this vmod.
Example
set resp.http.X-re-version = re.version();

REQUIREMENTS

The VMOD requires Varnish 5.0.0. See the project repository for versions that are compatible with other versions of Varnish.

INSTALLATION

The VMOD is built on a system where an instance of Varnish is installed, and the auto-tools will attempt to locate the Varnish instance, and then pull in libraries and other support files from there.

Quick start

This sequence should be enough in typical setups:

  1. ./autogen.sh (for git-installation)
  2. ./configure
  3. make
  4. make check (regression tests)
  5. make install (may require root: sudo make install)

Alternative configs

If you have installed Varnish to a non-standard directory, call autogen.sh and configure with PKG_CONFIG_PATH pointing to the appropriate path. For example, when varnishd configure was called with --prefix=$PREFIX, use

PKG_CONFIG_PATH=${PREFIX}/lib/pkgconfig export PKG_CONFIG_PATH

For developers

As with Varnish, you can use these configure options for stricter compiling:

  • --enable-developer-warnings
  • --enable-extra-developer-warnings (for GCC 4)
  • --enable-werror

The VMOD must always build successfully with these options enabled.

Also as with Varnish, you can add --enable-debugging-symbols, so that the VMOD's symbols are available to debuggers, in core dumps and so forth.

AUTHORS

UPLEX Nils Goroll Systemoptimierung

HISTORY

Version 0.1: Initial version, compatible with Varnish 3

Version 0.2: various fixes, last version compatible with Varnish 3

Version 0.3: compatible with Varnish 4

Version 0.4: support dynamic matches

Version 0.5: add the failed() and error() methods

Version 0.6: bugfix backrefs for which no string is captured

Version 1.0: stable version compatible with Varnish 4.0, maintained on
branch 4.0, before beginning upgrades for 4.1

Version 1.1: compatible with Varnish 5.0

LIMITATIONS

Regular expressions passed into the constructor and into match_dyn are compiled at run-time, so there are no errors at VCL compile-time for invalid expressions. If an expression is invalid, then a VCL_error message is emitted to the Varnish log, matches always fail, and errors in the constructor can be inspected with failed and error.

The VMOD allocates memory for captured subexpressions from Varnish workspaces, whose sizes are determined by the runtime parameters workspace_backend, for calls within the vcl_backend_* subroutines, and workspace_client, for the other VCL subs. The VMOD copies the string to be matched into the workspace, along with some overhead. For typical usage, the default workspace sizes are probably enough; but if you are matching against many, long strings in each client or backend context, you might need to increase the Varnish parameters for workspace sizes. If the VMOD cannot allocate enough workspace, then a VCL_error message is emitted, and the match methods as well as backref will fail. (If you're just using the regexen for matching and not to capture backrefs, then you might as well just use the standard VCL operators ~ and !~, and save the workspace.)

backref can extract up to 10 subexpressions, in addition to the full expression indicated by backref 0. If a match or match_dyn operation would have resulted in more than 11 captures (10 substrings and the full string), then a VCL_Error message is emitted to the Varnish log, and the captures are limited to 11.

XXX: the following paragraph is currently not true, bug is under investigation

Regular expression matching is subject to the same limitations that hold for standard regexen in VCL, for example as set by the runtime parameters pcre_match_limit and pcre_match_limit_recursion.

SEE ALSO

COPYRIGHT

Copyright (c) 2014-2015 UPLEX Nils Goroll Systemoptimierung
All rights reserved

This document is licensed under the same conditions as the libvmod-re
project. See LICENSE for details.

Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>