libvmod-re

Varnish Module for Regular Expression Matching with Subexpression Capture

vmod_re

Manual section: 3

SYNOPSIS

import re [from "path"] ;

DESCRIPTION

Varnish Module (VMOD) for matching strings against regular expressions, and for extracting captured substrings after matches.

Regular expression matching as implemented by the VMOD is equivalent to VCL's infix operator ~. The VMOD is motivated by the fact that backreference capture in standard VCL requires verbose and suboptimal use of the regsub or regsuball functions. For example, this common idiom in VCL captures a string of digits following the substring "bar" from one request header into another:

sub vcl_recv {
        if (req.http.Foo ~ "bar\d+") {
           set req.http.Baz = regsub(req.http.Foo,
                                     "^.*bar(\d+).*$", "\1");
        }
}

This idiom requires two regex executions when a match is found. The second is less efficient than the first, since it must match the entire string to be replaced while capturing a substring, and the whole construct is cumbersome.

The equivalent solution with the VMOD looks like this:

import re;

sub vcl_init {
        new myregex = re.regex("bar(\d+)");
}

sub vcl_recv {
        if (myregex.match(req.http.Foo)) {
           set req.http.Baz = myregex.backref(1);
        }
}

The object is created at VCL initialization with a regex that contains the capturing expression and describes only the substring to be matched. When a call to the match method succeeds, the captured string can be obtained from the backref method.

Calls to the backref method refer back to the most recent successful call to match for the same object in the same task scope; that is, in the same client or backend context. For example, if match is called for an object in one of the vcl_backend_* subroutines and returns true, then subsequent calls to backref in the same backend scope extract substrings from the matched string.
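The scoping rule can be illustrated with a minimal sketch, assuming a backend is defined elsewhere in the VCL. The object name digits and the X-Client-Digits and X-Backend-Digits headers are hypothetical names chosen for illustration:

```vcl
import re;

sub vcl_init {
        new digits = re.regex("bar(\d+)");
}

sub vcl_recv {
        # Client context: this match is only visible to backref calls
        # made in the same client task.
        if (digits.match(req.http.Foo)) {
           set req.http.X-Client-Digits = digits.backref(1);
        }
}

sub vcl_backend_fetch {
        # Backend context: a separate task scope, so match is called
        # again here before backref is used.
        if (digits.match(bereq.http.Foo)) {
           set bereq.http.X-Backend-Digits = digits.backref(1);
        }
}
```

The same object is used in both contexts, but each context keeps its own match state, so a match in vcl_recv does not provide backrefs to the vcl_backend_* subroutines.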

The VMOD also supports dynamic regex matching with the match_dyn and backref_dyn functions:

import re;

sub vcl_backend_response {
        if (re.match_dyn(beresp.http.Bar + "(\d+)",
                              bereq.http.Foo)) {
           set beresp.http.Baz = re.backref_dyn(1);
        }
}

In match_dyn, the regex in the first argument is compiled when it is called, and matched against the string in the second argument. Subsequent calls to backref_dyn extract substrings from the matched string for the most recent successful call to match_dyn in the same task scope.

As with the constructor, the regex argument to match_dyn should contain any capturing expressions needed for calls to backref_dyn.

match_dyn makes it possible to construct regexen whose contents are not fully known until runtime, but match is more efficient, since it re-uses the compiled expression obtained at VCL initialization. So if you are matching against a fixed pattern that never changes during the lifetime of VCL, use match.
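As a rough sketch of this trade-off, the following fragment uses an object for a fixed pattern and match_dyn for a pattern that depends on a request header. The object name version_re and the X-Api-Version, X-Tenant and X-Tenant-Item headers are assumptions made for illustration only:

```vcl
import re;

sub vcl_init {
        # Fixed pattern: compiled once, when the VCL is loaded.
        new version_re = re.regex("^/v(\d+)/");
}

sub vcl_recv {
        if (version_re.match(req.url)) {
           set req.http.X-Api-Version = version_re.backref(1);
        }

        # The pattern includes a header value that is only known at
        # runtime, so it is compiled every time match_dyn is called.
        if (re.match_dyn("^/" + req.http.X-Tenant + "/(\d+)", req.url)) {
           set req.http.X-Tenant-Item = re.backref_dyn(1);
        }
}
```

Note that any text concatenated into the pattern is interpreted as regex syntax, so a sketch like this is only appropriate when the header value can be trusted.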

CONTENTS

  • regex(STRING)
  • BOOL match_dyn(PRIV_TASK, STRING, STRING)
  • STRING backref_dyn(PRIV_TASK, INT, STRING)
  • STRING version()

regex

new OBJ = regex(STRING)
Description

Create a regex object with the given regular expression. The expression is compiled when the constructor is called. It should include any capturing parentheses that will be needed for extracting backreferences.

If the regular expression fails to compile, then the VCL load fails with an error message describing the problem.

Example
new myregex = re.regex("\bmax-age\s*=\s*(\d+)");

regex.match

BOOL regex.match(STRING)
Description
Determines whether the given string matches the regex compiled by the constructor; functionally equivalent to VCL's infix operator ~.
Example
if (myregex.match(beresp.http.Surrogate-Control)) { # ...

regex.backref

STRING regex.backref(INT, STRING fallback="**BACKREF METHOD FAILED**")
Description

Extracts the nth subexpression of the most recent successful call of the match method for this object in the same task scope (client or backend context), or a fallback string in case the extraction fails. Backref 0 indicates the entire matched string. Thus this function behaves like the \n symbols in regsub and regsuball, and the $1, $2 ... variables in Perl.

After unsuccessful matches, the fallback string is returned for any call to backref. The default value of fallback is "**BACKREF METHOD FAILED**".

The VCL infix operators ~ and !~ do not affect this method, nor do the functions regsub or regsuball.

If backref is called without any prior call to match for this object in the same task scope, then an error message is emitted to the Varnish log using the VCL_Error tag, and the fallback string is returned.

Example
set beresp.ttl = std.duration(myregex.backref(1, "120") + "s", 120s);
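A further sketch shows backref 0 together with numbered subexpressions and an explicit fallback. The object name hostport and the X-Host-* headers are hypothetical, and the pattern is only an illustration, not part of the VMOD:

```vcl
import re;

sub vcl_init {
        # Two capturing groups: host name and port.
        new hostport = re.regex("^([^:]+):(\d+)$");
}

sub vcl_recv {
        if (hostport.match(req.http.Host)) {
           # Backref 0 is the entire matched string.
           set req.http.X-Host-Full = hostport.backref(0);
           # Backrefs 1 and 2 are the captured subexpressions.
           set req.http.X-Host-Name = hostport.backref(1);
           # The second argument replaces the default fallback string.
           set req.http.X-Host-Port = hostport.backref(2, "80");
        }
}
```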

match_dyn

BOOL match_dyn(PRIV_TASK, STRING, STRING)
Description

Compiles the regular expression given in the first argument, and determines whether it matches the string in the second argument.

If the regular expression fails to compile, then an error message describing the problem is emitted to the Varnish log with the tag VCL_Error, and match_dyn returns false.

Example
if (re.match_dyn(bereq.http.Foo + "(\d+)", beresp.http.Bar)) { # ...
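Because a pattern that fails to compile makes match_dyn return false, VCL can treat a bad pattern the same way as a failed match. A minimal sketch, assuming a hypothetical X-Pattern request header that carries the regex and a hypothetical X-Matched header for the result:

```vcl
import re;

sub vcl_recv {
        # If X-Pattern is not a valid regex, match_dyn logs a VCL_Error
        # message and returns false, so the else branch is taken.
        if (re.match_dyn(req.http.X-Pattern, req.url)) {
           set req.http.X-Matched = "yes";
        } else {
           set req.http.X-Matched = "no";
        }
}
```

A compile failure can only be told apart from an ordinary mismatch by looking for the VCL_Error entry in the Varnish log.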

backref_dyn

STRING backref_dyn(PRIV_TASK, INT, STRING fallback="**BACKREF FUNCTION FAILED**")
Description

Similar to the backref method, this function extracts the nth subexpression of the most recent successful call of the match_dyn function in the same task scope, or a fallback string in case the extraction fails.

After unsuccessful matches, the fallback string is returned for any call to backref_dyn. The default value of fallback is "**BACKREF FUNCTION FAILED**".

If backref_dyn is called without any prior call to match_dyn in the same task scope, then a VCL_Error message is logged, and the fallback string is returned.
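A usage sketch for backref_dyn with an explicit fallback follows. The X-Prefix and X-Value headers and the fallback string "none" are assumptions made for illustration:

```vcl
import re;

sub vcl_backend_response {
        if (re.match_dyn(bereq.http.X-Prefix + "=(\d+)",
                         beresp.http.Cache-Control)) {
           # The match succeeded, so the captured digits are returned.
           set beresp.http.X-Value = re.backref_dyn(1, "none");
        } else {
           # After an unsuccessful match, backref_dyn returns the
           # fallback, here "none".
           set beresp.http.X-Value = re.backref_dyn(1, "none");
        }
}
```

Both branches make the same call on purpose: the result depends only on whether the preceding match_dyn in the same task scope succeeded.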

version

STRING version()
Description
Returns the version string for this vmod.
Example
set resp.http.X-re-version = re.version();

REQUIREMENTS

The VMOD requires the Varnish master branch since commit a339f63. See the project repository for versions that are compatible with other versions of Varnish.

INSTALLATION

The VMOD is built on a system where an instance of Varnish is installed. The autotools attempt to locate the Varnish instance, and then pull in libraries and other support files from there.

Quick start

This sequence should be enough in typical setups:

  1. ./autogen.sh (for git-installation)
  2. ./configure
  3. make
  4. make check (regression tests)
  5. make install (may require root: sudo make install)

Alternative configs

If you have installed Varnish to a non-standard directory, call autogen.sh and configure with PKG_CONFIG_PATH pointing to the appropriate path. For example, when varnishd configure was called with --prefix=$PREFIX, use

PKG_CONFIG_PATH=${PREFIX}/lib/pkgconfig
export PKG_CONFIG_PATH

For developers

As with Varnish, you can use these configure options for stricter compiling:

  • --enable-developer-warnings
  • --enable-extra-developer-warnings (for GCC 4)
  • --enable-werror

The VMOD must always build successfully with these options enabled.

Also as with Varnish, you can add --enable-debugging-symbols, so that the VMOD's symbols are available to debuggers, in core dumps and so forth.

AUTHORS

UPLEX Nils Goroll Systemoptimierung

HISTORY

Version 0.1: Initial version, compatible with Varnish 3

Version 0.2: various fixes, last version compatible with Varnish 3

Version 0.3: compatible with Varnish 4

Version 0.4: support dynamic matches

Version 0.5: add the failed() and error() methods

Version 0.6: bugfix backrefs for which no string is captured

Version 1.0: stable version compatible with Varnish 4.0, maintained on
branch 4.0, before beginning upgrades for 4.1

Version 1.1: compatible with Varnish 5.0

Version 2.0: compatible with Varnish 5.1

LIMITATIONS

The VMOD allocates memory for captured subexpressions from Varnish workspaces, whose sizes are determined by the runtime parameters workspace_backend, for calls within the vcl_backend_* subroutines, and workspace_client, for the other VCL subs. The VMOD copies the string to be matched into the workspace, if it's not already in the workspace, and also uses workspace to save data about backreferences.

For typical usage, the default workspace sizes are probably enough; but if you are matching against many long strings in each client or backend context, you might need to increase the Varnish parameters for workspace sizes. If the VMOD cannot allocate enough workspace, then a VCL_Error message is emitted, and the match methods as well as backref will fail. (If you're just using the regexen for matching and not to capture backrefs, then you might as well just use the standard VCL operators ~ and !~, and save the workspace.)

backref can extract up to 10 subexpressions, in addition to the full expression indicated by backref 0. If a match or match_dyn operation would have resulted in more than 11 captures (10 substrings and the full string), then a VCL_Error message is emitted to the Varnish log, and the captures are limited to 11.

Regular expression matching is subject to the same limitations that hold for standard regexen in VCL, for example as set by the runtime parameters pcre_match_limit and pcre_match_limit_recursion.

SEE ALSO

COPYRIGHT

Copyright (c) 2014-2015 UPLEX Nils Goroll Systemoptimierung
All rights reserved

This document is licensed under the same conditions as the libvmod-re
project. See LICENSE for details.

Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>