..
.. NB:  This file is machine generated, DO NOT EDIT!
..
.. Edit vmod.vcc and run make instead
..

.. role:: ref(emphasis)

.. _vmod_re(3):

=======
vmod_re
=======

-------------------------------------------------------------------------
Varnish Module for Regular Expression Matching with Subexpression Capture
-------------------------------------------------------------------------

:Manual section: 3

SYNOPSIS
========

import re [from "path"] ;


DESCRIPTION
===========

Varnish Module (VMOD) for matching strings against regular expressions,
and for extracting captured substrings after matches.

Regular expression matching as implemented by the VMOD is equivalent
to VCL's infix operator ``~``. The VMOD is motivated by the fact that
backreference capture in standard VCL requires verbose and suboptimal
use of the ``regsub`` or ``regsuball`` functions. For example, this
common idiom in VCL captures a string of digits following the
substring ``"bar"`` from one request header into another::

	sub vcl_recv {
		if (req.http.Foo ~ "bar\d+") {
			set req.http.Baz = regsub(req.http.Foo,
			                          "^.*bar(\d+).*$", "\1");
		}
	}

This idiom requires two regex executions when a match is found, the
second less efficient than the first (since it must match the entire
string to be replaced while capturing a substring), and it is
cumbersome to write.

The equivalent solution with the VMOD looks like this::

	import re;

	sub vcl_init {
		new myregex = re.regex("bar(\d+)");
	}

	sub vcl_recv {
		if (myregex.match(req.http.Foo)) {
		   set req.http.Baz = myregex.backref(1);
		}
	}

The object is created at VCL initialization with a regex that contains
the capturing expression and describes only the substring to be
matched. When the ``match`` method succeeds, captured strings can then
be obtained from the ``backref`` method.

Calls to the ``backref`` method refer back to the most recent
successful call to ``match`` for the same object in the same task
scope; that is, in the same client or backend context. For example, if
``match`` is called for an object in one of the ``vcl_backend_*``
subroutines and returns ``true``, then subsequent calls to ``backref``
in the same backend scope extract substrings from the matched
substring.
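
A minimal sketch, reusing the regex from the example above; the header
``X-Foo-Num`` and the log message are only illustrative. Since
``vcl_recv`` and ``vcl_deliver`` belong to the same client task, the
call to ``backref`` in ``vcl_deliver`` refers to the match made in
``vcl_recv``::

	import re;
	import std;

	sub vcl_init {
		new myregex = re.regex("bar(\d+)");
	}

	sub vcl_recv {
		if (myregex.match(req.http.Foo)) {
			# Captures from this match are kept for the rest
			# of the client task.
			std.log("Foo matched");
		}
	}

	sub vcl_deliver {
		# Same client task as vcl_recv: returns the digits
		# captured there, or "none" if that match failed.
		set resp.http.X-Foo-Num = myregex.backref(1, "none");
	}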

The VMOD also supports dynamic regex matching with the ``match_dyn``
and ``backref_dyn`` functions::

	import re;

	sub vcl_backend_response {
		if (re.match_dyn(beresp.http.Bar + "(\d+)",
		                 bereq.http.Foo)) {
		   set beresp.http.Baz = re.backref_dyn(1);
		}
	}

In ``match_dyn``, the regex in the first argument is compiled when it
is called, and matched against the string in the second
argument. Subsequent calls to ``backref_dyn`` extract substrings from
the matched string for the most recent successful call to
``match_dyn`` in the same task scope.

As with the constructor, the regex argument to ``match_dyn`` should
contain any capturing expressions needed for calls to ``backref_dyn``.
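
For example, a regex with two capturing groups makes both
``backref_dyn(1)`` and ``backref_dyn(2)`` available after a successful
match. In this sketch, the header ``X-Key`` is assumed to be set
elsewhere in VCL to the name of a cookie; the other header names are
illustrative::

	import re;

	sub vcl_recv {
		# The pattern is only known at runtime,
		# for example "ver=(\d+)\.(\d+)".
		if (re.match_dyn(req.http.X-Key + "=(\d+)\.(\d+)",
		                 req.http.Cookie)) {
			set req.http.X-Major = re.backref_dyn(1);
			set req.http.X-Minor = re.backref_dyn(2);
		}
	}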

``match_dyn`` makes it possible to construct regexen whose contents
are not fully known until runtime, but ``match`` is more efficient,
since it re-uses the compiled expression obtained at VCL
initialization. So if you are matching against a fixed pattern that
never changes during the lifetime of VCL, use ``match``.

CONTENTS
========

* regex(STRING)
* BOOL match_dyn(PRIV_TASK, STRING, STRING)
* STRING backref_dyn(PRIV_TASK, INT, STRING)
* STRING version()

.. _obj_regex:

regex
-----

::

	new OBJ = regex(STRING)

Description
	Create a regex object with the given regular expression. The
	expression is compiled when the constructor is called. It
	should include any capturing parentheses that will be needed
	for extracting backreferences.

	If the regular expression fails to compile, then the VCL
	load fails with an error message describing the problem.

Example
	``new myregex = re.regex("\bmax-age\s*=\s*(\d+)");``

.. _func_regex.match:

regex.match
-----------

::

	BOOL regex.match(STRING)

Description
	Determines whether the given string matches the regex compiled
	by the constructor; functionally equivalent to VCL's infix
	operator ``~``.

Example
	``if (myregex.match(beresp.http.Surrogate-Control)) { # ...``

.. _func_regex.backref:

regex.backref
-------------

::

	STRING regex.backref(INT, STRING fallback="**BACKREF METHOD FAILED**")

Description
	Extracts the `nth` subexpression of the most recent successful
	call of the ``match`` method for this object in the same task
	scope (client or backend context), or a fallback string in
	case the extraction fails.  Backref 0 indicates the entire
	matched string. Thus this function behaves like the ``\n``
	symbols in ``regsub`` and ``regsuball``, and the ``$1``,
	``$2`` ...  variables in Perl.

	After unsuccessful matches, the ``fallback`` string is returned
	for any call to ``backref``. The default value of ``fallback``
	is ``"**BACKREF METHOD FAILED**"``.

	The VCL infix operators ``~`` and ``!~`` do not affect this
	method, nor do the functions ``regsub`` or ``regsuball``.

	If ``backref`` is called without any prior call to ``match``
	for this object in the same task scope, then an error message
	is emitted to the Varnish log using the ``VCL_Error`` tag, and
	the fallback string is returned.
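
	A minimal sketch of the fallback and of the independence from
	``~``, assuming ``myregex`` was created with
	``re.regex("bar(\d+)")``; the header names are illustrative::

		sub vcl_recv {
			if (myregex.match(req.http.Foo)) {
				if (req.http.Baz ~ "qux(\d+)") {
					# The ~ operator does not change the
					# captures saved by myregex.match(), so
					# this is still the number after "bar".
					set req.http.X-Num = myregex.backref(1);
				}
			} else {
				# After a failed match, the fallback is returned.
				set req.http.X-Num = myregex.backref(1, "none");
			}
		}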

Example
        ``set beresp.ttl = std.duration(myregex.backref(1, "120") + "s", 120s);``
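
The constructor, ``match`` and ``backref`` examples might be combined
as in this sketch, which sets the TTL from a ``max-age`` directive in
``Surrogate-Control``. It assumes ``vmod_std`` for ``std.duration``,
which expects a unit suffix, so ``"s"`` is appended to the captured
digits::

	import re;
	import std;

	sub vcl_init {
		new myregex = re.regex("\bmax-age\s*=\s*(\d+)");
	}

	sub vcl_backend_response {
		if (myregex.match(beresp.http.Surrogate-Control)) {
			# backref(1) is the digits after "max-age="; if the
			# conversion fails, std.duration() returns 120s.
			set beresp.ttl = std.duration(
			    myregex.backref(1, "120") + "s", 120s);
		}
	}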

.. _func_match_dyn:

match_dyn
---------

::

	BOOL match_dyn(PRIV_TASK, STRING, STRING)

Description
	Compiles the regular expression given in the first argument,
	and determines whether it matches the string in the second
	argument.

	If the regular expression fails to compile, then an error
	message describing the problem is emitted to the Varnish log
	with the tag ``VCL_Error``, and ``match_dyn`` returns
	``false``.
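
	Since a failed compilation also yields ``false``, a ``false``
	result alone does not tell a non-matching string from an invalid
	pattern; the ``VCL_Error`` entry in the log does. A sketch, with
	illustrative header names and the pattern assumed to come from a
	backend request header::

		sub vcl_backend_response {
			if (re.match_dyn(bereq.http.X-Pattern, beresp.http.Bar)) {
				# Backref 0 is the entire matched string.
				set beresp.http.X-Captured = re.backref_dyn(0);
			} else {
				# No match, or the pattern failed to compile.
				set beresp.http.X-Captured = "none";
			}
		}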

Example
	``if (re.match_dyn(bereq.http.Foo + "(\d+)", beresp.http.Bar)) { # ...``

.. _func_backref_dyn:

backref_dyn
-----------

::

	STRING backref_dyn(PRIV_TASK, INT, STRING fallback="**BACKREF FUNCTION FAILED**")

Description
	Similar to the ``backref`` method, this function extracts the
	`nth` subexpression of the most recent successful call of the
	``match_dyn`` function in the same task scope, or a fallback
	string in case the extraction fails.

	After unsuccessful matches, the ``fallback`` string is returned
	for any call to ``backref_dyn``. The default value of ``fallback``
	is ``"**BACKREF FUNCTION FAILED**"``.

	If ``backref_dyn`` is called without any prior call to ``match_dyn``
	in the same task scope, then a ``VCL_Error`` message is logged, and
	the fallback string is returned.
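
	A minimal sketch of a custom fallback; the header ``X-Param`` is
	assumed to hold the name of a query parameter, and ``X-Id`` is
	illustrative. After a failed ``match_dyn``, ``X-Id`` is set to
	``"0"`` instead of the default fallback string::

		import re;
		import std;

		sub vcl_recv {
			if (!re.match_dyn("[?&]" + req.http.X-Param + "=(\d+)",
			                  req.url)) {
				std.log("parameter not found in URL");
			}
			# The digits if the match succeeded, otherwise "0".
			set req.http.X-Id = re.backref_dyn(1, "0");
		}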

.. _func_version:

version
-------

::

	STRING version()

Description
        Returns the version string for this vmod.

Example
        ``set resp.http.X-re-version = re.version();``

REQUIREMENTS
============

The VMOD requires Varnish version 5.1. See the project repository for
versions that are compatible with other versions of Varnish.

INSTALLATION
============

The VMOD is built on a system where an instance of Varnish is
installed, and the auto-tools will attempt to locate the Varnish
instance, and then pull in libraries and other support files from
there.

Quick start
-----------

This sequence should be enough in typical setups:

1. ``./autogen.sh``  (for git-installation)
2. ``./configure``
3. ``make``
4. ``make check`` (regression tests)
5. ``make install`` (may require root: sudo make install)

Alternative configs
-------------------

If you have installed Varnish to a non-standard directory, call
``autogen.sh`` and ``configure`` with ``PKG_CONFIG_PATH`` pointing to
the appropriate path. For example, when varnishd configure was called
with ``--prefix=$PREFIX``, use::

 PKG_CONFIG_PATH=${PREFIX}/lib/pkgconfig
 export PKG_CONFIG_PATH

For developers
--------------

As with Varnish, you can use these ``configure`` options for stricter
compiling:

* ``--enable-developer-warnings``
* ``--enable-extra-developer-warnings`` (for GCC 4)
* ``--enable-werror``

The VMOD must always build successfully with these options enabled.

Also as with Varnish, you can add ``--enable-debugging-symbols``, so
that the VMOD's symbols are available to debuggers, in core dumps and
so forth.


AUTHORS
=======

* Geoffrey Simmons <geoff@uplex.de>
* Nils Goroll <nils.goroll@uplex.de>

UPLEX Nils Goroll Systemoptimierung

HISTORY
=======

Version 0.1: Initial version, compatible with Varnish 3

Version 0.2: various fixes, last version compatible with Varnish 3

Version 0.3: compatible with Varnish 4

Version 0.4: support dynamic matches

Version 0.5: add the failed() and error() methods

Version 0.6: bugfix backrefs for which no string is captured

Version 1.0: stable version compatible with Varnish 4.0, maintained on
             branch 4.0, before beginning upgrades for 4.1

Version 1.1: compatible with Varnish 5.0

Version 2.0: compatible with Varnish 5.1

Version 2.1: bugfix release requiring Varnish 5.1.x

LIMITATIONS
===========

The VMOD allocates memory for captured subexpressions from Varnish
workspaces, whose sizes are determined by the runtime parameters
``workspace_backend``, for calls within the ``vcl_backend_*``
subroutines, and ``workspace_client``, for the other VCL subs. The
VMOD copies the string to be matched into the workspace, if it's not
already in the workspace, and also uses workspace to save data about
backreferences.

For typical usage, the default workspace sizes are probably enough;
but if you are matching against many, long strings in each client or
backend context, you might need to increase the Varnish parameters for
workspace sizes. If the VMOD cannot allocate enough workspace, then a
``VCL_Error`` message is emitted, and the match methods as well as
``backref`` will fail. (If you're just using the regexen for matching
and not to capture backrefs, then you might as well just use the
standard VCL operators ``~`` and ``!~``, and save the workspace.)

``backref`` can extract up to 10 subexpressions, in addition to the
full expression indicated by backref 0. If a ``match`` or
``match_dyn`` operation would have resulted in more than 11 captures
(10 substrings and the full string), then a ``VCL_Error`` message is
emitted to the Varnish log, and the captures are limited to 11.

Regular expression matching is subject to the same limitations that
hold for standard regexen in VCL, for example as set by the runtime
parameters ``pcre_match_limit`` and ``pcre_match_limit_recursion``.

SEE ALSO
========

* varnishd(1)
* vcl(7)
* pcre(3)
* http://lassekarstensen.wordpress.com/2013/12/19/converting-a-varnish-3-0-vmod-to-4-0/

COPYRIGHT
=========

::

  Copyright (c) 2014-2015 UPLEX Nils Goroll Systemoptimierung
  All rights reserved
 
  This document is licensed under the same conditions as the libvmod-re
  project. See LICENSE for details.
 
  Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>