..
.. NB:  This file is machine generated, DO NOT EDIT!
..
.. Edit vmod.vcc and run make instead
..

.. role:: ref(emphasis)

.. _vmod_re2(3):

========
vmod_re2
========

---------------------------------------------------------------------
Varnish Module for access to the Google RE2 regular expression engine
---------------------------------------------------------------------

:Manual section: 3

SYNOPSIS
========

import re2 [from "path"] ;


::

  # regex object interface
  new OBJECT = re2.regex(STRING pattern [, <regex options>])
  BOOL <obj>.match(STRING)
  STRING <obj>.backref(INT ref)
  STRING <obj>.namedref(STRING name)
  STRING <obj>.sub(STRING text, STRING rewrite)
  STRING <obj>.suball(STRING text, STRING rewrite)
  STRING <obj>.extract(STRING text, STRING rewrite)

  # regex function interface
  BOOL re2.match(STRING pattern, STRING subject [, <regex options>])
  STRING re2.backref(INT ref)
  STRING re2.namedref(STRING name)
  STRING re2.sub(STRING pattern, STRING text, STRING rewrite [, <regex options>])
  STRING re2.suball(STRING pattern, STRING text, STRING rewrite [, <regex options>])
  STRING re2.extract(STRING pattern, STRING text, STRING rewrite [, <regex options>])

  # set object interface
  new OBJECT = re2.set([ENUM anchor] [, <regex options>])
  VOID <obj>.add(STRING)
  VOID <obj>.compile()
  BOOL <obj>.match(STRING)

DESCRIPTION
===========

Varnish Module (VMOD) for access to the Google RE2 regular expression engine.

Varnish VCL uses the PCRE library (Perl Compatible Regular Expressions) for
its native regular expressions, which runs very efficiently for many common
uses of pattern matching in VCL, as attested by years of successful use of
PCRE with Varnish.

But for certain kinds of patterns, the worst-case running time of the PCRE
matcher is exponential in the length of the string to be matched. The
matcher uses backtracking, implemented with recursive calls to the internal
``match()`` function. In principle there is no upper bound to the possible
depth of backtracking and recursion, except as imposed by the ``varnishd``
runtime parameters ``pcre_match_limit`` and ``pcre_match_limit_recursion``;
matches fail if either of these limits is reached. Stack overflow caused by
deep backtracking has occasionally been the subject of ``varnishd`` issues.

RE2 differs from PCRE in that it limits the syntax of patterns so that they
always specify a regular language in the formally strict sense. Most notably,
backreferences within a pattern are not permitted. For example, RE2 rejects
``(foo|bar)\1``, which in PCRE matches ``foofoo`` and ``barbar`` but not
``foobar`` or ``barfoo``. See the link in ``SEE ALSO`` for the specification
of RE2 syntax.

This means that an RE2 matcher runs as a finite automaton, which guarantees
linear running time in the length of the matched string. There is no
backtracking, and hence no risk of deep recursion or stack overflow.
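
A minimal sketch of what this means in practice, assuming a hypothetical
object name ``nested`` and hypothetical headers ``X-Input`` and ``X-Nested``.
A nested quantifier such as ``(a+)+`` is a well-known trigger for heavy
backtracking in backtracking matchers; RE2 accepts the same pattern and, as
described above, matches it without backtracking::

  import re2;

  sub vcl_init {
      # Compiled once, when the VCL is loaded.
      new nested = re2.regex("^(a+)+$");
  }

  sub vcl_recv {
      # X-Input is an assumed request header. The running time of the
      # match is linear in the length of its value.
      if (nested.match(req.http.X-Input)) {
          set req.http.X-Nested = "true";
      }
  }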

The relative advantages and disadvantages of RE2 and PCRE are a broad subject,
beyond the scope of this manual. See the references in ``SEE ALSO`` for more
in-depth discussion.

regex object and function interfaces
------------------------------------

The VMOD provides regular expression operations by way of the ``regex`` object
interface and a functional interface. For ``regex`` objects, the pattern is
compiled at VCL initialization time, and the compiled pattern is re-used for
each invocation of its methods. Compilation failures (due to errors in the
pattern) cause failure at initialization time, and the VCL fails to load. The
``.backref()`` and ``.namedref()`` methods refer back to the last invocation
of the ``.match()`` method for the same object.

The functional interface provides the same set of operations, but the pattern
is compiled at runtime on each invocation (and then discarded). Compilation
failures are reported as errors in the Varnish log. The ``backref()`` and
``namedref()`` functions refer back to the last invocation of the ``match()``
function, for any pattern.

Compiling a pattern at runtime on each invocation is considerably more costly
than re-using a compiled pattern. So for patterns that are fixed and known
at VCL initialization, the object interface should be used. The functional
interface should only be used for patterns whose contents are not known until
runtime.
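
The following sketch contrasts the two interfaces. The object name
``bearer`` and the headers ``X-Token``, ``X-Pattern`` and ``X-Dynamic-Match``
are assumptions made for illustration only::

  import re2;

  sub vcl_init {
      # The pattern is fixed, so compile it once when the VCL loads.
      # An invalid pattern here would prevent the VCL from loading.
      new bearer = re2.regex("^Bearer\s+(\S+)$");
  }

  sub vcl_recv {
      # Object interface: re-uses the pattern compiled in vcl_init.
      if (bearer.match(req.http.Authorization)) {
          set req.http.X-Token = bearer.backref(1);
      }

      # Functional interface: the pattern is only known at request time,
      # so it is compiled on this invocation and then discarded.
      if (re2.match(req.http.X-Pattern, req.url)) {
          set req.http.X-Dynamic-Match = "true";
      }
  }

The second form pays the compilation cost on every request, which is why it
is best reserved for patterns that really do vary at runtime.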

set object interface
--------------------

``set`` objects provide a shorthand for constructing patterns that consist of
an alternation -- a group of patterns combined with ``|`` for "or". For
example::

  import re2;

  sub vcl_init {
        new myset = re2.set();
	myset.add("foo");
	myset.add("bar");
	myset.add("baz");
	myset.compile();
  }

``myset.match(<string>)`` can now be used to match a string against the
pattern ``foo|bar|baz``.
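
As a short usage sketch, the set defined above could be consulted in a client
subroutine like this (the header ``X-Set-Match`` is a hypothetical name)::

  sub vcl_recv {
      # True if the URL contains "foo", "bar" or "baz" anywhere, since
      # the set was created without anchoring.
      if (myset.match(req.url)) {
          set req.http.X-Set-Match = "true";
      }
  }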

regex options
-------------

Where a pattern is compiled -- in the ``regex`` and ``set`` constructors, and
in functions that require compilation -- options may be specified that can
affect the interpretation of the pattern or the operation of the matcher. There
are default values for each option, and it is only necessary to specify options
in VCL that differ from the defaults. Options specified in a ``set``
constructor apply to all of the patterns in the resulting alternation.

``utf8``
  If true, characters in a pattern match Unicode code points, and hence may
  match more than one byte. If false, the pattern and strings to be matched
  are interpreted as Latin-1 (ISO 8859-1), and a pattern character matches
  exactly one byte. Default is **false**. Note that this differs from the
  RE2 default.
``posix_syntax``
  If true, patterns are restricted to POSIX (egrep) syntax. Otherwise,
  the pattern syntax resembles that of PCRE, with some deviations. See the
  link in ``SEE ALSO`` for the syntax specification. Default is **false**.
  The options ``perl_classes``, ``word_boundary`` and ``one_line`` are
  only consulted when this option is true.
``longest_match``
  If true, the matcher searches for the longest possible match where
  alternatives are possible. Otherwise, search for the first match. For
  example with the pattern ``a(b|bb)`` and the string ``abb``, ``abb``
  matches when ``longest_match`` is true, and backref 1 is ``bb``. Otherwise,
  ``ab`` matches, and backref 1 is ``b``. Default is **false**.
``max_mem``
  An upper bound (in bytes) for the size of the compiled pattern. If ``max_mem``
  is too small, the matcher may fall back to less efficient algorithms, or the
  pattern may fail to compile. Default is the RE2 default (8MB), which should
  suffice for typical patterns.
``literal``
  If true, the pattern is interpreted as a literal string, and no regex
  metacharacters (such as ``*``, ``+``, ``^`` and so forth) have their special
  meaning. Default is **false**.
``never_nl``
  If true, the newline character ``\n`` in a string is never matched, even if it
  appears in the pattern. Default is **false**.
``dot_nl``
  If true, then the dot character ``.`` in a pattern matches everything,
  including newline. Otherwise, ``.`` never matches newline. Default is
  **false**.
``never_capture``
  If true, parentheses in a pattern are interpreted as non-capturing, and all
  invocations of the ``backref`` and ``namedref`` methods or functions will
  fail, including ``backref(0)`` after a successful match. Default is **false**,
  except for set objects, for which ``never_capture`` is always true (and cannot
  be changed), since back references are not possible with sets.
``case_sensitive``
  If true, matches are case-sensitive. A pattern can override this option with
  the ``(?i)`` flag, unless ``posix_syntax`` is true. Default is **true**.

The following options are only consulted when ``posix_syntax`` is true. If
``posix_syntax`` is false, then these features are always enabled and cannot be
turned off.

``perl_classes``
  If true, then the perl character classes ``\d``, ``\s``, ``\w``, ``\D``,
  ``\S`` and ``\W`` are permitted in a pattern. Default is **false**.
``word_boundary``
  If true, the perl assertions ``\b`` and ``\B`` (word boundary and not a word
  boundary) are permitted. Default is **false**.
``one_line``
  If true, then ``^`` and ``$`` only match at the beginning and end of the
  string to be matched, regardless of newlines. Otherwise, ``^`` also matches
  just after a newline, and ``$`` also matches just before a newline. Default is
  **false**.
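
A brief sketch of how options might be passed by name in constructors. The
object and header names are illustrative assumptions; the behavior noted in
the comments is taken from the option descriptions above::

  import re2;

  sub vcl_init {
      # Case-insensitive match on a URL prefix.
      new api = re2.regex("^/api/", case_sensitive=false);

      # Prefer the longest alternative.
      new longest = re2.regex("a(b|bb)", longest_match=true);

      # Treat the dots as literal dots rather than "any character".
      new dotted = re2.regex("www.example.com", literal=true);
  }

  sub vcl_recv {
      if (longest.match("abb")) {
          # With longest_match=true, backref 1 is "bb" (it would be "b"
          # with the default setting).
          set req.http.X-Longest = longest.backref(1);
      }
  }

Only the options that differ from the defaults need to be named.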

CONTENTS
========

* STRING backref(PRIV_TASK, INT, STRING)
* STRING extract(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* BOOL match(PRIV_TASK, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* STRING namedref(PRIV_TASK, STRING, STRING)
* Object regex
* STRING regex.backref(INT, STRING)
* STRING regex.extract(STRING, STRING, STRING)
* BOOL regex.match(STRING)
* STRING regex.namedref(STRING, STRING)
* STRING regex.sub(STRING, STRING, STRING)
* STRING regex.suball(STRING, STRING, STRING)
* Object set
* VOID set.add(STRING)
* VOID set.compile()
* BOOL set.match(STRING)
* STRING sub(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* STRING suball(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
* STRING version()

.. _obj_regex:

Object regex
============


Prototype
  new OBJECT = re2.regex(STRING pattern [, <regex options>])

Description
  Create a regex object from ``pattern`` and the given options (or option
  defaults). If the pattern is invalid, then VCL will fail to load and the VCC
  compiler will emit an error message.

Example::

  sub vcl_init {
      new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
      new maxagematcher = re2.regex("max-age\s*=\s*(\d+)");

      # Group possible subdomains without capturing
      new submatcher = re2.regex("^www\.(domain1|domain2)\.com$",
	                         never_capture=true);
  }

.. _func_regex.match:

BOOL regex.match(STRING)
------------------------

Prototype
	BOOL regex.match(STRING)

Description
  Returns ``true`` if and only if the compiled regex matches the given
  string; corresponds to VCL's infix operator ``~``.

Example::

  if (myregex.match(req.http.Host)) {
     call do_on_match;
  }

.. _func_regex.backref:

STRING regex.backref(INT, STRING)
---------------------------------

Prototype
	STRING regex.backref(INT ref, STRING fallback)

Description
  Returns the `nth` captured subexpression from the most recent successful
  call of the ``.match()`` method for this object in the same client or backend
  context, or a fallback string in case the capture fails. Backref 0 indicates
  the entire matched string. Thus this function behaves like the ``\n`` in the
  native VCL functions ``regsub`` and ``regsuball``, and the ``$1``, ``$2`` ...
  variables in Perl.

  Since Varnish client and backend operations run in different threads,
  ``.backref()`` can only refer back to a ``.match()`` call in the same
  thread. Thus a ``.backref()`` call in any of the ``vcl_backend_*``
  subroutines -- the backend context -- refers back to a previous ``.match()``
  in any of those same subroutines; and a call in any of the other VCL
  subroutines -- the client context -- refers back to a ``.match()`` in the
  same client context.

  After unsuccessful matches, the ``fallback`` string is returned for any call
  to ``.backref()``. The default value of ``fallback`` is ``"**BACKREF METHOD
  FAILED**"``. ``.backref()`` always fails after a failed match, even if
  ``.match()`` had been called successfully before the failure.

  ``.backref()`` may also return ``fallback`` after a successful match, if
  no captured group in the matching string corresponds to the backref number.
  For example, when the pattern ``(a|(b))c`` matches the string ``ac``, there
  is no backref 2, since nothing matches ``b`` in the string.

  The VCL infix operators ``~`` and ``!~`` do not affect this method, nor do
  the functions ``regsub`` or ``regsuball``. Nor is it affected by the matches
  performed by any other method or function in this VMOD (such as the ``sub()``,
  ``suball()`` or ``extract()`` methods or functions, or the ``set`` object's
  ``.match()`` method).

  ``.backref()`` fails, returning ``fallback`` and writing an error
  message to the Varnish log with the ``VCL_Error`` tag, under the
  following conditions (even if a previous match was successful and a
  substring could have been captured):

* The ``fallback`` string is undefined, for example if set from an unset
  header variable.
* The ``never_capture`` option was set to ``true`` for this object. In this
  case, even ``.backref(0)`` fails after a successful match (otherwise, backref
  0 always returns the full matched string).
* ``ref`` (the backref number) is out of range, i.e. it is larger than the
  highest number for a capturing group in the pattern.
* ``.match()`` was never called for this object prior to calling ``.backref()``.
* There is insufficient workspace for the string to be returned.

Example::

  if (domainmatcher.match(req.http.Host)) {
     set req.http.X-Domain = domainmatcher.backref(1);
  }
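
A further sketch, using the ``(a|(b))c`` pattern discussed above, of how an
explicit ``fallback`` might be supplied for a group that does not take part
in the match. The object name ``alt`` and the header names are assumptions::

  sub vcl_init {
      new alt = re2.regex("(a|(b))c");
  }

  sub vcl_recv {
      if (alt.match(req.url)) {
          # Backref 1 is "a" or "b", depending on what matched.
          set req.http.X-Group1 = alt.backref(1, "none");

          # When the URL matched via "ac", nothing was captured by group
          # 2, so the fallback "none" is returned instead of the default
          # failure string.
          set req.http.X-Group2 = alt.backref(2, "none");
      }
  }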

.. _func_regex.namedref:

STRING regex.namedref(STRING, STRING)
-------------------------------------

Prototype
	STRING regex.namedref(STRING name, STRING fallback)

Description
  Returns the captured subexpression designated by ``name`` from the most
  recent successful call to ``.match()`` in the current context (client or
  backend), or ``fallback`` in case of failure.

  Named capturing groups are written in RE2 as: ``(?P<name>re)``. (Note that
  this syntax with ``P``, inspired by Python, differs from the notation for
  named capturing groups in PCRE.) Thus when ``(?P<foo>.+)bar$`` matches
  ``bazbar``, then ``.namedref("foo")`` returns ``baz``.

  Note that a named capturing group can also be referenced as a numbered group.
  So in the previous example, ``.backref(1)`` also returns ``baz``.

  ``fallback`` is returned when ``.namedref()`` is called after an
  unsuccessful match. The default fallback is ``"**NAMEDREF METHOD FAILED**"``.

  Like ``.backref()``, ``.namedref()`` is not affected by native VCL regex
  operations, nor by any other matches performed by methods or functions of
  the VMOD, except for a prior ``.match()`` for the same object.

  ``.namedref()`` fails, returning ``fallback`` and logging a ``VCL_Error``
  message, if:

* The ``fallback`` string is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``.match()`` was not called for this object.
* There is insufficient workspace for the string to be returned.

Example::

  sub vcl_init {
  	new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
  }

  sub vcl_recv {
  	if (domainmatcher.match(req.http.Host)) {
  	   set req.http.X-Domain = domainmatcher.namedref("domain");
	}
  }
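
The equivalence of named and numbered references described above can be
sketched as follows, with an explicit fallback in both calls. The object name
``foobar`` and the header names are hypothetical::

  sub vcl_init {
      new foobar = re2.regex("(?P<foo>.+)bar$");
  }

  sub vcl_recv {
      if (foobar.match(req.url)) {
          # Both calls refer to the same capturing group, so they are
          # expected to return the same string.
          set req.http.X-Foo-Named = foobar.namedref("foo", "unset");
          set req.http.X-Foo-Numbered = foobar.backref(1, "unset");
      }
  }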

.. _func_regex.sub:

STRING regex.sub(STRING, STRING, STRING)
----------------------------------------

Prototype
	STRING regex.sub(STRING text, STRING rewrite, STRING fallback)

Description
  If the compiled pattern for this regex object matches ``text``, then
  return the result of replacing the first match in ``text`` with ``rewrite``.
  Within ``rewrite``, ``\1`` through ``\9`` can be used to insert the
  numbered capturing groups from the pattern, and ``\0`` to insert the
  entire matching text. This method corresponds to the VCL native function
  ``regsub()``.

  ``fallback`` is returned if the pattern does not match ``text``. The default
  fallback is ``"**SUB METHOD FAILED**"``.

  ``.sub()`` fails, returning ``fallback`` and logging a ``VCL_Error`` message,
  if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then this will
      # set X-Yada to "www.yada.dabba.doo.com".
      set req.http.X-Yada = bmatcher.sub(req.http.Host, "d");
  }
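
A second sketch shows capturing groups in ``rewrite`` together with an
explicit ``fallback``. The object name ``langprefix``, the header
``X-Rewritten`` and the URL layout are assumptions for illustration::

  sub vcl_init {
      new langprefix = re2.regex("^/(en|de|fr)/(.*)$");
  }

  sub vcl_recv {
      # For the URL "/de/products", this sets X-Rewritten to
      # "/products?lang=de". If the pattern does not match, the
      # fallback (here the unchanged URL) is returned.
      set req.http.X-Rewritten = langprefix.sub(req.url, "/\2?lang=\1",
                                                req.url);
  }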

.. _func_regex.suball:

STRING regex.suball(STRING, STRING, STRING)
-------------------------------------------

Prototype
	STRING regex.suball(STRING text, STRING rewrite, STRING fallback)

Description
  Like ``.sub()``, except that all successive non-overlapping matches in
  ``text`` are replaced with ``rewrite``. This method corresponds to VCL
  native ``regsuball()``.

  The default fallback is ``"**SUBALL METHOD FAILED**"``. ``.suball()``
  fails under the same conditions as ``.sub()``.

  Since only non-overlapping matches are substituted, replacing ``"ana"``
  within ``"banana"`` only results in one substitution, not two.

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then set X-Yada to
      # "www.yada.dada.doo.com".
      set req.http.X-Yada = bmatcher.suball(req.http.Host, "d");
  }
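
The ``banana`` case mentioned above can be sketched like this (the object
name ``ana`` and the header ``X-Banana`` are hypothetical)::

  sub vcl_init {
      new ana = re2.regex("ana");
  }

  sub vcl_recv {
      # The first match covers "ana" in b[ana]na. The remaining "na" no
      # longer contains a match, so only one substitution takes place
      # and X-Banana is set to "bXna".
      set req.http.X-Banana = ana.suball("banana", "X");
  }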

.. _func_regex.extract:

STRING regex.extract(STRING, STRING, STRING)
--------------------------------------------

Prototype
	STRING regex.extract(STRING text, STRING rewrite, STRING fallback)

Description
  If the compiled pattern for this regex object matches ``text``, then
  return ``rewrite`` with substitutions from the matching portions of
  ``text``. Non-matching substrings of ``text`` are ignored.

  The default fallback is ``"**EXTRACT METHOD FAILED**"``. Like ``.sub()``
  and ``.suball()``, ``.extract()`` fails if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

	sub vcl_init {
	    new email = re2.regex("(.*)@([^.]*)");
	}

	sub vcl_deliver {
	    # Sets X-UUCP to "kremvax!boris"
	    set resp.http.X-UUCP = email.extract("boris@kremvax.ru", "\2!\1");
	}
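
To illustrate how ``.extract()`` differs from ``.sub()``, here is a sketch
that pulls a single value out of a longer header. The object name ``maxage``
and the header ``X-Max-Age`` are assumptions, as is the choice of ``"0"`` as
the fallback::

	sub vcl_init {
	    new maxage = re2.regex("max-age\s*=\s*(\d+)");
	}

	sub vcl_backend_response {
	    # Guard against an unset header, since an undefined text
	    # argument causes the method to fail.
	    if (beresp.http.Cache-Control) {
	        # For "public, max-age=300" this yields "300": only the
	        # rewrite string is returned. .sub() with the same
	        # arguments would instead yield "public, 300".
	        set beresp.http.X-Max-Age =
	            maxage.extract(beresp.http.Cache-Control, "\1", "0");
	    }
	}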

regex functional interface
==========================

.. _func_match:

BOOL match(PRIV_TASK, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
------------------------------------------------------------------------------------------------------------

Prototype
	BOOL match(PRIV_TASK, STRING pattern, STRING subject, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)

Description
  Like the ``regex.match()`` method, return ``true`` if ``pattern`` matches
  ``subject``, where ``pattern`` is compiled with the given options (or default
  options) on each invocation.

  If ``pattern`` fails to compile, then an error message is logged with
  the ``VCL_Error`` tag, and ``false`` is returned.

Example::

  # Match the bereq Host header against a backend response header
  if (re2.match(pattern=bereq.http.Host, subject=beresp.http.X-Host)) {
     call do_on_match;
  }
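
Regex options can be given by name after the positional arguments. The
following sketch assumes a hypothetical request header ``X-Literal`` that
carries a plain string to search for in the URL::

  sub vcl_recv {
      # literal=true means that characters such as "." or "*" in the
      # header value have no special meaning in the pattern.
      if (req.http.X-Literal
          && re2.match(req.http.X-Literal, req.url, literal=true)) {
          set req.http.X-Literal-Match = "true";
      }
  }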

.. _func_backref:

STRING backref(PRIV_TASK, INT, STRING)
--------------------------------------

Prototype
	STRING backref(PRIV_TASK, INT ref, STRING fallback)

Description
  Returns the `nth` captured subexpression from the most recent successful
  call of the ``match()`` function in the current client or backend context,
  or a fallback string if the capture fails. The default ``fallback`` is
  ``"**BACKREF FUNCTION FAILED**"``.

  Similarly to the ``regex.backref()`` method, ``fallback`` is returned
  after any failed invocation of the ``match()`` function, or if there
  is no captured group corresponding to the backref number. The function
  is not affected by native VCL regex operations, or any other method or
  function of the VMOD except for the ``match()`` function.

  The function fails, returning ``fallback`` and logging a ``VCL_Error``
  message, under the same conditions as the corresponding method:

* ``fallback`` is undefined.
* ``never_capture`` was true in the previous invocation of the ``match()``
  function.
* ``ref`` is out of range.
* The ``match()`` function was never called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured subexpression.

Example::

  # Match against a pattern provided in a beresp header, and capture
  # subexpression 1.
  if (re2.match(pattern=beresp.http.X-Pattern, subject=bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.backref(1);
  }

.. _func_namedref:

STRING namedref(PRIV_TASK, STRING, STRING)
------------------------------------------

Prototype
	STRING namedref(PRIV_TASK, STRING name, STRING fallback)

Description
  Returns the captured subexpression designated by ``name`` from the most
  recent successful call to the ``match()`` function in the current context, or
  ``fallback`` in case of failure. The default fallback is ``"**NAMEDREF
  FUNCTION FAILED**"``.

  The function returns ``fallback`` when the previous invocation of the
  ``match()`` function failed, and is only affected by use of the ``match()``
  function. The function fails, returning ``fallback`` and logging a
  ``VCL_Error`` message, under the same conditions as the corresponding
  method:

* ``fallback`` is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``match()`` was not called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured expression.

Example::

  if (re2.match(beresp.http.X-Pattern-With-Names, bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.namedref("foo");
  }

.. _func_sub:

STRING sub(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
-----------------------------------------------------------------------------------------------------------------

Prototype
	STRING sub(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)

Description
  Compiles ``pattern`` with the given options, and if it matches ``text``,
  then return the result of replacing the first match in ``text`` with
  ``rewrite``. As with the ``regex.sub()`` method, ``\0`` through ``\9``
  may be used in ``rewrite`` to substitute captured groups from the
  pattern.

  ``fallback`` is returned if the pattern does not match ``text``. The default
  fallback is ``"**SUB FUNCTION FAILED**"``.

  ``sub()`` fails, returning ``fallback`` and logging a ``VCL_Error`` message,
  if:

* ``pattern`` cannot be compiled.
* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dabba.doo.com".
  set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
                                   bereq.http.Host, "d");
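
A variant of the same call with an explicit ``fallback``, sketched under the
assumption that the unmodified Host value is a sensible result when the
pattern does not match or cannot be compiled::

  sub vcl_backend_response {
      # Only attempt the substitution if the pattern header is set.
      if (beresp.http.X-Sub-Letters) {
          set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
                                           bereq.http.Host, "d",
                                           fallback=bereq.http.Host);
      }
  }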

.. _func_suball:

STRING suball(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
--------------------------------------------------------------------------------------------------------------------

Prototype
	STRING suball(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)

Description
  Like the ``sub()`` function, except that all successive non-overlapping
  matches in ``text`` are replaced with ``rewrite``.

  The default fallback is ``"**SUBALL FUNCTION FAILED**"``. The ``suball()``
  function fails under the same conditions as ``sub()``.

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dada.doo.com".
  set beresp.http.X-Yada = re2.suball(beresp.http.X-Sub-Letters,
                                      bereq.http.Host, "d");

.. _func_extract:

STRING extract(STRING, STRING, STRING, STRING, BOOL, BOOL, BOOL, INT, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL)
---------------------------------------------------------------------------------------------------------------------

Prototype
	STRING extract(STRING pattern, STRING text, STRING rewrite, STRING fallback, BOOL utf8, BOOL posix_syntax, BOOL longest_match, INT max_mem, BOOL literal, BOOL never_nl, BOOL dot_nl, BOOL never_capture, BOOL case_sensitive, BOOL perl_classes, BOOL word_boundary, BOOL one_line)

Description
  Compiles ``pattern`` with the given options, and if it matches ``text``,
  then return ``rewrite`` with substitutions from the matching portions of
  ``text``, ignoring the non-matching portions.

  The default fallback is ``"**EXTRACT FUNCTION FAILED**"``. The ``extract()``
  function fails under the same conditions as ``sub()`` and ``suball()``.

Example::

  # If beresp header X-Params contains "(foo|bar)=(baz|quux)", and the
  # URL contains "bar=quux", then set X-Query to "bar:quux".
  set beresp.http.X-Query = re2.extract(beresp.http.X-Params, bereq.url,
                                        "\1:\2");

.. _obj_set:

Object set
==========


Prototype
  new OBJECT = re2.set([ENUM anchor] [, <regex options>])

Description
  Initialize a set object that represents several patterns combined by
  alternation -- ``|`` for "or".

  Optional parameters control the interpretation of the resulting composed
  pattern. The ``anchor`` parameter is an enum that can have the values
  ``none``, ``start`` or ``both``, where ``none`` is the default. ``start``
  means that each pattern is matched as if it begins with ``^`` for
  start-of-text, and ``both`` means that each pattern is anchored with both
  ``^`` at the beginning and ``$`` for end-of-text at the end. ``none`` means
  that each pattern is interpreted as a partial match (although individual
  patterns within the set may have either of ``^`` or ``$``).

  For example, if a set is initialized with ``anchor=both``, and the patterns
  ``foo`` and ``bar`` are added, then matches against the set match a string
  against ``^foo$|^bar$``, or equivalently ``^(foo|bar)$``.

  The usual regex options can be set, which then control matching against
  the resulting composed pattern. However, the ``never_capture`` option
  cannot be set, and is always implicitly true, since backrefs and
  namedrefs are not possible with sets.

Example::

  sub vcl_init {
        # Initialize a regex set for partial matches
	# with default options
  	new foo = re2.set();

        # Initialize a regex set for case insensitive matches
	# with anchors on both ends (^ and $).
  	new bar = re2.set(anchor=both, case_sensitive=false);

        # Initialize a regex set using POSIX syntax, but allowing
	# Perl character classes, and anchoring at the left (^).
  	new baz = re2.set(anchor=start, posix_syntax=true,
	                  perl_classes=true);
  }

.. _func_set.add:

VOID set.add(STRING)
--------------------

Prototype
	VOID set.add(STRING)

Description
  Add the given pattern to the set. If the pattern is invalid, ``.add()``
  fails, and the VCL will fail to load, with an error message describing
  the problem.

  ``.add()`` MUST be called in ``vcl_init``, and MAY NOT be called after
  ``.compile()``.  If ``.add()`` is called in any other subroutine, an
  error message with ``VCL_Error`` is logged, and the call has no effect.
  If it is called in ``vcl_init`` after ``.compile()``, then the VCL load
  will fail with an error message.

  In other words, add all patterns to the set in ``vcl_init``, and finally
  call ``.compile()`` when you're done.

Example::

  sub vcl_init {
      # literal=true means that the dots are interpreted as literal
      # dots, not "match any character".
      new hostmatcher = re2.set(anchor=both, case_sensitive=false,
                                literal=true);
      hostmatcher.add("www.domain1.com");
      hostmatcher.add("www.domain2.com");
      hostmatcher.add("www.domain3.com");
      hostmatcher.compile();
  }

.. _func_set.compile:

VOID set.compile()
------------------

Prototype
  VOID set.compile()

Description
  Compile the compound pattern represented by the set -- an alternation of
  all patterns added by ``.add()``.

  ``.compile()`` may fail if the ``max_mem`` setting is not large enough
  for the composed pattern. In that case, the VCL load will fail with an
  error message; consider setting a larger value for ``max_mem`` in the
  set constructor.

  ``.compile()`` MUST be called in ``vcl_init``, and MAY NOT be called
  more than once for a set object. If it is called in any other subroutine,
  a ``VCL_Error`` message is logged, and the call has no effect. If it
  is called a second time in ``vcl_init``, the VCL load will fail.

  See above for examples.
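
A minimal sketch of raising the limit, assuming that ``max_mem`` is
given in bytes as one of the regex options of the set constructor. The
object name, the value and the patterns are hypothetical, chosen only
for illustration::

  sub vcl_init {
      # Assumed to be a byte count (here 16 MiB); pick a value that
      # fits the composed pattern.
      new bigset = re2.set(max_mem=16777216);
      bigset.add("first|pattern");
      bigset.add("second|pattern");
      # If max_mem is too small, this is where the VCL load fails.
      bigset.compile();
  }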

.. _func_set.match:

BOOL set.match(STRING)
----------------------

Prototype
  BOOL set.match(STRING)

Description
  Returns ``true`` if the given string matches the compound pattern
  represented by the set, i.e. if it matches any of the patterns that
  were added to the set.

  ``.match()`` MUST be called after ``.compile()``; otherwise the
  match always fails.

Example::

  if (hostmatcher.match(req.http.Host)) {
     call do_when_a_host_matched;
  }
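
As a further sketch of how the ``anchor`` setting of the constructor
affects ``.match()``, the following uses ``anchor=start`` to test URL
prefixes. The object name ``prefixes``, the patterns and the header
``X-Static`` are hypothetical, chosen only for illustration::

  sub vcl_init {
      # Each pattern is matched as if it began with ^
      new prefixes = re2.set(anchor=start);
      prefixes.add("/static/");
      prefixes.add("/images/");
      prefixes.compile();
  }

  sub vcl_recv {
      # Matches "/static/app.css", but not "/app/static/x", since
      # the patterns are anchored at start-of-text.
      if (prefixes.match(req.url)) {
          set req.http.X-Static = "true";
      }
  }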

.. _func_version:

STRING version()
----------------

Prototype
  STRING version()

Description
  Return the version string for this VMOD.

Example::

  std.log("Using VMOD re2 version: " + re2.version());

REQUIREMENTS
============

The VMOD requires Varnish 4.1.2 through 4.1.6.

It requires the RE2 library, and has been tested against RE2 versions
2015-05-01 through 2017-06-01.

INSTALLATION
============

The VMOD is built against a Varnish installation, and the autotools
use ``pkg-config(1)`` to locate the necessary header files and other
resources for both Varnish and RE2. This sequence will install the VMOD::

  > ./autogen.sh	# for builds from the git repo
  > ./configure
  > make
  > make check		# to run unit tests in src/tests/*.vtc
  > make distcheck	# run check and prepare a distribution tarball
  > sudo make install

If you have installed Varnish and/or RE2 in non-standard directories,
call ``autogen.sh`` and ``configure`` with the ``PKG_CONFIG_PATH``
environment variable set to include the paths where the ``.pc`` files
can be located for ``varnishapi`` and ``re2``. For example, when the
Varnish ``configure`` script was called with ``--prefix=$PREFIX``, use::

  > PKG_CONFIG_PATH=${PREFIX}/lib/pkgconfig
  > export PKG_CONFIG_PATH

By default, the vmod ``configure`` script installs the vmod in
the same directory as Varnish, determined via ``pkg-config(1)``. The
vmod installation directory can be overridden by passing the
``VMOD_DIR`` variable to ``configure``.

Other files such as this man-page are installed in the locations
determined by ``configure``, which inherits its default ``--prefix``
setting from Varnish.

For developers
--------------

The VMOD source code is in C and C++, since the RE2 API is
C++. Compilation has been tested with gcc/g++ and clang.

The build specifies C99 conformance for C sources (``-std=c99``), and
C++11 for C++ (``-std=c++11``). For both, all compiler warnings are
turned on, and all warnings are considered errors (``-Werror -Wall``).
The code should always build without warnings or errors under these
constraints.

By default, ``CFLAGS`` and ``CXXFLAGS`` are set to ``-g -O2``, so that
symbols are included in the shared library, and optimization is at
level ``O2``. To change or disable these options, set ``CFLAGS``
and/or ``CXXFLAGS`` explicitly before calling ``configure`` (they may
be set to the empty string).

For development/debugging cycles, the ``configure`` option
``--enable-debugging`` is recommended (off by default). This will turn
off optimizations and function inlining, so that a debugger will step
through the code as expected.

By default, the VMOD is built with the stack protector enabled
(compile option ``-fstack-protector``), but it can be disabled with
the ``configure`` option ``--disable-stack-protector``.

LIMITATIONS
===========

The VMOD allocates Varnish workspace for captured groups and rewritten
strings. If operations fail with "insufficient workspace" error
messages in the Varnish log (with the ``VCL_Error`` tag), increase the
varnishd runtime parameters ``workspace_client`` and/or
``workspace_backend``.

The RE2 documentation states that successful matches are slowed quite
a bit when they also capture substrings. There is also additional
overhead from the VMOD, unless the ``never_capture`` flag is true, to
manage data about captured groups in the workspace. This overhead is
incurred even if there are no capturing expressions in a pattern,
since it is always possible to call ``backref(0)`` to obtain the
matched portion of a string.

So if you are using a pattern only to match against strings, and never
to capture subexpressions, consider setting the ``never_capture``
option to true, to eliminate the extra work for both RE2 and the VMOD.
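
A minimal sketch of that advice, using the object interface with
``never_capture`` passed as a named regex option. The object name, the
pattern and the header ``X-Api`` are hypothetical, chosen only for
illustration::

  sub vcl_init {
      # Used only for matching, never for backref() or namedref(),
      # so capturing can be switched off.
      new is_api = re2.regex("^/api/", never_capture=true);
  }

  sub vcl_recv {
      if (is_api.match(req.url)) {
          set req.http.X-Api = "true";
      }
  }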

AUTHOR
======

* Geoffrey Simmons <geoff@uplex.de>

UPLEX Nils Goroll Systemoptimierung

HISTORY
=======

* version 0.1: initial version

SEE ALSO
========

* varnishd(1)
* vcl(7)
* RE2 git repo: https://github.com/google/re2
* RE2 syntax: https://github.com/google/re2/wiki/Syntax
* "Implementing Regular Expressions": https://swtch.com/~rsc/regexp/

  * Series of articles motivating the design of RE2, with discussion
    of how RE2 compares with PCRE

COPYRIGHT
=========

This document is licensed under the same conditions as the libvmod-re2
project. See LICENSE for details.

* Copyright (c) 2016 UPLEX Nils Goroll Systemoptimierung