..
.. NB:  This file is machine generated, DO NOT EDIT!
..
.. Edit vmod.vcc and run make instead
..

.. role:: ref(emphasis)

.. _vmod_re2(3):

========
vmod_re2
========

-----------------------------------------------------------------------
"Varnish Module for access to the Google RE2 regular expression engine"
-----------------------------------------------------------------------

:Manual section: 3




SYNOPSIS
========

::

  import re2;

  # regex object interface
  new OBJECT = re2.regex(STRING pattern [, <regex options>])
  BOOL <obj>.match(STRING)
  STRING <obj>.backref(INT ref)
  STRING <obj>.namedref(STRING name)
  STRING <obj>.sub(STRING text, STRING rewrite)
  STRING <obj>.suball(STRING text, STRING rewrite)
  STRING <obj>.extract(STRING text, STRING rewrite)
  INT <obj>.cost()
  
  # regex function interface
  BOOL re2.match(STRING pattern, STRING subject [, <regex options>])
  STRING re2.backref(INT ref)
  STRING re2.namedref(STRING name)
  STRING re2.sub(STRING pattern, STRING text, STRING rewrite
                 [, <regex options>])
  STRING re2.suball(STRING pattern, STRING text, STRING rewrite
                    [, <regex options>])
  STRING re2.extract(STRING pattern, STRING text, STRING rewrite
                     [, <regex options>])
  INT re2.cost(STRING pattern [, <regex options>])

  # set object interface
  new OBJECT = re2.set([ENUM anchor] [, <regex options>])
  VOID <obj>.add(STRING [, BOOL save] [, BOOL never_capture] [, STRING string]
                 [, BACKEND backend] [, INT integer])
  VOID <obj>.compile()
  BOOL <obj>.match(STRING)
  INT <obj>.nmatches()
  BOOL <obj>.matched(INT)
  INT <obj>.which([ENUM select])
  STRING <obj>.string([INT n] [, ENUM select])
  BACKEND <obj>.backend([INT n] [, ENUM select])
  INT <obj>.integer([INT n] [, ENUM select])
  STRING <obj>.sub(STRING text, STRING rewrite [, INT n]
                   [, ENUM select])
  STRING <obj>.suball(STRING text, STRING rewrite [, INT n]
                      [, ENUM select])
  STRING <obj>.extract(STRING text, STRING rewrite [, INT n]
                       [, ENUM select])
  BOOL <obj>.saved([ENUM {REGEX, STR, BE, INT} which] [, INT n]
                   [, ENUM select])

  # utility function
  STRING re2.quotemeta(STRING)

  # VMOD version
  STRING re2.version()

DESCRIPTION
===========

Varnish Module (VMOD) for access to the Google RE2 regular expression engine.

Varnish VCL uses the PCRE library (Perl Compatible Regular Expressions) for
its native regular expressions, which runs very efficiently for many common
uses of pattern matching in VCL, as attested by years of successful use of
PCRE with Varnish.

But for certain kinds of patterns, the worst-case running time of the PCRE
matcher is exponential in the length of the string to be matched. The
matcher uses backtracking, implemented with recursive calls to the internal
``match()`` function. In principle there is no upper bound to the possible
depth of backtracking and recursion, except as imposed by the ``varnishd``
runtime parameters ``pcre_match_limit`` and ``pcre_match_limit_recursion``;
matches fail if either of these limits is reached. Stack overflow caused by
deep backtracking has occasionally been the subject of ``varnishd`` issues.

RE2 differs from PCRE in that it limits the syntax of patterns so that they
always specify a regular language in the formally strict sense. Most notably,
backreferences within a pattern are not permitted. For example, RE2 rejects
``(foo|bar)\1``, which in PCRE matches ``foofoo`` and ``barbar`` but not
``foobar`` or ``barfoo``. See the link in ``SEE ALSO`` for the specification
of RE2 syntax.

This means that an RE2 matcher runs as a finite automaton, which guarantees
linear running time in the length of the matched string. There is no
backtracking, and hence no risk of deep recursion or stack overflow.
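
As a rough illustration, nested quantifiers such as ``(a+)+`` are a
commonly cited example of a pattern that can drive a backtracking
matcher into very deep recursion on input that does not match. The
sketch below compiles such a pattern with RE2 instead. The object name
``nested`` and the headers ``X-Token`` and ``X-Token-Valid`` are
hypothetical, chosen only for this example::

  sub vcl_init {
      # Nested repetition is valid RE2 syntax. never_capture is set
      # because only the boolean result of the match is needed.
      new nested = re2.regex("^(a+)+$", never_capture=true);
  }

  sub vcl_recv {
      # X-Token is an assumed request header, not part of the VMOD.
      if (nested.match(req.http.X-Token)) {
          set req.http.X-Token-Valid = "true";
      }
  }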

The relative advantages and disadvantages of RE2 and PCRE are a broad subject,
beyond the scope of this manual. See the references in ``SEE ALSO`` for more
in-depth discussion.

regex object and function interfaces
------------------------------------

The VMOD provides regular expression operations by way of the ``regex`` object
interface and a functional interface. For ``regex`` objects, the pattern is
compiled at VCL initialization time, and the compiled pattern is re-used for
each invocation of its methods. Compilation failures (due to errors in the
pattern) cause failure at initialization time, and the VCL fails to load. The
``.backref()`` and ``.namedref()`` methods refer back to the last invocation
of the ``.match()`` method for the same object.

The functional interface provides the same set of operations, but the pattern
is compiled at runtime on each invocation (and then discarded). Compilation
failures are reported as errors in the Varnish log. The ``backref()`` and
``namedref()`` functions refer back to the last invocation of the ``match()``
function, for any pattern.

Compiling a pattern at runtime on each invocation is considerably more costly
than re-using a compiled pattern. So for patterns that are fixed and known
at VCL initialization, the object interface should be used. The functional
interface should only be used for patterns whose contents are not known until
runtime.
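
A minimal sketch of the two interfaces side by side, assuming a fixed
pattern for the Host header and a pattern that only becomes known at
runtime in a hypothetical backend response header ``X-Pattern`` (the
object name ``hostmatcher`` and the ``X-Domain`` and ``X-Matched``
headers are likewise assumptions)::

  sub vcl_init {
      # Compiled once, when the VCL is loaded.
      new hostmatcher = re2.regex("^www\.([^.]+)\.com$");
  }

  sub vcl_recv {
      # Object interface: re-uses the compiled pattern.
      if (hostmatcher.match(req.http.Host)) {
          set req.http.X-Domain = hostmatcher.backref(1);
      }
  }

  sub vcl_backend_response {
      # Functional interface: the pattern is compiled on this call
      # and discarded afterwards.
      if (re2.match(beresp.http.X-Pattern, bereq.url)) {
          # Backref 0 is the entire matched string.
          set beresp.http.X-Matched = re2.backref(0);
      }
  }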

set object interface
--------------------

``set`` objects provide a shorthand for constructing patterns that consist of
an alternation -- a group of patterns combined with ``|`` for "or". For
example::

  import re2;
  
  sub vcl_init {
        new myset = re2.set();
	myset.add("foo");	# Pattern 1
	myset.add("bar");	# Pattern 2
	myset.add("baz");	# Pattern 3
	myset.compile();
  }

``myset.match(<string>)`` can now be used to match a string against
the pattern ``foo|bar|baz``. When a match is successful, the matcher
has determined all of the patterns that matched. These can then be
retrieved with the method ``.nmatches()`` for the number of matched
patterns, and with ``.matched(n)``, which returns ``true`` if the
``nth`` pattern matched, where the patterns are numbered in the order
in which they were added::

  if (myset.match("foobar")) {
      std.log("Matched " + myset.nmatches() + " patterns");
      if (myset.matched(1)) {
          # Pattern /foo/ matched
          call do_foo;
      }
      if (myset.matched(2)) {
          # Pattern /bar/ matched
          call do_bar;
      }
      if (myset.matched(3)) {
          # Pattern /baz/ matched
          call do_baz;
      }
  }

An advantage of alternations and sets with RE2, as opposed to an
alternation in PCRE or a series of separate matches in an
if-elsif-elsif sequence, comes from the fact that the matcher is
implemented as a state machine. That means that the matcher progresses
through the string to be matched just once, following patterns in the
set that match through the state machine, or determining that there is
no match as soon as there are no more possible paths in the state
machine. So a string can be matched against a large set of patterns in
time that is proportional to the length of the string to be
matched. In contrast, PCRE matches patterns in an alternation one
after another, stopping after the first matching pattern, or
attempting matches against all of them if there is no match. Thus a
match against an alternation in PCRE is not unlike an if-elsif-elsif
sequence of individual matches, and requires the time needed for each
individual match, overall in proportion with the number of patterns to
be matched.

Another advantage of the VMOD's set object is the ability to associate
strings or backends with the patterns added to the set with the
``.add()`` method::

  sub vcl_init {
	new prefix = re2.set(anchor=start);
	prefix.add("/foo", string="www.domain1.com");
	prefix.add("/bar", string="www.domain2.com");
	prefix.add("/baz", string="www.domain3.com");
	prefix.add("/quux", string="www.domain4.com");
	prefix.compile();

	new appmatcher = re2.set(anchor=start);
	appmatcher.add("/foo", backend=app1);
	appmatcher.add("/bar", backend=app2);
	appmatcher.add("/baz", backend=app3);
	appmatcher.add("/quux", backend=app4);
	appmatcher.compile();
  }

After a successful match, the string or backend associated with the
matching pattern can be retrieved with the ``.string()`` and
``.backend()`` methods. This makes it possible, for example, to
construct a redirect response or choose the backend with code that is
both efficient and compact, even with a large set of patterns to be
matched::

  # Use the prefix object to construct a redirect response from
  # a matching request URL.
  sub vcl_recv {
      if (prefix.match(req.url)) {
          # Pass the string associated with the matching pattern
          # to vcl_synth.
          return(synth(1301, prefix.string()));
      }
  }

  sub vcl_synth {
      # The string associated with the matching pattern is in
      # resp.reason.
      if (resp.status == 1301) {
          set resp.http.Location = "http://" + resp.reason + req.url;
          set resp.status = 301;
          set resp.reason = "Moved Permanently";
      }
  }

  # Use the appmatcher object to choose a backend based on the
  # request URL prefix.
  sub vcl_recv {
      if (appmatcher.match(req.url)) {
          set req.backend_hint = appmatcher.backend();
      }
  }

regex options
-------------

Where a pattern is compiled -- in the ``regex`` and ``set`` constructors, and
in functions that require compilation -- options may be specified that can
affect the interpretation of the pattern or the operation of the matcher. There
are default values for each option, and it is only necessary to specify options
in VCL that differ from the defaults. Options specified in a ``set``
constructor apply to all of the patterns in the resulting alternation.

``utf8``
  If true, characters in a pattern match Unicode code points, and hence may
  match more than one byte. If false, the pattern and strings to be matched
  are interpreted as Latin-1 (ISO 8859-1), and a pattern character matches
  exactly one byte. Default is **false**. Note that this differs from the
  RE2 default.
``posix_syntax``
  If true, patterns are restricted to POSIX (egrep) syntax. Otherwise,
  the pattern syntax resembles that of PCRE, with some deviations. See the
  link in ``SEE ALSO`` for the syntax specification. Default is **false**.
  The options ``perl_classes``, ``word_boundary`` and ``one_line`` are
  only consulted when this option is true.
``longest_match``
  If true, the matcher searches for the longest possible match where
  alternatives are possible. Otherwise, search for the first match. For
  example with the pattern ``a(b|bb)`` and the string ``abb``, ``abb``
  matches when ``longest_match`` is true, and backref 1 is ``bb``. Otherwise,
  ``ab`` matches, and backref 1 is ``b``. Default is **false**.
``max_mem``
  An upper bound (in bytes) for the size of the compiled pattern. If ``max_mem``
  is too small, the matcher may fall back to less efficient algorithms, or the
  pattern may fail to compile. Default is the RE2 default (8MB), which should
  suffice for typical patterns.
``literal``
  If true, the pattern is interpreted as a literal string, and no regex
  metacharacters (such as ``*``, ``+``, ``^`` and so forth) have their special
  meaning. Default is **false**.
``never_nl``
  If true, the newline character ``\n`` in a string is never matched, even if it
  appears in the pattern. Default is **false**.
``dot_nl``
  If true, then the dot character ``.`` in a pattern matches everything,
  including newline. Otherwise, ``.`` never matches newline. Default is
  **false**.
``never_capture``
  If true, parentheses in a pattern are interpreted as non-capturing, and all
  invocations of the ``backref`` and ``namedref`` methods or functions will
  fail, including ``backref(0)`` after a successful match. Default is **false**,
  except for set objects, for which ``never_capture`` is always true (and cannot
  be changed), since back references are not possible with sets.
``case_sensitive``
  If true, matches are case-sensitive. A pattern can override this option with
  the ``(?i)`` flag, unless ``posix_syntax`` is true. Default is **true**.

The following options are only consulted when ``posix_syntax`` is true. If
``posix_syntax`` is false, then these features are always enabled and cannot be
turned off.

``perl_classes``
  If true, then the perl character classes ``\d``, ``\s``, ``\w``, ``\D``,
  ``\S`` and ``\W`` are permitted in a pattern. Default is **false**.
``word_boundary``
  If true, the perl assertions ``\b`` and ``\B`` (word boundary and not a word
  boundary) are permitted. Default is **false**.
``one_line``
  If true, then ``^`` and ``$`` only match at the beginning and end of the
  string to be matched, regardless of newlines. Otherwise, ``^`` also matches
  just after a newline, and ``$`` also matches just before a newline. Default is
  **false**.
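
The following sketch shows a few of these options passed by name to
the constructors. The object names and patterns are illustrative only,
and only options that differ from the defaults are given::

  sub vcl_init {
      # Case-insensitive match; the parentheses group without capturing.
      new imgmatcher = re2.regex("\.(png|jpe?g|gif)$",
                                 case_sensitive=false,
                                 never_capture=true);

      # The pattern is taken as a literal string, so "." and "+" have
      # no special meaning.
      new literalmatcher = re2.regex("a.b+c", literal=true);

      # With the string "abb", the whole of "abb" matches and
      # backref 1 is "bb".
      new longest = re2.regex("a(b|bb)", longest_match=true);

      # Options given to a set constructor apply to every pattern
      # added to the set.
      new exts = re2.set(case_sensitive=false);
      exts.add("\.css$");
      exts.add("\.js$");
      exts.compile();
  }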


.. _obj_regex:

regex(...)
----------

::

   new xregex = re2.regex(
      STRING pattern,
      BOOL utf8=0,
      BOOL posix_syntax=0,
      BOOL longest_match=0,
      INT max_mem=8388608,
      BOOL literal=0,
      BOOL never_nl=0,
      BOOL dot_nl=0,
      BOOL never_capture=0,
      BOOL case_sensitive=1,
      BOOL perl_classes=0,
      BOOL word_boundary=0,
      BOOL one_line=0
   )

Create a regex object from ``pattern`` and the given options (or
option defaults). If the pattern is invalid, then VCL will fail to
load and the VCC compiler will emit an error message.

Example::

  sub vcl_init {
      new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
      new maxagematcher = re2.regex("max-age\s*=\s*(\d+)");

      # Group possible subdomains without capturing
      new submatcher = re2.regex("^www\.(domain1|domain2)\.com$",
	                         never_capture=true);
  }

.. _func_regex.match:

BOOL xregex.match(STRING)
-------------------------

Returns ``true`` if and only if the compiled regex matches the given
string; corresponds to VCL's infix operator ``~``.

Example::

  if (myregex.match(req.http.Host)) {
     call do_on_match;
  }

.. _func_regex.backref:

STRING xregex.backref(INT ref, STRING fallback)
-----------------------------------------------

::

      STRING xregex.backref(
            INT ref,
            STRING fallback="**BACKREF METHOD FAILED**"
      )

Returns the `nth` captured subexpression from the most recent
successful call of the ``.match()`` method for this object in the same
client or backend context, or a fallback string in case the capture
fails. Backref 0 indicates the entire matched string. Thus this
function behaves like the ``\n`` in the native VCL functions
``regsub`` and ``regsuball``, and the ``$1``, ``$2`` ... variables in
Perl.

Since Varnish client and backend operations run in different threads,
``.backref()`` can only refer back to a ``.match()`` call in the same
thread. Thus a ``.backref()`` call in any of the ``vcl_backend_*``
subroutines -- the backend context -- refers back to a previous
``.match()`` in any of those same subroutines; and a call in any of
the other VCL subroutines -- the client context -- refers back to a
``.match()`` in the same client context.

After unsuccessful matches, the ``fallback`` string is returned for
any call to ``.backref()``. The default value of ``fallback`` is
``"**BACKREF METHOD FAILED**"``. ``.backref()`` always fails after a
failed match, even if ``.match()`` had been called successfully before
the failure.

``.backref()`` may also return ``fallback`` after a successful match,
if no captured group in the matching string corresponds to the backref
number. For example, when the pattern ``(a|(b))c`` matches the string
``ac``, there is no backref 2, since nothing matches ``b`` in the
string.

The VCL infix operators ``~`` and ``!~`` do not affect this method,
nor do the functions ``regsub`` or ``regsuball``. Nor is it affected
by the matches performed by any other method or function in this VMOD
(such as the ``sub()``, ``suball()`` or ``extract()`` methods or
functions, or the ``set`` object's ``.match()`` method).

``.backref()`` fails, returning ``fallback`` and writing an error
message to the Varnish log with the ``VCL_Error`` tag, under the
following conditions (even if a previous match was successful and a
substring could have been captured):

* The ``fallback`` string is undefined, for example if set from an unset
  header variable.
* The ``never_capture`` option was set to ``true`` for this object. In this
  case, even ``.backref(0)`` fails after a successful match (otherwise, backref
  0 always returns the full matched string).
* ``ref`` (the backref number) is out of range, i.e. it is larger than the
  highest number for a capturing group in the pattern.
* ``.match()`` was never called for this object prior to calling ``.backref()``.
* There is insufficient workspace for the string to be returned.

Example::

  if (domainmatcher.match(req.http.Host)) {
     set req.http.X-Domain = domainmatcher.backref(1);
  }
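
Because backrefs are scoped to the client or backend context, a match
performed in ``vcl_recv`` cannot be read from a ``vcl_backend_*``
subroutine. One way to handle this, sketched here with a hypothetical
object ``hostmatcher`` and illustrative header names, is to repeat the
match in the backend context and to pass an explicit ``fallback``::

  sub vcl_init {
      new hostmatcher = re2.regex("^www\.([^.]+)\.com$");
  }

  sub vcl_recv {
      # Client context: refers to the match just performed here.
      if (hostmatcher.match(req.http.Host)) {
          set req.http.X-Client-Domain = hostmatcher.backref(1, "unknown");
      }
  }

  sub vcl_backend_fetch {
      # Backend context: the match from vcl_recv is not visible here,
      # so match again before calling .backref().
      if (hostmatcher.match(bereq.http.Host)) {
          set bereq.http.X-Backend-Domain = hostmatcher.backref(1, "unknown");
      }
  }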

.. _func_regex.namedref:

STRING xregex.namedref(STRING name, STRING fallback)
----------------------------------------------------

::

      STRING xregex.namedref(
            STRING name,
            STRING fallback="**NAMEDREF METHOD FAILED**"
      )

Returns the captured subexpression designated by ``name`` from the
most recent successful call to ``.match()`` in the current context
(client or backend), or ``fallback`` in case of failure.

Named capturing groups are written in RE2 as: ``(?P<name>re)``. (Note
that this syntax with ``P``, inspired by Python, differs from the
notation for named capturing groups in PCRE.) Thus when
``(?P<foo>.+)bar$`` matches ``bazbar``, then ``.namedref("foo")``
returns ``baz``.

Note that a named capturing group can also be referenced as a numbered
group. So in the previous example, ``.backref(1)`` also returns
``baz``.

``fallback`` is returned when ``.namedref()`` is called after an
unsuccessful match. The default fallback is ``"**NAMEDREF METHOD
FAILED**"``.

Like ``.backref()``, ``.namedref()`` is not affected by native VCL
regex operations, nor by any other matches performed by methods or
functions of the VMOD, except for a prior ``.match()`` for the same
object.

``.namedref()`` fails, returning ``fallback`` and logging a
``VCL_Error`` message, if:

* The ``fallback`` string is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``.match()`` was not called for this object.
* There is insufficient workspace for the string to be returned.

Example::

  sub vcl_init {
  	new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
  }

  sub vcl_recv {
  	if (domainmatcher.match(req.http.Host)) {
  	   set req.http.X-Domain = domainmatcher.namedref("domain");
	}
  }

.. _func_regex.sub:

regex.sub(...)
--------------

::

      STRING xregex.sub(
            STRING text,
            STRING rewrite,
            STRING fallback="**SUB METHOD FAILED**"
      )

If the compiled pattern for this regex object matches ``text``, then
return the result of replacing the first match in ``text`` with
``rewrite``. Within ``rewrite``, ``\1`` through ``\9`` can be used to
insert the corresponding numbered capturing group from the pattern, and ``\0``
to insert the entire matching text. This method corresponds to the VCL
native function ``regsub()``.

``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB METHOD FAILED**"``.

``.sub()`` fails, returning ``fallback`` and logging a ``VCL_Error``
message, if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then this will
      # set X-Yada to "www.yada.dabba.doo.com".
      set req.http.X-Yada = bmatcher.sub(req.http.Host, "d");
  }
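
Since ``.sub()`` returns ``fallback`` rather than the original text
when the pattern does not match, it can be convenient to pass the
original text as the fallback. The following sketch, with a
hypothetical object ``swapper``, does that and also uses numbered
groups in ``rewrite``::

  sub vcl_init {
      # Captures the first two path segments of a URL.
      new swapper = re2.regex("^/([^/]+)/([^/]+)");
  }

  sub vcl_recv {
      # "/foo/bar/baz" is rewritten as "/bar/foo/baz". A URL with
      # fewer than two segments does not match, so the third argument
      # (the fallback) leaves req.url unchanged.
      set req.url = swapper.sub(req.url, "/\2/\1", req.url);
  }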

.. _func_regex.suball:

regex.suball(...)
-----------------

::

      STRING xregex.suball(
            STRING text,
            STRING rewrite,
            STRING fallback="**SUBALL METHOD FAILED**"
      )

Like ``.sub()``, except that all successive non-overlapping matches in
``text`` are replaced with ``rewrite``. This method corresponds to VCL
native ``regsuball()``.

The default fallback is ``"**SUBALL METHOD FAILED**"``. ``.suball()``
fails under the same conditions as ``.sub()``.

Since only non-overlapping matches are substituted, replacing
``"ana"`` within ``"banana"`` only results in one substitution, not
two.

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then set X-Yada to
      # "www.yada.dada.doo.com".
      set req.http.X-Yada = bmatcher.suball(req.http.Host, "d");
  }
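
The non-overlapping rule can be sketched as follows. The object name
``ana`` and the header ``X-Fruit`` are made up for this example::

  sub vcl_init {
      new ana = re2.regex("ana");
  }

  sub vcl_recv {
      # "banana" contains "ana" at two overlapping positions, but only
      # the first is replaced, so X-Fruit is set to "bXna".
      set req.http.X-Fruit = ana.suball("banana", "X");
  }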

.. _func_regex.extract:

regex.extract(...)
------------------

::

      STRING xregex.extract(
            STRING text,
            STRING rewrite,
            STRING fallback="**EXTRACT METHOD FAILED**"
      )

If the compiled pattern for this regex object matches ``text``, then
return ``rewrite`` with substitutions from the matching portions of
``text``. Non-matching substrings of ``text`` are ignored.

The default fallback is ``"**EXTRACT METHOD FAILED**"``. Like
``.sub()`` and ``.suball()``, ``.extract()`` fails if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

	sub vcl_init {
	    new email = re2.regex("(.*)@([^.]*)");
	}

	sub vcl_deliver {
	    # Sets X-UUCP to "kremvax!boris"
	    set resp.http.X-UUCP = email.extract("boris@kremvax.ru", "\2!\1");
	}
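
To illustrate the difference from ``.sub()``, the following sketch
applies both methods with the same pattern and ``rewrite``. The object
name ``addr`` and the header names are hypothetical::

  sub vcl_init {
      new addr = re2.regex("(.*)@([^.]*)");
  }

  sub vcl_deliver {
      # The pattern matches "boris@kremvax"; the trailing ".ru" is not
      # part of the match.

      # .sub() keeps the non-matching remainder: "kremvax!boris.ru"
      set resp.http.X-Sub = addr.sub("boris@kremvax.ru", "\2!\1");

      # .extract() drops it: "kremvax!boris"
      set resp.http.X-Extract = addr.extract("boris@kremvax.ru", "\2!\1");
  }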


.. _func_regex.cost:

INT xregex.cost()
-----------------

Return a numeric measurement > 0 for this regex object from the RE2
library.  According to the RE2 documentation:

  ... a very approximate measure of a regexp's "cost". Larger numbers
  are more expensive than smaller numbers.

The absolute numeric values are opaque and not relevant, but they are
meaningful relative to one another -- more complex regexen have a
higher cost than less complex regexen. This may be useful during
development and optimization of regular expressions.

Example::

  std.log("r1 cost=" + r1.cost() + " r_alt cost=" + r_alt.cost());

regex functional interface
==========================

.. _func_match:

match(...)
----------

::

   BOOL match(
      STRING pattern,
      STRING subject,
      BOOL utf8=0,
      BOOL posix_syntax=0,
      BOOL longest_match=0,
      INT max_mem=8388608,
      BOOL literal=0,
      BOOL never_nl=0,
      BOOL dot_nl=0,
      BOOL never_capture=0,
      BOOL case_sensitive=1,
      BOOL perl_classes=0,
      BOOL word_boundary=0,
      BOOL one_line=0
   )

Like the ``regex.match()`` method, return ``true`` if ``pattern``
matches ``subject``, where ``pattern`` is compiled with the given
options (or default options) on each invocation.

If ``pattern`` fails to compile, then an error message is logged with
the ``VCL_Error`` tag, and ``false`` is returned.

Example::

  # Match the bereq Host header against a backend response header
  if (re2.match(pattern=bereq.http.Host, subject=beresp.http.X-Host)) {
     call do_on_match;
  }

.. _func_backref:

STRING backref(INT ref, STRING fallback)
----------------------------------------

::

   STRING backref(
      INT ref,
      STRING fallback="**BACKREF FUNCTION FAILED**"
   )

Returns the `nth` captured subexpression from the most recent
successful call of the ``match()`` function in the current client or
backend context, or a fallback string if the capture fails. The
default ``fallback`` is ``"**BACKREF FUNCTION FAILED**"``.

Similarly to the ``regex.backref()`` method, ``fallback`` is returned
after any failed invocation of the ``match()`` function, or if there
is no captured group corresponding to the backref number. The function
is not affected by native VCL regex operations, or any other method or
function of the VMOD except for the ``match()`` function.

The function fails, returning ``fallback`` and logging a ``VCL_Error``
message, under the same conditions as the corresponding method:

* ``fallback`` is undefined.
* ``never_capture`` was true in the previous invocation of the ``match()``
  function.
* ``ref`` is out of range.
* The ``match()`` function was never called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured subexpression.

Example::

  # Match against a pattern provided in a beresp header, and capture
  # subexpression 1.
  if (re2.match(pattern=beresp.http.X-Pattern, subject=bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.backref(1);
  }

.. _func_namedref:

STRING namedref(STRING name, STRING fallback)
---------------------------------------------

::

   STRING namedref(
      STRING name,
      STRING fallback="**NAMEDREF FUNCTION FAILED**"
   )

Returns the captured subexpression designated by ``name`` from the
most recent successful call to the ``match()`` function in the current
context, or ``fallback`` in case of failure. The default fallback is
``"**NAMEDREF FUNCTION FAILED**"``.

The function returns ``fallback`` when the previous invocation of the
``match()`` function failed, and is only affected by use of the
``match()`` function. The function fails, returning ``fallback`` and
logging a ``VCL_Error`` message, under the same conditions as the
corresponding method:

* ``fallback`` is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``match()`` was not called in this context.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured expression.
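
Named groups use the RE2 syntax ``(?P<name>...)``. The following sketch
assumes a hypothetical header ``X-Lang`` and a URL layout that starts
with a two-letter language code; neither comes from the VMOD itself::

  sub vcl_recv {
      # (?P<lang>...) defines a capturing group named "lang".
      if (re2.match("^/(?P<lang>[a-z]{2})/", req.url)) {
          set req.http.X-Lang = re2.namedref("lang", fallback="en");
      }
  }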

Example::

  if (re2.match(beresp.http.X-Pattern-With-Names, bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.namedref("foo");
  }


.. _func_sub:

sub(...)
--------

::

   STRING sub(
      STRING pattern,
      STRING text,
      STRING rewrite,
      STRING fallback="**SUB FUNCTION FAILED**",
      BOOL utf8=0,
      BOOL posix_syntax=0,
      BOOL longest_match=0,
      INT max_mem=8388608,
      BOOL literal=0,
      BOOL never_nl=0,
      BOOL dot_nl=0,
      BOOL never_capture=0,
      BOOL case_sensitive=1,
      BOOL perl_classes=0,
      BOOL word_boundary=0,
      BOOL one_line=0
   )

Compiles ``pattern`` with the given options and, if it matches
``text``, returns the result of replacing the first match in
``text`` with ``rewrite``. As with the ``regex.sub()`` method, ``\0``
through ``\9`` may be used in ``rewrite`` to substitute captured
groups from the pattern.

``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB FUNCTION FAILED**"``.

``sub()`` fails, returning ``fallback`` and logging a ``VCL_Error``
message, if:

* ``pattern`` cannot be compiled.
* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.
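
Since ``fallback`` is also returned when the pattern does not match,
passing the original text as the fallback leaves it unchanged in that
case. A sketch under that assumption, with illustrative URL paths::

  sub vcl_recv {
      # Rewrite /old/<rest> to /new/<rest>. \1 refers to the first
      # captured group; req.url is kept as is when there is no match.
      set req.url = re2.sub("^/old/(.*)", req.url, "/new/\1",
                            fallback=req.url);
  }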

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dabba.doo.com".
  set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
                                   bereq.http.Host, "d");


.. _func_suball:

suball(...)
-----------

::

   STRING suball(
      STRING pattern,
      STRING text,
      STRING rewrite,
      STRING fallback="**SUBALL FUNCTION FAILED**",
      BOOL utf8=0,
      BOOL posix_syntax=0,
      BOOL longest_match=0,
      INT max_mem=8388608,
      BOOL literal=0,
      BOOL never_nl=0,
      BOOL dot_nl=0,
      BOOL never_capture=0,
      BOOL case_sensitive=1,
      BOOL perl_classes=0,
      BOOL word_boundary=0,
      BOOL one_line=0
   )

Like the ``sub()`` function, except that all successive
non-overlapping matches in ``text`` are replaced with ``rewrite``.

The default fallback is ``"**SUBALL FUNCTION FAILED**"``. The
``suball()`` function fails under the same conditions as ``sub()``.
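
A typical use is normalization, where every occurrence has to be
rewritten. As a sketch (the choice of pattern is an assumption for
illustration), this collapses each run of slashes in the URL::

  sub vcl_recv {
      # "/{2,}" matches two or more consecutive slashes.
      set req.url = re2.suball("/{2,}", req.url, "/", fallback=req.url);
  }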

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dada.doo.com".
  set beresp.http.X-Yada = re2.suball(beresp.http.X-Sub-Letters,
                                      bereq.http.Host, "d");


.. _func_extract:

extract(...)
------------

::

   STRING extract(
      STRING pattern,
      STRING text,
      STRING rewrite,
      STRING fallback="**EXTRACT FUNCTION FAILED**",
      BOOL utf8=0,
      BOOL posix_syntax=0,
      BOOL longest_match=0,
      INT max_mem=8388608,
      BOOL literal=0,
      BOOL never_nl=0,
      BOOL dot_nl=0,
      BOOL never_capture=0,
      BOOL case_sensitive=1,
      BOOL perl_classes=0,
      BOOL word_boundary=0,
      BOOL one_line=0
   )

Compiles ``pattern`` with the given options and, if it matches
``text``, returns ``rewrite`` with substitutions from the matching
portions of ``text``, ignoring the non-matching portions.

The default fallback is ``"**EXTRACT FUNCTION FAILED**"``. The
``extract()`` function fails under the same conditions as ``sub()``
and ``suball()``.
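
The difference from ``sub()`` is that the text surrounding the match is
dropped. A short comparison, assuming ``req.url`` is ``/a/b?id=42&x=1``
and using hypothetical header names::

  sub vcl_recv {
      # extract() returns the rewrite alone: "42".
      set req.http.X-Id = re2.extract("id=([0-9]+)", req.url, "\1");
      # sub() returns the text with the match replaced: "/a/b?42&x=1".
      set req.http.X-Sub = re2.sub("id=([0-9]+)", req.url, "\1");
  }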

Example::

  # If beresp header X-Params contains "(foo|bar)=(baz|quux)", and the
  # URL contains "bar=quux", then set X-Query to "bar:quux".
  set beresp.http.X-Query = re2.extract(beresp.http.X-Params, bereq.url,
                                        "\1:\2");


.. _func_cost:

cost(...)
---------

::

   INT cost(
      STRING pattern,
      BOOL utf8=0,
      BOOL posix_syntax=0,
      BOOL longest_match=0,
      INT max_mem=8388608,
      BOOL literal=0,
      BOOL never_nl=0,
      BOOL dot_nl=0,
      BOOL never_capture=0,
      BOOL case_sensitive=1,
      BOOL perl_classes=0,
      BOOL word_boundary=0,
      BOOL one_line=0
   )

Like the ``.cost()`` method above, return a numeric measurement > 0
from the RE2 library for ``pattern`` with the given options. More
complex regexen have a higher cost than less complex regexen.

Fails and returns -1 if ``pattern`` cannot be compiled.
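
The return value can be used to check a pattern obtained at runtime
before using it. A sketch, assuming ``import std`` and a hypothetical
backend response header ``X-Pattern``::

  sub vcl_backend_response {
      # A negative result means that the pattern did not compile.
      if (re2.cost(beresp.http.X-Pattern) < 0) {
          std.log("X-Pattern is not a valid RE2 pattern");
      }
  }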

Example::

  std.log("simple cost=" + re2.cost("simple")
          + " complex cost=" + re2.cost("complex{1,128}"));


.. _obj_set:

set(...)
--------

::

   new xset = re2.set(
      ENUM {none, start, both} anchor=none,
      BOOL utf8=0,
      BOOL posix_syntax=0,
      BOOL longest_match=0,
      INT max_mem=8388608,
      BOOL literal=0,
      BOOL never_nl=0,
      BOOL dot_nl=0,
      BOOL case_sensitive=1,
      BOOL perl_classes=0,
      BOOL word_boundary=0,
      BOOL one_line=0
   )

Initialize a set object that represents several patterns combined by
alternation -- ``|`` for "or".

Optional parameters control the interpretation of the resulting
composed pattern. The ``anchor`` parameter is an enum that can have
the values ``none``, ``start`` or ``both``, where ``none`` is the
default. ``start`` means that each pattern is matched as if it begins
with ``^`` for start-of-text, and ``both`` means that each pattern is
anchored with both ``^`` at the beginning and ``$`` for end-of-text at
the end. ``none`` means that each pattern is interpreted as a partial
match (although individual patterns within the set may have either of
``^`` or ``$``).

For example, if a set is initialized with ``anchor=both``, and the
patterns ``foo`` and ``bar`` are added, then matches against the set
match a string against ``^foo$|^bar$``, or equivalently
``^(foo|bar)$``.
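
In VCL, that case might look as follows. This is a sketch; the object
name ``exact`` and the headers ``X-Token`` and ``X-Token-Known`` are
hypothetical::

  sub vcl_init {
      new exact = re2.set(anchor=both);
      exact.add("foo");
      exact.add("bar");
      exact.compile();
  }

  sub vcl_recv {
      # True only if X-Token is exactly "foo" or "bar"; values such
      # as "foobar" or "xfoo" do not match.
      if (exact.match(req.http.X-Token)) {
          set req.http.X-Token-Known = "true";
      }
  }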

The usual regex options can be set, which then control matching
against the resulting composed pattern. However, the ``never_capture``
option cannot be set, and is always implicitly true, since backrefs
and namedrefs are not possible with sets.

Example::

  sub vcl_init {
      # Initialize a regex set for partial matches
      # with default options
      new foo = re2.set();

      # Initialize a regex set for case insensitive matches
      # with anchors on both ends (^ and $).
      new bar = re2.set(anchor=both, case_sensitive=false);

      # Initialize a regex set using POSIX syntax, but allowing
      # Perl character classes, and anchoring at the left (^).
      new baz = re2.set(anchor=start, posix_syntax=true,
                        perl_classes=true);
  }

.. _func_set.add:

set.add(...)
------------

::

      VOID xset.add(
            STRING,
            [STRING string],
            [BACKEND backend],
            [BOOL save],
            [BOOL never_capture],
            [INT integer]
      )

Add the given pattern to the set. If the pattern is invalid,
``.add()`` fails, and the VCL will fail to load, with an error message
describing the problem.

If values for the ``string``, ``backend`` and/or ``integer``
parameters are provided, then these values can be retrieved with the
``.string()``, ``.backend()`` and ``.integer()`` methods,
respectively, as described below. This makes it possible to associate
data with the added pattern after it matches successfully. By default
the pattern is not associated with any such value.
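
More than one kind of value can be attached in the same call. A sketch,
assuming that backends named ``api_be`` and ``static_be`` are defined
elsewhere in the VCL (both names are hypothetical)::

  sub vcl_init {
      new routes = re2.set(anchor=start);
      routes.add("/api/", string="api", backend=api_be, integer=1);
      routes.add("/static/", string="static", backend=static_be,
                 integer=2);
      routes.compile();
  }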

If ``save`` is true, then the given pattern is compiled and saved as a
``regex`` object, just as if the ``regex`` constructor described above
is invoked. This object is stored internally in the ``set`` object as
an independent matcher, separate from the "compound" pattern formed by
the set as an alternation of the patterns added to it. By default,
``save`` is **false**.

When the ``.match()`` method on the set is successful, and one of the
patterns that matched is associated with a saved internal ``regex``
object, then that object may be used for subsequent method invocations
such as ``.sub()`` on the set object, whose meanings are the same as
documented above for ``regex`` objects. Details are described below.

When an internal ``regex`` object is saved (i.e. when ``save`` is
true), it is compiled with the same options that were provided to the
set object in the constructor. The ``never_capture`` option can also
be set to false for the individual regex, even though it is implicitly
set to true for the full set object (default is false).
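
As an illustration of ``save``, the following sketch keeps a regex for
the added pattern so that the set's ``.sub()`` method can be used after
a match. The object name and the paths are assumptions::

  sub vcl_init {
      new rewriter = re2.set();
      # save=true stores a regex for this pattern inside the set.
      rewriter.add("^/old/(.*)", save=true);
      rewriter.compile();
  }

  sub vcl_recv {
      if (rewriter.match(req.url)) {
          # Apply the saved regex of the pattern that matched.
          set req.url = rewriter.sub(req.url, "/new/\1");
      }
  }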

``.add()`` MUST be called in ``vcl_init``, and MAY NOT be called after
``.compile()``.  If ``.add()`` is called in any other subroutine, an
error message with ``VCL_Error`` is logged, and the call has no
effect. If it is called in ``vcl_init`` after ``.compile()``, then the
VCL load will fail with an error message.

In other words, add all patterns to the set in ``vcl_init``, and
finally call ``.compile()`` when you're done.

When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.
The same is true of the INT arguments that may be given for methods
such as ``.string()``, ``.backend()`` or ``.sub()``, as described
below.

Example::

  sub vcl_init {
      # literal=true means that the dots are interpreted as literal
      # dots, not "match any character".
      new hostmatcher = re2.set(anchor=both, case_sensitive=false,
                                literal=true);
      hostmatcher.add("www.domain1.com");
      hostmatcher.add("www.domain2.com");
      hostmatcher.add("www.domain3.com");
      hostmatcher.compile();
  }

  # See the documentation of the .string() and .backend() methods
  # below for uses of the parameters string and backend for .add().


.. _func_set.compile:

VOID xset.compile()
-------------------

Compile the compound pattern represented by the set -- an alternation
of all patterns added by ``.add()``.

``.compile()`` fails if no patterns were added to the set. It may also
fail if the ``max_mem`` setting is not large enough for the composed
pattern. In that case, the VCL load will fail with an error message;
consider a larger value for ``max_mem`` in the set constructor.
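
The limit is raised with the ``max_mem`` parameter of the constructor.
A sketch; the value of 16 MiB and the patterns are arbitrary examples,
and the amount actually needed depends on the patterns in the set::

  sub vcl_init {
      # 16777216 bytes = 16 MiB, twice the default of 8388608.
      new bigset = re2.set(max_mem=16777216);
      bigset.add("first pattern");
      bigset.add("second pattern");
      bigset.compile();
  }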

``.compile()`` MUST be called in ``vcl_init``, and MAY NOT be called
more than once for a set object. If it is called in any other
subroutine, a ``VCL_Error`` message is logged, and the call has no
effect. If it is called a second time in ``vcl_init``, the VCL load
will fail.

See above for examples.


.. _func_set.match:

BOOL xset.match(STRING)
-----------------------

Returns ``true`` if the given string matches the compound pattern
represented by the set, i.e. if it matches any of the patterns that
were added to the set.

The matcher identifies all of the patterns that were added to the set
and match the given string. These can be determined after a successful
match using the ``.matched(INT)`` and ``.nmatches()`` methods
described below.

``.match()`` MUST be called after ``.compile()``; otherwise the match
always fails.

A match may also fail (returning ``false``) if the internal memory
limit imposed by the ``max_mem`` parameter in the constructor is
exceeded. (With the default value of ``max_mem``, this ordinarily
requires very large patterns and/or a very large string to be
matched.)  Since about version 2017-12-01, the RE2 library reports
this condition. When it does, the VMOD writes a ``VCL_Error`` message
to the log, except during ``vcl_init``, in which case the VCL load
fails with the error message. If matches fail due to the
out-of-memory condition, increase the ``max_mem`` parameter in the
constructor.

Example::

  if (hostmatcher.match(req.http.Host)) {
     call do_when_a_host_matched;
  }


.. _func_set.matched:

BOOL xset.matched(INT)
----------------------

Returns ``true`` after a successful match if the ``nth`` pattern that
was added to the set is among the patterns that matched, ``false``
otherwise. The numbering of the patterns corresponds to the order in
which patterns were added in ``vcl_init``, counting from 1.

The method refers back to the most recent invocation of ``.match()``
for the same object in the same client or backend context. It always
returns ``false``, for every value of the parameter, if it is called
after an unsuccessful match (``.match()`` returned ``false``).

``.matched()`` fails and returns ``false`` if:

* The ``.match()`` method was not called for this object in the same
  client or backend scope.

* The integer parameter is out of range; that is, if it is less than 1
  or greater than the number of patterns added to the set.

On failure, the method writes an error message to the log with the tag
``VCL_Error``; if it fails during ``vcl_init``, then the VCL load
fails with the error message. In any other VCL subroutine, the method
returns ``false`` on failure and processing continues; since ``false``
is a legitimate return value, you should consider monitoring the log
for the error messages.

Example::

  if (hostmatcher.match(req.http.Host)) {
      if (hostmatcher.matched(1)) {
          call do_domain1;
      }
      if (hostmatcher.matched(2)) {
          call do_domain2;
      }
      if (hostmatcher.matched(3)) {
          call do_domain3;
      }
  }


.. _func_set.nmatches:

INT xset.nmatches()
-------------------

Returns the number of patterns that were matched by the most recent
invocation of ``.match()`` for the same object in the same client or
backend context. The method always returns 0 after an unsuccessful
match (``.match()`` returned ``false``).

If ``.match()`` was not called for this object in the same client or
backend scope, ``.nmatches()`` fails and returns 0, writing an error
message with ``VCL_Error`` to the log. If this happens in
``vcl_init``, the VCL load fails with the error message. As with
``.matched()``, ``.nmatches()`` returns a legitimate value and VCL
processing continues when it fails in any other subroutine, so you
should monitor the log for the error messages.

Example::

  if (myset.match(req.url)) {
      std.log("URL matched " + myset.nmatches()
              + " patterns from the set");
  }


.. _func_set.which:

INT xset.which(ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)
--------------------------------------------------------

Returns a number indicating which pattern in a set matched in the most
recent invocation of ``.match()`` in the client or backend
context. The number corresponds to the order in which patterns were
added to the set in ``vcl_init``, counting from 1.

If exactly one pattern matched in the most recent ``.match()`` call
(so that ``.nmatches()`` returns 1), and the ``select`` ENUM is set to
``UNIQUE``, then the number for that pattern is returned. ``select``
defaults to ``UNIQUE``, so it can be left out in this case.

If more than one pattern matched in the most recent ``.match()`` call
(``.nmatches()`` > 1), then the ``select`` ENUM determines the integer
that is returned. The values ``FIRST`` and ``LAST`` specify that, of
the patterns that matched, the first or last one added via the
``.add()`` method is chosen, and the number for that pattern is
returned.

``.which()`` fails, returning 0 with a ``VCL_Error`` message in the log,
if:

* ``.match()`` was not called for the set in the current client or
  backend transaction, or if the previous call returned ``false``.

* More than one pattern in the set matched in the previous
  ``.match()`` call, but the ``select`` parameter is set to ``UNIQUE``
  (or left out, since ``select`` defaults to ``UNIQUE``).
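
When several patterns may match, passing ``FIRST`` or ``LAST`` avoids
the ``UNIQUE`` failure. A sketch that assumes a compiled set named
``myset`` and a hypothetical header ``X-Which``::

  sub vcl_recv {
      if (myset.match(req.url)) {
          # Number of the earliest-added pattern among those that
          # matched; concatenation converts the INT to a string.
          set req.http.X-Which = "" + myset.which(FIRST);
      }
  }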

Examples::

  sub vcl_init {
      new myset = re2.set();
      myset.add("foo");	# Pattern 1
      myset.add("bar");	# Pattern 2
      myset.add("baz");	# Pattern 3
      myset.compile();
  }

  sub vcl_recv {
      if (myset.match("bar")) {
          # myset.which() returns 2.
      }
      if (myset.match("foobaz")) {
          # myset.which() fails and returns 0, with a log
          #               message indicating that 2 patterns
          #               matched.
          # myset.which(FIRST) returns 1.
          # myset.which(LAST) returns 3.
      }
      if (myset.match("quux")) {
          # ...
      }
      else {
          # myset.which() fails and returns 0, regardless of the
          # value of the select ENUM, with a log message
          # indicating that the previous .match() call was
          # unsuccessful.
      }
  }


.. _func_set.string:

STRING xset.string(INT n, ENUM select)
--------------------------------------

::

      STRING xset.string(
            INT n=0,
            ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
      )

Returns the string associated with the `nth` pattern added to the set,
or with the pattern in the set that matched in the most recent call to
``.match()`` in the same task scope (client or backend context). The
string set with the ``string`` parameter of the ``.add()`` method
during ``vcl_init`` is returned.

The pattern is identified with the parameters ``n`` and ``select``
according to these rules, which also hold for all further ``set``
methods documented in the following.

* If ``n`` > 0, then select the `nth` pattern added to the set with
  the ``.add()`` method, counting from 1. This identifies the `nth`
  pattern in any context, regardless of whether ``.match()`` was
  called previously, or whether a previous call returned ``true`` or
  ``false``. The ``select`` parameter is ignored in this case.

* If ``n`` <= 0, then select a pattern in the set that matched
  successfully in the most recent call to ``.match()`` in the same
  task scope. Since ``n`` is 0 by default, ``n`` can be left out for
  this purpose.

* If ``n`` <= 0 and exactly one pattern in the set matched in the most
  recent invocation of ``.match()`` (and hence ``.nmatches()`` returns
  1), and ``select`` is set to ``UNIQUE``, then select that
  pattern. ``select`` defaults to ``UNIQUE``, so when exactly one
  pattern in the set matched, both ``n`` and ``select`` can be left
  out.

* If ``n`` <= 0 and more than one pattern matched in the most recent
  ``.match()`` call (``.nmatches()`` > 1), then the selection of a
  pattern is determined by the ``select`` parameter. As with
  ``.which()``, ``FIRST`` and ``LAST`` specify the first or last
  matching pattern added via the ``.add()`` method.

For the pattern selected by these rules, return the string that was
set with the ``string`` parameter in the ``.add()`` method that added
the pattern to the set.
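
The two ways of identifying a pattern can be sketched as follows,
assuming a set named ``matcher`` whose patterns were all added with a
``string`` parameter, and hypothetical header names::

  sub vcl_recv {
      # n > 0: the string of the 2nd pattern added, whether or not
      # .match() was called.
      set req.http.X-Second = matcher.string(2);

      if (matcher.match(req.url)) {
          # n left at 0: choose among the patterns that matched, here
          # the one added last.
          set req.http.X-Last = matcher.string(select=LAST);
      }
  }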

``.string()`` fails, returning NULL with a ``VCL_Error`` message in
the log, if:

* The values of ``n`` and ``select`` are invalid:

  * ``n`` is greater than the number of patterns in the set.

  * ``n`` <= 0 (or left to the default), but ``.match()`` was not
    called earlier in the same task scope (client or backend context).

  * ``n`` <= 0, but the previous ``.match()`` call returned ``false``.

  * ``n`` <= 0 and the ``select`` ENUM is ``UNIQUE`` (or default), but
    more than one pattern matched in the previous ``.match()`` call.

* No string was associated with the pattern selected by ``n`` and
  ``select``; that is, the ``string`` parameter was not set in the
  ``.add()`` call that added the pattern.

Examples::

  # Match the request URL against a set of patterns, and generate
  # a synthetic redirect response with a Location header derived
  # from the string associated with the matching pattern.

  # In the first example, exactly one pattern in the set matches.

  sub vcl_init {
      # With anchor=both, we specify exact matches.
      new matcher = re2.set(anchor=both);
      matcher.add("/foo/bar", "/baz/quux");
      matcher.add("/baz/bar/foo", "/baz/quux/foo");
      matcher.add("/quux/bar/baz/foo", "/baz/quux/foo/bar");
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          # Confirm that there was exactly one match
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          # Divert to vcl_synth, sending the string associated
          # with the matching pattern in the "reason" field.
          return(synth(1301, matcher.string()));
      }
  }

  sub vcl_synth {
      # Construct a redirect response, using the path set in
      # resp.reason.
      if (resp.status == 1301) {
          set resp.http.Location
              = "http://otherdomain.org" + resp.reason;
          set resp.status = 301;
          set resp.reason = "Moved Permanently";
          return(deliver);
      }
  }

  # In the second example, the patterns that may match have
  # common prefixes, and more than one pattern may match. We
  # add patterns to the set in a "more specific" to "less
  # specific" order, and we choose the most specific pattern
  # that matches, by specifying the first matching pattern in
  # the set.

  sub vcl_init {
      # With anchor=start, we specify matching prefixes.
      new matcher = re2.set(anchor=start);
      matcher.add("/foo/bar/baz/quux", "/baz/quux");
      matcher.add("/foo/bar/baz", "/baz/quux/foo");
      matcher.add("/foo/bar", "/baz/quux/foo/bar");
      matcher.add("/foo", "/baz");
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          # Select the first matching pattern
          return(synth(1301, matcher.string(select=FIRST)));
      }
  }

  # vcl_synth is implemented as shown above


.. _func_set.backend:

BACKEND xset.backend(INT n, ENUM select)
----------------------------------------

::

      BACKEND xset.backend(
            INT n=0,
            ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
      )

Returns the backend associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope (client or backend
context).

The rules for selecting a pattern from the set and its associated
backend based on ``n`` and ``select`` are the same as described above
for ``.string()``.

``.backend()`` fails, returning NULL with a ``VCL_Error`` message
in the log, under the same conditions described for ``.string()``
above -- ``n`` and ``select`` are invalid, or no backend was
associated with the selected pattern with the ``.add()`` method.

Example::

  # Choose a backend based on the URL prefix.

  # In this example, assume that backends b1 through b4
  # have been defined.

  sub vcl_init {
      # Use anchor=start to match prefixes.
      # The prefixes are unique, so exactly one will match.
      new matcher = re2.set(anchor=start);
      matcher.add("/foo", backend=b1);
      matcher.add("/bar", backend=b2);
      matcher.add("/baz", backend=b3);
      matcher.add("/quux", backend=b4);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          # Confirm that there was exactly one match
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          # Set the backend hint to the backend associated
          # with the matching pattern.
          set req.backend_hint = matcher.backend();
      }
  }


.. _func_set.integer:

INT xset.integer(INT n, ENUM select)
------------------------------------

::

      INT xset.integer(
            INT n=0,
            ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
      )

Returns the integer associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope.

The rules for selecting a pattern from the set and its associated
integer based on ``n`` and ``select`` are the same as described above
for ``.string()``.

``.integer()`` invokes VCL failure under the same error conditions
described for ``.string()`` above -- ``n`` and ``select`` are invalid,
or no integer was associated with the selected pattern with the
``.add()`` method.

Note that VCL failure differs from the failure mode for ``.string()``
and ``.backend()``, since there is no distinguished "error" value that
could be returned as the INT. VCL failure has the same effect as if
``return(fail)`` were called from a VCL subroutine; usually, control
passes immediately to ``vcl_synth``, with the response status set to
503, and the response reason set to "VCL failed".

You can avoid that, for example, by testing if ``.nmatches()==1``
after calling ``.match()``, if you need to ensure that calling
``.integer(select=UNIQUE)`` will not fail.
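
Such a guard might look like this; a sketch that assumes a set named
``redir`` whose patterns were all added with ``integer`` values::

  sub vcl_recv {
      # .integer() is only reached when exactly one pattern matched,
      # so the default select=UNIQUE can identify that pattern.
      if (redir.match(req.http.Host) && redir.nmatches() == 1) {
          return (synth(redir.integer()));
      }
  }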

Example::

  # Generate redirect responses based on the Host header. In the
  # example, subdomains are removed in the new Location, and the
  # associated integer is used to set the redirect status code.

  sub vcl_init {
      # No more than one pattern can match the same string. So it
      # is safe to call .integer() with default select=UNIQUE in
      # vcl_recv below (no risk of VCL failure).
      new redir = re2.set(anchor=both);
      redir.add("www\.[^.]+\.foo\.com", integer=301, string="www.foo.com");
      redir.add("www\.[^.]+\.bar\.com", integer=302, string="www.bar.com");
      redir.add("www\.[^.]+\.baz\.com", integer=303, string="www.baz.com");
      redir.add("www\.[^.]+\.quux\.com", integer=307, string="www.quux.com");
      redir.compile();
  }

  sub vcl_recv {
      if (redir.match(req.http.Host)) {
          # Construct a Location header that will be used in the
          # synthetic redirect response.
          set req.http.Location = "http://" + redir.string() + req.url;

          # Set the response status from the associated integer.
          return( synth(redir.integer()) );
      }
  }

  sub vcl_synth {
      if (resp.status >= 301 && resp.status <= 307) {
          # We come here from the synth return for the redirect
          # response. The status code was set from .integer().
          set resp.http.Location = req.http.Location;
          return(deliver);
      }
  }


.. _func_set.sub:

set.sub(...)
------------

::

      STRING xset.sub(
            STRING text,
            STRING rewrite,
            STRING fallback="**SUB METHOD FAILED**",
            INT n=0,
            ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
      )

Returns the result of the method call ``.sub(text, rewrite, fallback)``,
as documented above for the ``regex`` interface, invoked on the `nth`
pattern added to the set, or on the pattern in the set that matched in
the most recent call to ``.match()`` in the same task scope.

``.sub()`` requires that the pattern it identifies was saved as an
internal ``regex`` object, by setting ``save`` to true when it was
added with the ``.add()`` method.

The associated pattern is determined by ``n`` and ``select`` according
to the rules given above. If an internal ``regex`` object was saved
for that pattern, then the result of the ``.sub()`` method invoked on
that object is returned.

``.sub()`` fails, returning NULL with a ``VCL_Error`` message in the
log, if:

* The values of ``n`` and ``select`` are invalid, according to the
  rules given above.

* ``save`` was false in the ``.add()`` method for the pattern
  identified by ``n`` and ``select``; that is, no internal ``regex``
  object was saved on which the ``.sub()`` method could have been
  invoked.

* The ``.sub()`` method invoked on the ``regex`` object fails for any
  of the reasons described for ``regex.sub()``.
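
The sketch below shows one way to avoid the first two failure
conditions when a pattern is addressed by position. The set ``paths``
and the header ``X-Subpath`` are hypothetical names, and the sketch
assumes that ``n`` is 1-based, as in the rules given above::

  sub vcl_init {
      new paths = re2.set(anchor=start);
      paths.add("/foo/(.*)", save=true);
      paths.add("/bar/(.*)", save=true);
      paths.compile();
  }

  sub vcl_recv {
      # n=2 selects the second pattern added ("/bar/(.*)"),
      # independent of any previous .match() call. .saved() confirms
      # that a regex object was saved for that pattern.
      if (paths.saved(which=REGEX, n=2)) {
          # The result when req.url does not match the pattern is
          # governed by the rules documented for regex.sub().
          set req.http.X-Subpath = paths.sub(req.url, "/\1", n=2);
      }
  }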

Examples::

  # Generate synthetic redirect responses on URLs that match a set of
  # patterns, rewriting the URL according to the matched pattern.

  # In this example, we set the new URL in the redirect location to
  # the path that comes after the prefix of the original req.url.
  sub vcl_init {
      new matcher = re2.set(anchor=start);
      matcher.add("/foo/(.*)", save=true);
      matcher.add("/bar/(.*)", save=true);
      matcher.add("/baz/(.*)", save=true);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          return(synth(1301));
      }
  }

  sub vcl_synth {
      if (resp.status == 1301) {
          # matcher.sub() rewrites the URL to the subpath after the
          # original prefix.
          set resp.http.Location
              = "http://www.otherdomain.org" + matcher.sub(req.url, "/\1");
          return(deliver);
      }
  }


.. _func_set.suball:

set.suball(...)
---------------

::

      STRING xset.suball(
            STRING text,
            STRING rewrite,
            STRING fallback="**SUBALL METHOD FAILED**",
            INT n=0,
            ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
      )

Like the ``.sub()`` method, this returns the result of calling
``.suball(text, rewrite, fallback)`` from the regex interface on the
`nth` pattern added to the set, or the pattern that most recently
matched in a ``.match()`` call.

``.suball()`` is subject to the same conditions as the ``.sub()`` method:

* The pattern to which it is applied is identified by ``n`` and
  ``select`` according to the rules given above.

* It fails if:

  * The pattern that it identifies was not saved with ``.add(save=true)``.

  * The values of ``n`` or ``select`` are invalid.

  * The ``.suball()`` method invoked on the saved ``regex`` object
    fails.
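
When a subject may match more than one pattern, ``select`` can be set
explicitly instead of relying on ``UNIQUE``. The following is only a
sketch: the set ``words`` is a hypothetical name, and it assumes, per
the selection rules given above for ``.string()``, that ``FIRST``
chooses the matching pattern that was added earliest::

  sub vcl_init {
      new words = re2.set();
      words.add("\bfoo\b", save=true);
      words.add("\bbar\b", save=true);
      words.compile();
  }

  sub vcl_recv {
      if (words.match(req.url)) {
          # A URL such as "/foo/bar" matches both patterns, so the
          # default select=UNIQUE would fail. With FIRST, only the
          # occurrences of the first matching pattern are replaced.
          set req.url = words.suball(req.url, "quux", select=FIRST);
      }
  }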

Example::

  # In any URL that matches one of the words given below, replace all
  # occurrences of the matching word with "quux" (for example to
  # rewrite path components or elements of query strings).
  sub vcl_init {
      new matcher = re2.set();
      matcher.add("\bfoo\b", save=true);
      matcher.add("\bbar\b", save=true);
      matcher.add("\bbaz\b", save=true);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          set req.url = matcher.suball(req.url, "quux");
      }
  }


.. _func_set.extract:

set.extract(...)
----------------

::

      STRING xset.extract(
            STRING text,
            STRING rewrite,
            STRING fallback="**EXTRACT METHOD FAILED**",
            INT n=0,
            ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
      )

Like the ``.sub()`` and ``.suball()`` methods, this method returns the
result of calling ``.extract(text, rewrite, fallback)`` from the regex
interface on the `nth` pattern added to the set, or the pattern that most
recently matched in a ``.match()`` call.

``.extract()`` is subject to the same conditions as the other rewrite
methods:

* The pattern to which it is applied is identified by ``n`` and
  ``select`` according to the rules given above.

* It fails if:

  * The pattern that it identifies was not saved with ``.add(save=true)``.

  * The values of ``n`` or ``select`` are invalid.

  * The ``.extract()`` method invoked on the saved ``regex`` object
    fails.

Example::

  # Rewrite any URL that matches one of the patterns in the set
  # by exchanging the path components.
  sub vcl_init {
      new matcher = re2.set(anchor=both);
      matcher.add("/(foo)/(bar)/", save=true);
      matcher.add("/(bar)/(baz)/", save=true);
      matcher.add("/(baz)/(quux)/", save=true);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          set req.url = matcher.extract(req.url, "/\2/\1/");
      }
  }


.. _func_set.saved:

BOOL xset.saved(ENUM which, INT n, ENUM select)
-----------------------------------------------

::

      BOOL xset.saved(
            ENUM {REGEX, STR, BE, INT} which=REGEX,
            INT n=0,
            ENUM {FIRST, LAST, UNIQUE} select=UNIQUE
      )

Returns true if and only if an object of the type indicated by
``which`` was saved at initialization time for the ``nth`` pattern
added to the set, or for the pattern indicated by ``select`` after the
most recent ``.match()`` call.

In other words, ``.saved()`` returns true:

* for ``which=REGEX`` if the individual regex was saved with
  ``.add(save=true)`` for the indicated pattern

* for ``which=STR`` if a string was stored with the ``string``
  parameter in ``.add()``

* for ``which=BE`` if a backend was stored with the ``backend``
  parameter in ``.add()``

* for ``which=INT`` if an integer was stored with the ``integer``
  parameter in ``.add()``

The default value of ``which`` is ``REGEX``.

The pattern in the set is identified by ``n`` and ``select`` according
to the rules given above. ``.saved()`` fails, returning false with a
``VCL_Error`` message in the log, if the values of ``n`` or ``select``
are invalid.

Example::

  sub vcl_init {
      new s = re2.set();
      s.add("1", save=true, string="1", backend=b1);
      s.add("2", save=true, string="2");
      s.add("3", save=true, backend=b3);
      s.add("4", save=true);
      s.add("5", string="5", backend=b5);
      s.add("6", string="6");
      s.add("7", backend=b7);
      s.add("8");
      s.compile();
  }

  # Then the following holds for this set:
  # s.saved(n=1) == true  # for which=REGEX, STR or BE (no integer was added)
  # s.saved(which=REGEX, n=2) == true
  # s.saved(which=STR, n=2)   == true
  # s.saved(which=BE, n=2)    == false
  # s.saved(which=REGEX, n=3) == true
  # s.saved(which=STR, n=3)   == false
  # s.saved(which=BE, n=3)    == true
  # s.saved(which=REGEX, n=4) == true
  # s.saved(which=STR, n=4)   == false
  # s.saved(which=BE, n=4)    == false
  # s.saved(which=REGEX, n=5) == false
  # s.saved(which=STR, n=5)   == true
  # s.saved(which=BE, n=5)    == true
  # s.saved(which=REGEX, n=6) == false
  # s.saved(which=STR, n=6)   == true
  # s.saved(which=BE, n=6)    == false
  # s.saved(which=REGEX, n=7) == false
  # s.saved(which=STR, n=7)   == false
  # s.saved(which=BE, n=7)    == true
  # s.saved(n=8) == false  # for any value of which

  if (s.match("4")) {
     # The fourth pattern has been uniquely matched.
     # So in this context: s.saved() == true
     # Since save=true was used in .add() for the 4th pattern,
     # and which=REGEX by default.
  }
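
In practice, ``.saved()`` is useful as a guard before retrieving an
associated object. The sketch below is illustrative only: the set
``route``, the backends ``b_static`` and ``b_api`` (assumed to be
declared elsewhere in the VCL) and the header ``X-Route`` are
hypothetical names::

  sub vcl_init {
      new route = re2.set(anchor=start);
      route.add("/static/", backend=b_static);
      route.add("/api/", backend=b_api, string="api");
      route.add("/health");
      route.compile();
  }

  sub vcl_recv {
      if (route.match(req.url) && route.nmatches() == 1) {
          # Only fetch what was actually stored for the matched
          # pattern; "/health" has neither a backend nor a string.
          if (route.saved(which=BE)) {
              set req.backend_hint = route.backend();
          }
          if (route.saved(which=STR)) {
              set req.http.X-Route = route.string();
          }
      }
  }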


.. _func_quotemeta:

STRING quotemeta(STRING, STRING fallback)
-----------------------------------------

::

   STRING quotemeta(
      STRING,
      STRING fallback="**QUOTEMETA FUNCTION FAILED**"
   )

Returns a copy of the argument string with all regex metacharacters
escaped via backslash. When the returned string is used as a regular
expression, it will exactly match the original string, regardless of
any special characters. This function has a purpose similar to a
``\Q..\E`` sequence within a regex, or the ``literal=true`` setting in
a regex constructor.

The function fails and returns ``fallback`` if there is insufficient
workspace for the return string.
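
A typical use is to build a pattern from a value that is only known
at runtime. The following sketch assumes a hypothetical request header
``X-Prefix`` whose value is to be matched literally at the start of
the URL::

  sub vcl_recv {
      # Without quotemeta(), characters such as "." or "?" in the
      # header value would be interpreted as regex syntax.
      if (req.http.X-Prefix
          && re2.match("^" + re2.quotemeta(req.http.X-Prefix), req.url)) {
          set req.http.X-Prefix-Matched = "true";
      }
  }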

Example::

  # The following are always true:
  re2.quotemeta("1.5-2.0?") == "1\.5\-2\.0\?"
  re2.match(re2.quotemeta("1.5-2.0?"), "1.5-2.0?")


.. _func_version:

STRING version()
----------------

Returns the version string for this VMOD.

Example::

  std.log("Using VMOD re2 version: " + re2.version());

REQUIREMENTS
============

The VMOD requires Varnish version 6.1 or later. See the source
repository for versions of the VMOD that are compatible with other
Varnish versions.

It requires the RE2 library, and has been tested against RE2 versions
since 2015-06-01 (through 2019-08-01 at the time of writing).

If the VMOD is built against versions of RE2 since 2017-12-01, it uses
a version of the set match operation that reports out-of-memory
conditions during a match. (Versions of RE2 since June 2019 no longer
have this error, but nevertheless the different internal call is used
for set matches.) In that case, the VMOD is not compatible with
earlier versions of RE2. This is only a problem if the runtime version
of the library differs from the version against which the VMOD was
built. If you encounter this error, consider re-building the VMOD
against the runtime version of RE2, or installing a newer version of
RE2.

INSTALLATION
============

See `INSTALL.rst <INSTALL.rst>`_ in the source repository.

LIMITATIONS
===========

The VMOD allocates Varnish workspace for captured groups and rewritten
strings. If operations fail with "insufficient workspace" error
messages in the Varnish log (with the ``VCL_Error`` tag), increase the
varnishd runtime parameters ``workspace_client`` and/or
``workspace_backend``.

The RE2 documentation states that successful matches are slowed quite
a bit when they also capture substrings. There is also additional
overhead from the VMOD, unless the ``never_capture`` flag is true, to
manage data about captured groups in the workspace. This overhead is
incurred even if there are no capturing expressions in a pattern,
since it is always possible to call ``backref(0)`` to obtain the
matched portion of a string.

So if you are using a pattern only to match against strings, and never
to capture subexpressions, consider setting the ``never_capture``
option to true, to eliminate the extra work for both RE2 and the VMOD.
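
As an illustration, a match-only regex might be declared as follows.
The object name ``is_static`` and the pattern are assumptions made for
this sketch::

  sub vcl_init {
      # The pattern is only used for matching, and no backrefs are
      # retrieved, so capturing is disabled.
      new is_static = re2.regex("\.(css|js|png)$", never_capture=true);
  }

  sub vcl_recv {
      if (is_static.match(req.url)) {
          unset req.http.Cookie;
      }
  }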

AUTHOR
======

* Geoffrey Simmons <geoff@uplex.de>

UPLEX Nils Goroll Systemoptimierung

SEE ALSO
========

* varnishd(1)

* vcl(7)

* VMOD source repository: https://code.uplex.de/uplex-varnish/libvmod-re2

  * Gitlab mirror: https://gitlab.com/uplex/varnish/libvmod-re2

* RE2 git repo: https://github.com/google/re2

* RE2 syntax: https://github.com/google/re2/wiki/Syntax

* "Implementing Regular Expressions": https://swtch.com/~rsc/regexp/

  * Series of articles motivating the design of RE2, with discussion
    of how RE2 compares with PCRE

COPYRIGHT
=========

::

  Copyright (c) 2016-2018 UPLEX Nils Goroll Systemoptimierung
  All rights reserved
 
  Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>
 
  See LICENSE