Commit c9636944 authored by Geoff Simmons

Merge branch 'filter' into 'master'

Add filters: regex objects to perform substitutions on bodies

See merge request !3
parents 2c683cd4 a25e5b63
......@@ -31,6 +31,12 @@ SYNOPSIS
BOOL <obj>.match_body(req_body | bereq_body | resp_body
[, INT limit] [, INT limit_recursion])
# filter interface (includes all of the above)
new <obj> = re.regex(STRING [, INT limit] [, INT limit_recursion]
, asfilter=true)
<obj>.substitute_match(INT, STRING)
<obj>.substitute_all(STRING)
<obj>.clear_substitutions()
set [be]resp.filters = "<obj>"
# function interface
BOOL re.match_dyn(STRING [, INT limit] [, INT limit_recursion])
STRING re.backref_dyn(INT [, STRING fallback])
......@@ -44,6 +50,10 @@ DESCRIPTION
.. _regsuball(): https://varnish-cache.org/docs/trunk/reference/vcl.html#regsuball-str-regex-sub
.. _beresp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#beresp-filters
.. _resp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#resp-filters
Varnish Module (VMOD) for matching strings against regular expressions,
and for extracting captured substrings after matches.
......@@ -96,6 +106,10 @@ the ``vcl_backend_*`` subroutines and returns ``true``, then
subsequent calls to ``backref`` in the same backend scope extract
substrings from the matched substring.
By setting the ``asfilter`` parameter to true, a regex object can also
be configured to add a filter for performing substitutions on
bodies. See `xregex.substitute_match()`_ for details and examples.
The VMOD also supports dynamic regex matching with the ``match_dyn``
and ``backref_dyn`` functions::
......@@ -125,8 +139,8 @@ never changes during the lifetime of VCL, use ``match``.
.. _re.regex():
new xregex = re.regex(STRING, INT limit, INT limit_recursion, BOOL forbody)
---------------------------------------------------------------------------
new xregex = re.regex(STRING, INT limit, INT limit_recursion, BOOL forbody, BOOL asfilter)
------------------------------------------------------------------------------------------
::
......@@ -134,7 +148,8 @@ new xregex = re.regex(STRING, INT limit, INT limit_recursion, BOOL forbody)
STRING,
INT limit=1000,
INT limit_recursion=1000,
BOOL forbody=0
BOOL forbody=0,
BOOL asfilter=0
)
Description
......@@ -154,6 +169,14 @@ Description
`xregex.match_body()`_ method is to be called on the
object.
If the optional ``asfilter`` parameter is true, the VMOD
registers the object, under its own name, as a Varnish Fetch
Processor (VFP) for use in `beresp.filters`_ and as a Varnish
Delivery Processor (VDP) for use in `resp.filters`_. In this
setup, the `xregex.substitute_match()`_ and
`xregex.substitute_all()`_ methods define the replacements for
matches in the body.
Example
``new myregex = re.regex("\bmax-age\s*=\s*(\d+)");``
......@@ -227,6 +250,8 @@ Description
should first be cached by calling
``std.cache_req_body(<size>)``.
Lookarounds are not supported.
Example::
sub vcl_init {
......@@ -288,9 +313,94 @@ Description
is emitted to the Varnish log using the ``VCL_Error`` tag, and
the fallback string is returned.
Lookarounds are not supported.
Example
``set beresp.ttl = std.duration(myregex.backref(1, "120"), 120s);``
.. _xregex.substitute_match():
VOID xregex.substitute_match(INT, STRING)
-----------------------------------------
Description
This method defines substitutions for regular expression
replacement ("regsub") operations on HTTP bodies.
It can only be used on `re.regex()`_ objects created with the
``asfilter`` argument set to ``true``. Calling it on any other
object triggers a VCL failure.
The INT argument selects the match to which the substitution
applies: ``1`` for the first match, ``2`` for the second, and
so on. A value of ``0`` defines the default substitution, which
is applied to any match that has no specific substitution.
Negative values trigger a VCL failure.
If no substitution is defined for a match (and there is no
default), the matched sub-string is left unchanged.
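The selection rule above works like this: a match *n* uses its specific substitution if one exists, otherwise the default from index ``0``, otherwise it is left alone. It can be sketched in C as follows. ``struct subst_table``, ``MAX_SUBST`` and ``subst_for_match()`` are hypothetical names, and the fixed-size table is a simplification. This is not the VMOD's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBST 8	/* hypothetical fixed capacity */

/* subs[0] is the default, subs[n] the substitution for match n;
 * NULL means "not defined". */
struct subst_table {
	const char *subs[MAX_SUBST + 1];
};

/* Substitution for the n-th match (1-based), or NULL if the matched
 * text is to be left unchanged. */
static const char *
subst_for_match(const struct subst_table *t, unsigned n)
{
	assert(n >= 1);
	if (n <= MAX_SUBST && t->subs[n] != NULL)
		return (t->subs[n]);	/* specific substitution */
	return (t->subs[0]);		/* default, may itself be NULL */
}
```

With only ``subs[1]`` and ``subs[2]`` set, the third match yields ``NULL`` (unchanged). Once ``subs[0]`` is set, every later match gets the default.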
The STRING argument defines the substitution to apply, exactly
like the ``sub`` (third) argument of the `regsub()`_ built-in
VCL function: ``\0`` (which can also be spelled ``\&``) is
replaced with the entire matched string, and ``\n`` is
replaced with the contents of subgroup *n* in the matched
string.
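The minimal C sketch below illustrates how ``\0``, ``\&`` and ``\n`` in the STRING argument expand for a single match. The ``expand_sub()`` helper is hypothetical and rests on three assumptions:

- it takes PCRE2-style offset pairs (pair 0 is the whole match);
- unset or nonexistent groups expand to nothing;
- any other character, including a backslash not followed by a digit or ``&``, is copied literally.

The last point may differ from how Varnish's ``regsub()`` treats such input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GROUP_UNSET ((size_t)-1)	/* like PCRE2_UNSET */

/*
 * Hypothetical helper: expand a regsub()-style template for one match.
 * ovec holds ngroups (start, end) offset pairs into subject: pair 0 is
 * the whole match, pair n is capture group n.
 * Returns the length written to out (NUL-terminated), or (size_t)-1 if
 * out is too small.
 */
static size_t
expand_sub(const char *tmpl, const char *subject, const size_t *ovec,
    int ngroups, char *out, size_t outlen)
{
	size_t n = 0;
	const char *t;

	for (t = tmpl; *t != '\0'; t++) {
		int g = -1;

		if (t[0] == '\\' && t[1] >= '0' && t[1] <= '9')
			g = t[1] - '0';		/* backslash + digit */
		else if (t[0] == '\\' && t[1] == '&')
			g = 0;			/* backslash + '&' = whole match */

		if (g < 0) {			/* literal character */
			if (n + 1 >= outlen)
				return ((size_t)-1);
			out[n++] = *t;
			continue;
		}
		t++;				/* skip the digit or '&' */
		if (g >= ngroups || ovec[2 * g] == GROUP_UNSET)
			continue;		/* assumption: expands to "" */
		size_t len = ovec[2 * g + 1] - ovec[2 * g];
		if (n + len >= outlen)
			return ((size_t)-1);
		memcpy(out + n, subject + ovec[2 * g], len);
		n += len;
	}
	if (n >= outlen)
		return ((size_t)-1);
	out[n] = '\0';
	return (n);
}
```

For the subject ``xxreiherxx`` matched by ``r(ei)h(er)``, the offsets are ``{2, 8, 3, 5, 6, 8}``. The template ``\1\2`` then expands to ``eier``, which is how the example below turns the second "reiher" into "eier".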
To have any effect, the regex object must be used as a fetch
or delivery filter.
Example
For occurrences of the string "reiher" in the response body,
replace the first with "czapla", the second with "eier" and
all others with "heron". Because Varnish-Cache currently has
no ``gzip`` VDP, the example unsets ``Accept-Encoding``, so the
response is delivered uncompressed even if the client supports
compression::
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_deliver {
unset req.http.Accept-Encoding;
set resp.filters += " reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
reiher.substitute_match(0, "heron");
}
.. _xregex.substitute_all():
VOID xregex.substitute_all(STRING)
----------------------------------
Description
This method instructs the named filter object to replace all
matches with the STRING argument.
It is a shorthand for calling::
xregex.clear_substitutions();
xregex.substitute_match(0, STRING);
See `xregex.substitute_match()`_ for when to use this method.
.. _xregex.clear_substitutions():
VOID xregex.clear_substitutions()
---------------------------------
Description
This method clears all substitutions previously defined with
`xregex.substitute_match()`_ and `xregex.substitute_all()`_.
It is not strictly required, because VCL code could always be
written such that only one code path ever calls
`xregex.substitute_match()`_ and `xregex.substitute_all()`_,
but it is provided to simplify VCL that handles exceptional
cases.
See `xregex.substitute_match()`_ for when to use this method.
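The minimal sketch below models how ``substitute_all()`` relates to ``clear_substitutions()`` and ``substitute_match(0, ...)``. It uses a hypothetical fixed-size table:

- ``subst_set()``, ``subst_clear()`` and ``subst_all()`` are illustrative names, not VMOD functions;
- a return value of ``-1`` stands in for the VCL failure on negative indices;
- the upper bound ``MAX_SUBST`` is only a limitation of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBST 8	/* hypothetical fixed capacity */

/* subs[0] is the default, subs[n] the substitution for match n */
struct subst_table {
	const char *subs[MAX_SUBST + 1];
};

/* Analogue of substitute_match(n, s); -1 where VCL would fail. */
static int
subst_set(struct subst_table *t, long n, const char *s)
{
	if (n < 0 || n > MAX_SUBST)
		return (-1);
	t->subs[n] = s;
	return (0);
}

/* Analogue of clear_substitutions(): forget every definition. */
static void
subst_clear(struct subst_table *t)
{
	size_t i;

	for (i = 0; i <= MAX_SUBST; i++)
		t->subs[i] = NULL;
}

/* Analogue of substitute_all(s): clear, then set only the default. */
static void
subst_all(struct subst_table *t, const char *s)
{
	subst_clear(t);
	(void)subst_set(t, 0, s);
}
```

After ``subst_all()``, no specific substitution survives, so every match falls through to the default.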
.. _re.match_dyn():
BOOL match_dyn(STRING, STRING, INT limit, INT limit_recursion)
......
......@@ -7,7 +7,8 @@ libvmod_re_la_LDFLAGS = -module -export-dynamic -avoid-version -shared
libvmod_re_la_SOURCES = \
vcc_if.c \
vcc_if.h \
vmod_re.c
vmod_re.c \
rvb.h
vcc_if.c: vcc_if.h
......
......@@ -4,7 +4,11 @@
-esym(818, task) // Parameter could be const
-esym(793, pcre2_*)
-emacro(160, container_of)
-emacro(826, container_of)
// must always be included to ensure sanity
-efile(766, config.h)
-efile(537, config.h)
-efile(451, config.h)
-e793
\ No newline at end of file
/*-
* Copyright 2023 UPLEX Nils Goroll Systemoptimierung
* All rights reserved
*
* Author: Nils Goroll <nils.goroll@uplex.de>
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* RVB: Re Vfp Buffer
*
* stay sane managing pointers on buffers
*
*
* +---------------- (p)ointer: allocation start
* | +------------ (r)ead pointer
* | | +-------- (w)rite pointer - new data here
* | | | +-- (l)imit: byte after buffer
* p r w l
* v v v v
* ~~~~~~~~ returned from pull
* ====xxxx------
*/
#include "config.h"
struct rvb {
unsigned magic;
#define RVB_MAGIC 0x1f6f0031
unsigned flags;
#define RVB_F_WR 1 // is writable
#define RVB_F_MALLOC (1<<1)
#define RVB_F_STABLE (1<<2) // remaining across calls like VDP_NULL
#define RVB_F_END (1<<3)
union {
struct {
const char *p;
const char *r;
const char *w;
const char *l;
} ro;
struct {
char *p;
const char *r;
char *w;
const char *l;
} wr;
} u;
};
static inline void
rvb_assert(const struct rvb *b)
{
CHECK_OBJ_NOTNULL(b, RVB_MAGIC);
assert(b->u.ro.p && b->u.ro.l);
assert(b->u.ro.r >= b->u.ro.p);
assert(b->u.ro.r <= b->u.ro.w);
assert(b->u.ro.w >= b->u.ro.r);
assert(b->u.ro.w <= b->u.ro.l);
}
static inline enum vfp_status
rvb_vfp_status(const struct rvb *b)
{
rvb_assert(b);
return ((b->flags & RVB_F_END) ? VFP_END : VFP_OK);
}
#define RVB_INIT(ROWR, b, p, len) do { \
AN(b); \
AN(p); \
\
AZ(b->u.ROWR.p || b->u.ROWR.r || \
b->u.ROWR.w || b->u.ROWR.l); \
AZ(b->magic || b->flags); \
\
b->magic = RVB_MAGIC; \
b->u.ROWR.p = p; \
b->u.ROWR.r = p; \
b->u.ROWR.w = p; \
b->u.ROWR.l = b->u.ROWR.r + len; \
} while(0)
// init for available data
static inline void
rvb_init_ro(struct rvb *b, const void *p, ssize_t len)
{
RVB_INIT(ro, b, p, len);
b->u.ro.w = b->u.ro.l;
}
static inline void
rvb_init_wr(struct rvb *b, void *p, ssize_t len)
{
RVB_INIT(wr, b, p, len);
b->flags |= RVB_F_WR;
}
static inline void
rvb_reset(struct rvb *b)
{
rvb_assert(b);
b->u.ro.r = b->u.ro.p;
b->u.ro.w = b->u.ro.p;
}
static inline int
rvb_is(const struct rvb *b)
{
return (b != NULL && b->magic != 0);
}
static void
rvb_fini(struct rvb *b)
{
rvb_assert(b);
if (b->flags & RVB_F_MALLOC)
free(b->u.wr.p);
memset(b, 0, sizeof *b);
}
static inline const char *
rvb_r(const struct rvb *b)
{
return (b->u.ro.r);
}
/* read length available */
static inline size_t
rvb_rl(const struct rvb *b)
{
return (b->u.ro.w - b->u.ro.r);
}
/* write length available */
static inline size_t
rvb_wl(const struct rvb *b)
{
assert(b->flags & RVB_F_WR);
return (b->u.wr.l - b->u.wr.w);
}
static inline char *
rvb_w(const struct rvb *b)
{
assert(b->flags & RVB_F_WR);
return (b->u.wr.w);
}
/* total size */
static inline size_t
rvb_size(const struct rvb *b)
{
return (b->u.ro.l - b->u.ro.p);
}
static void
rvb_grow(struct rvb *b, size_t l)
{
const char *o;
rvb_assert(b);
assert(b->flags & RVB_F_MALLOC);
assert(b->flags & RVB_F_WR);
if (l == 0)
l = rvb_size(b) << 1;
else
l += rvb_size(b);
o = b->u.ro.p;
b->u.wr.p = realloc(b->u.wr.p, l);
AN(b->u.ro.p);
b->u.ro.l = b->u.ro.p + l;
if (b->u.ro.p == o)
return;
// realloc has relocated
b->u.ro.r = b->u.ro.p + (b->u.ro.r - o);
b->u.wr.w = b->u.wr.p + (b->u.ro.w - o);
}
static void
rvb_alloc(struct rvb *b, size_t l)
{
memset(b, 0, sizeof *b);
rvb_init_wr(b, malloc(l), l);
memset(b->u.wr.p, 0, l);
b->flags |= RVB_F_MALLOC;
}
static inline int
rvb_cangrow(const struct rvb *b)
{
return (b->flags & RVB_F_MALLOC);
}
// mark read up to ->u.ro.w
static inline void
rvb_consume_all(struct rvb *b)
{
rvb_assert(b);
b->u.ro.r = b->u.ro.w;
}
static inline void
rvb_consume(struct rvb *b, size_t n)
{
rvb_assert(b);
b->u.ro.r += n;
assert(b->u.ro.r <= b->u.ro.w);
}
// forget unread: reset w to r
static inline void
rvb_forget(struct rvb *b)
{
rvb_assert(b);
b->u.ro.w = (b->u.ro.p) + (b->u.ro.r - b->u.ro.p);
}
// append s to d
static inline void
rvb_append(struct rvb *d, struct rvb *s)
{
rvb_assert(d);
rvb_assert(s);
AZ(d->flags & RVB_F_END);
//lint -e{666}
size_t l = vmin(rvb_wl(d), rvb_rl(s));
memcpy(rvb_w(d), rvb_r(s), l);
d->u.wr.w += l;
s->u.ro.r += l;
if (rvb_rl(s) == 0 && s->flags & RVB_F_END)
d->flags |= RVB_F_END;
}
// transfer unread and END flag from s to d, grow if needed
// (so d must be MALLOC)
static inline void
rvb_transfer(struct rvb *d, struct rvb *s, size_t hint)
{
size_t rl;
rvb_assert(s);
rl = rvb_rl(s);
if (hint < rl)
hint = rl;
if (! rvb_is(d))
rvb_alloc(d, hint);
else if (rvb_wl(d) < rl)
rvb_grow(d, hint - rvb_wl(d));
else
rvb_assert(d);
AZ(d->flags & RVB_F_END);
AN(d->flags & RVB_F_MALLOC);
if (s->flags & RVB_F_END) {
d->flags |= RVB_F_END;
s->flags &= ~RVB_F_END;
}
assert(rvb_wl(d) >= rl);
memcpy(rvb_w(d), rvb_r(s), rl);
d->u.wr.w += rl;
rvb_forget(s);
}
// d references the first n bytes of s
static inline void
rvb_ref_prefix(struct rvb *d, const struct rvb *s, size_t n)
{
AZ(rvb_is(d));
rvb_assert(s);
*d = *s;
d->flags = s->flags & RVB_F_STABLE;
d->u.ro.w = d->u.ro.r + n;
}
// suck into an rvb
static enum vfp_status
rvb_suck(struct rvb *b, struct vfp_ctx *vc)
{
enum vfp_status r;
ssize_t l;
rvb_assert(b);
l = rvb_wl(b);
AN(l);
r = VFP_Suck(vc, rvb_w(b), &l);
b->u.wr.w += l;
if (r == VFP_END)
b->flags |= RVB_F_END;
rvb_assert(b);
return (r);
}
// return from vfp
static enum vfp_status
rvb_ret(const struct rvb *b, const void *p, ssize_t *lp)
{
rvb_assert(b);
AZ(b->flags & RVB_F_MALLOC);
assert(b->u.ro.p == p);
*lp = b->u.ro.w - b->u.ro.p;
return (rvb_vfp_status(b));
}
/* put as much as possible to d.
* if space on d is insufficient and it is malloc'ed,
* grow it
*/
static inline void
rvb_put(struct rvb *d, const char **p, size_t *lp)
{
size_t l, ll;
rvb_assert(d);
AN(p);
AN(*p);
AN(lp);
l = *lp;
if (l == 0)
return;
ll = rvb_wl(d);
if (l > ll && rvb_cangrow(d)) {
rvb_grow(d, l);
ll = rvb_wl(d);
assert(l <= ll);
}
else if (l > ll)
l = ll;
memcpy(rvb_w(d), *p, l);
d->u.wr.w += l;
rvb_assert(d);
(*p) += l;
*lp -= l;
}
/* put as much as possible to d, and the rest to (o)verrun.
* o will be alloc'ed if necessary, if already alloc'ed, it
* must be malloc
*/
static void
rvb_puto(struct rvb *d, struct rvb *o, const char *p, size_t l)
{
AN(d);
AN(o);
AN(p);
if (l == 0)
return;
rvb_put(d, &p, &l);
if (l == 0)
return;
if (rvb_is(o))
assert(rvb_cangrow(o));
else
rvb_alloc(o, l);
rvb_put(o, &p, &l);
AZ(l);
}
static inline void
rvb_take(struct rvb *d, struct rvb *s)
{
assert(! rvb_is(d));
*d = *s;
memset(s, 0, sizeof *s);
}
/* unclear: would it pay off to only memmove if the p..r distance
* is above a certain value?
*
* my guess is: no, because branch prediction and because caches.
*
* but this would need a benchmark.
*
*/
static void
rvb_compact(struct rvb *b)
{
size_t rl;
rvb_assert(b);
AN(b->flags & RVB_F_WR);
rl = rvb_rl(b);
if (rl == 0)
return;
memmove(b->u.wr.p, rvb_r(b), rl);
b->u.wr.r = b->u.wr.p;
b->u.wr.w = b->u.wr.p + rl;
}
static int
rvb_vdp_bytes(struct rvb *b, struct vdp_ctx *vdx)
{
enum vdp_action act;
int r;
if (b->flags & RVB_F_END)
act = VDP_END;
else if (b->flags & RVB_F_STABLE)
act = VDP_NULL;
else
act = VDP_FLUSH;
r = VDP_bytes(vdx, act, rvb_r(b), rvb_rl(b));
rvb_fini(b);
return (r);
}
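The header comment describes a pointer discipline: p <= r <= w <= l, with ``[r, w)`` holding unread data and ``[w, l)`` free space. It can be exercised outside Varnish with a standalone analogue. ``struct mini_buf`` and the ``mb_*`` functions are hypothetical stand-ins for ``struct rvb`` and a few of its helpers. They omit the magic/flags checks and the read-only/writable union. The sketch differs from rvb.h in two deliberate ways:

- ``mb_compact()`` also rewinds ``r`` and ``w`` when nothing is left unread.
- ``mb_grow()`` takes the ``r`` and ``w`` offsets before calling ``realloc()``. In standard C, pointer values into a block become indeterminate once ``realloc()`` has moved and released it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of struct rvb: p <= r <= w <= l */
struct mini_buf {
	char *p, *r, *w, *l;
};

static void
mb_check(const struct mini_buf *b)
{
	assert(b->p <= b->r && b->r <= b->w && b->w <= b->l);
}

/* Allocate len bytes with malloc(); returns -1 on failure. */
static int
mb_alloc(struct mini_buf *b, size_t len)
{
	b->p = malloc(len);
	if (b->p == NULL)
		return (-1);
	b->r = b->w = b->p;
	b->l = b->p + len;
	return (0);
}

/* Append up to len bytes at w; returns how many fit. */
static size_t
mb_write(struct mini_buf *b, const char *src, size_t len)
{
	size_t room = (size_t)(b->l - b->w);

	if (len > room)
		len = room;
	memcpy(b->w, src, len);
	b->w += len;
	mb_check(b);
	return (len);
}

/* Mark n unread bytes as consumed. */
static void
mb_consume(struct mini_buf *b, size_t n)
{
	b->r += n;
	mb_check(b);
}

/* Move unread bytes [r, w) to the start of the buffer. */
static void
mb_compact(struct mini_buf *b)
{
	size_t rl = (size_t)(b->w - b->r);

	memmove(b->p, b->r, rl);
	b->r = b->p;
	b->w = b->p + rl;
	mb_check(b);
}

/* Grow by extra bytes (double if extra == 0), keeping r and w at the
 * same logical positions. Returns -1 if realloc() fails. */
static int
mb_grow(struct mini_buf *b, size_t extra)
{
	size_t size = (size_t)(b->l - b->p);
	size_t roff = (size_t)(b->r - b->p);	/* offsets taken before */
	size_t woff = (size_t)(b->w - b->p);	/* realloc() may move p */
	size_t nsize = (extra == 0) ? size * 2 : size + extra;
	char *np = realloc(b->p, nsize);

	if (np == NULL)
		return (-1);	/* old buffer is still valid */
	b->p = np;
	b->r = np + roff;
	b->w = np + woff;
	b->l = np + nsize;
	mb_check(b);
	return (0);
}
```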
varnishtest "beresp.filter (VFP)"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_backend_response {
set beresp.filters = "reiher";
reiher.substitute_match(0, "heron");
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier heron heron heron heron "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
} -run
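The ``/chunked`` case above deliberately splits occurrences of "reiher" across chunk boundaries. A streaming filter therefore has to hold back a possible partial match until more data arrives. The sketch below shows that idea in C with two simplifications:

- a literal pattern stands in for a regular expression;
- literal replacements stand in for ``\1``-style references.

``struct stream_sub``, ``feed()`` and ``finish()`` are hypothetical names, and the fixed buffer sizes are simplifications. The sketch does not describe the VMOD's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORK_MAX 256	/* hypothetical fixed limits for the sketch */
#define OUT_MAX 256

struct stream_sub {
	const char *pat;		/* literal pattern standing in for a regex */
	size_t patlen;			/* must be >= 1 */
	const char *const *subs;	/* subs[0] default, subs[n] for match n */
	size_t nsubs;
	char carry[WORK_MAX];		/* tail that may start a match */
	size_t carrylen;
	unsigned nmatch;		/* matches seen so far */
	char out[OUT_MAX + 1];		/* accumulated, NUL-terminated output */
	size_t outlen;
};

static void
emit(struct stream_sub *s, const char *p, size_t len)
{
	assert(s->outlen + len <= OUT_MAX);
	memcpy(s->out + s->outlen, p, len);
	s->outlen += len;
	s->out[s->outlen] = '\0';
}

/* Process one chunk; matches may span chunk boundaries. */
static void
feed(struct stream_sub *s, const char *chunk, size_t len)
{
	char work[WORK_MAX];
	size_t n, i, start, keep;

	assert(s->carrylen + len <= WORK_MAX);
	memcpy(work, s->carry, s->carrylen);
	memcpy(work + s->carrylen, chunk, len);
	n = s->carrylen + len;

	for (i = start = 0; i + s->patlen <= n; ) {
		const char *rep = NULL;

		if (memcmp(work + i, s->pat, s->patlen) != 0) {
			i++;
			continue;
		}
		s->nmatch++;
		if (s->nmatch < s->nsubs && s->subs[s->nmatch] != NULL)
			rep = s->subs[s->nmatch];	/* specific */
		else if (s->nsubs > 0)
			rep = s->subs[0];		/* default, may be NULL */
		emit(s, work + start, i - start);	/* text before match */
		if (rep != NULL)
			emit(s, rep, strlen(rep));
		else
			emit(s, work + i, s->patlen);	/* leave unchanged */
		i += s->patlen;
		start = i;
	}
	/* A match starting in the last patlen - 1 bytes could still be
	 * completed by the next chunk, so hold those bytes back. */
	keep = n - start;
	if (keep > s->patlen - 1)
		keep = s->patlen - 1;
	emit(s, work + start, n - start - keep);
	memcpy(s->carry, work + n - keep, keep);
	s->carrylen = keep;
}

/* End of body: whatever was held back cannot match any more. */
static void
finish(struct stream_sub *s)
{
	emit(s, s->carry, s->carrylen);
	s->carrylen = 0;
}
```

Feeding the chunks ``rei``, ``he``, ``r``, ``reihe``, ``rrei``, ``herrei``, ``her`` and ``reiherreiher`` with substitutions ``{"heron", "czapla", "eier"}`` produces ``czaplaeierheronheronheronheron``, the same body the test expects.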
varnishtest "beresp.filter (VFP) same as c12 but no default"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_backend_response {
set beresp.filters = "reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
# reiher.substitute_match(0, "heron");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier reiher reiher reiher reiher "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
} -run
varnishtest "resp.filter (VDP)"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_backend_response {
set beresp.do_gzip = true;
}
sub vcl_deliver {
unset req.http.Accept-Encoding;
set resp.filters += " reiher";
reiher.substitute_match(2, "\1\2");
reiher.substitute_match(1, "czapla");
reiher.substitute_match(0, "heron");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier heron heron heron heron "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
} -run
varnishtest "resp.filter (VDP) same as c14 but no default"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_deliver {
set resp.filters = "reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
# reiher.substitute_match(0, "heron");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier reiher reiher reiher reiher "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
} -run
......@@ -27,6 +27,12 @@ SYNOPSIS
BOOL <obj>.match_body(req_body | bereq_body | resp_body
[, INT limit] [, INT limit_recursion])
# filter interface (includes all of the above)
new <obj> = re.regex(STRING [, INT limit] [, INT limit_recursion]
, asfilter=true)
<obj>.substitute_match(INT, STRING)
<obj>.substitute_all(STRING)
<obj>.clear_substitutions()
set [be]resp.filters = "<obj>"
# function interface
BOOL re.match_dyn(STRING [, INT limit] [, INT limit_recursion])
STRING re.backref_dyn(INT [, STRING fallback])
......@@ -40,6 +46,10 @@ DESCRIPTION
.. _regsuball(): https://varnish-cache.org/docs/trunk/reference/vcl.html#regsuball-str-regex-sub
.. _beresp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#beresp-filters
.. _resp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#resp-filters
Varnish Module (VMOD) for matching strings against regular expressions,
and for extracting captured substrings after matches.
......@@ -92,6 +102,10 @@ the ``vcl_backend_*`` subroutines and returns ``true``, then
subsequent calls to ``backref`` in the same backend scope extract
substrings from the matched substring.
By setting the ``asfilter`` parameter to true, a regex object can also
be configured to add a filter for performing substitutions on
bodies. See `xregex.substitute_match()`_ for details and examples.
The VMOD also supports dynamic regex matching with the ``match_dyn``
and ``backref_dyn`` functions::
......@@ -119,7 +133,8 @@ since it re-uses the compiled expression obtained at VCL
initialization. So if you are matching against a fixed pattern that
never changes during the lifetime of VCL, use ``match``.
$Object regex(STRING, INT limit=1000, INT limit_recursion=1000, BOOL forbody=0)
$Object regex(STRING, INT limit=1000, INT limit_recursion=1000, BOOL
forbody=0, BOOL asfilter=0)
Description
Create a regex object with the given regular expression. The
......@@ -138,6 +153,14 @@ Description
`xregex.match_body()`_ method is to be called on the
object.
If the optional ``asfilter`` parameter is true, the VMOD
registers the object, under its own name, as a Varnish Fetch
Processor (VFP) for use in `beresp.filters`_ and as a Varnish
Delivery Processor (VDP) for use in `resp.filters`_. In this
setup, the `xregex.substitute_match()`_ and
`xregex.substitute_all()`_ methods define the replacements for
matches in the body.
Example
``new myregex = re.regex("\bmax-age\s*=\s*(\d+)");``
......@@ -194,6 +217,8 @@ Description
should first be cached by calling
``std.cache_req_body(<size>)``.
Lookarounds are not supported.
Example::
sub vcl_init {
......@@ -245,9 +270,85 @@ Description
is emitted to the Varnish log using the ``VCL_Error`` tag, and
the fallback string is returned.
Lookarounds are not supported.
Example
``set beresp.ttl = std.duration(myregex.backref(1, "120"), 120s);``
$Method VOID .substitute_match(INT, STRING)
Description
This method defines substitutions for regular expression
replacement ("regsub") operations on HTTP bodies.
It can only be used on `re.regex()`_ objects created with the
``asfilter`` argument set to ``true``. Calling it on any other
object triggers a VCL failure.
The INT argument selects the match to which the substitution
applies: ``1`` for the first match, ``2`` for the second, and
so on. A value of ``0`` defines the default substitution, which
is applied to any match that has no specific substitution.
Negative values trigger a VCL failure.
If no substitution is defined for a match (and there is no
default), the matched sub-string is left unchanged.
The STRING argument defines the substitution to apply, exactly
like the ``sub`` (third) argument of the `regsub()`_ built-in
VCL function: ``\0`` (which can also be spelled ``\&``) is
replaced with the entire matched string, and ``\n`` is
replaced with the contents of subgroup *n* in the matched
string.
To have any effect, the regex object must be used as a fetch
or delivery filter.
Example
For occurrences of the string "reiher" in the response body,
replace the first with "czapla", the second with "eier" and
all others with "heron". Because Varnish-Cache currently has
no ``gzip`` VDP, the example unsets ``Accept-Encoding``, so the
response is delivered uncompressed even if the client supports
compression::
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_deliver {
unset req.http.Accept-Encoding;
set resp.filters += " reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
reiher.substitute_match(0, "heron");
}
$Method VOID .substitute_all(STRING)
Description
This method instructs the named filter object to replace all
matches with the STRING argument.
It is a shorthand for calling::
xregex.clear_substitutions();
xregex.substitute_match(0, STRING);
See `xregex.substitute_match()`_ for when to use this method.
$Method VOID .clear_substitutions()
Description
This method clears all substitutions previously defined with
`xregex.substitute_match()`_ and `xregex.substitute_all()`_.
It is not strictly required, because VCL code could always be
written such that only one code path ever calls
`xregex.substitute_match()`_ and `xregex.substitute_all()`_,
but it is provided to simplify VCL that handles exceptional
cases.
See `xregex.substitute_match()`_ for when to use this method.
$Function BOOL match_dyn(PRIV_TASK, STRING, STRING,
INT limit=1000, INT limit_recursion=1000)
......