Commit c9636944 authored by Geoff Simmons

Merge branch 'filter' into 'master'

Add filters: regex objects to perform substitutions on bodies

See merge request !3
parents 2c683cd4 a25e5b63
......@@ -31,6 +31,12 @@ SYNOPSIS
BOOL <obj>.match_body(req_body | bereq_body | resp_body
[, INT limit] [, INT limit_recursion])
# filter interface (includes all of the above)
new <obj> = re.regex(STRING [, INT limit] [, INT limit_recursion]
, asfilter=true)
<obj>.substitute_match(INT, STRING)
set [be]resp.filters = "<obj>"
# function interface
BOOL re.match_dyn(STRING [, INT limit] [, INT limit_recursion])
STRING re.backref_dyn(INT [, STRING fallback])
......@@ -44,6 +50,10 @@ DESCRIPTION
.. _regsuball(): https://varnish-cache.org/docs/trunk/reference/vcl.html#regsuball-str-regex-sub
.. _beresp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#beresp-filters
.. _resp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#resp-filters
Varnish Module (VMOD) for matching strings against regular expressions,
and for extracting captured substrings after matches.
......@@ -96,6 +106,10 @@ the ``vcl_backend_*`` subroutines and returns ``true``, then
subsequent calls to ``backref`` in the same backend scope extract
substrings from the matched substring.
By setting the ``asfilter`` parameter to true, a regex object can also
be configured to add a filter for performing substitutions on
bodies. See `xregex.substitute_match()`_ for details and examples.
The VMOD also supports dynamic regex matching with the ``match_dyn``
and ``backref_dyn`` functions::
......@@ -125,8 +139,8 @@ never changes during the lifetime of VCL, use ``match``.
.. _re.regex():
new xregex = re.regex(STRING, INT limit, INT limit_recursion, BOOL forbody)
---------------------------------------------------------------------------
new xregex = re.regex(STRING, INT limit, INT limit_recursion, BOOL forbody, BOOL asfilter)
------------------------------------------------------------------------------------------
::
......@@ -134,7 +148,8 @@ new xregex = re.regex(STRING, INT limit, INT limit_recursion, BOOL forbody)
STRING,
INT limit=1000,
INT limit_recursion=1000,
BOOL forbody=0
BOOL forbody=0,
BOOL asfilter=0
)
Description
......@@ -154,6 +169,14 @@ Description
`xregex.match_body()`_ method is to be called on the
object.
If the optional ``asfilter`` parameter is true, the vmod
registers itself as a Varnish Fetch Processor (VFP) for use in
`beresp.filters`_ and as a Varnish Delivery Processor (VDP)
for use in `resp.filters`_. In this setup, the
`xregex.substitute_match()`_ and `xregex.substitute_all()`_
methods can be used to define replacements for matches on the
body.
Example
``new myregex = re.regex("\bmax-age\s*=\s*(\d+)");``
......@@ -227,6 +250,8 @@ Description
should first be cached by calling
``std.cache_req_body(<size>)``.
Lookarounds are not supported.
Example::
sub vcl_init {
......@@ -288,9 +313,94 @@ Description
is emitted to the Varnish log using the ``VCL_Error`` tag, and
the fallback string is returned.
Lookarounds are not supported.
Example
``set beresp.ttl = std.duration(myregex.backref(1, "120"), 120s);``
.. _xregex.substitute_match():
VOID xregex.substitute_match(INT, STRING)
-----------------------------------------
Description
This method defines substitutions for regular expression
replacement ("regsub") operations on HTTP bodies.
It can only be used on `re.regex()`_ objects constructed with
the ``asfilter`` argument set to ``true``; otherwise a VCL
failure is triggered.
The INT argument defines to which match the substitution is to
be applied: For ``1``, it applies to the first match, for
``2`` to the second etc. A value of ``0`` defines the default
substitution which is applied if a specific substitution is
not defined. Negative values trigger a VCL failure.
If no substitution is defined for a match (and there is no
default), the matched sub-string is left unchanged.
The STRING argument defines the substitution to apply, exactly
like the ``sub`` (third) argument of the `regsub()`_ built-in
VCL function: ``\0`` (which can also be spelled ``\&``) is
replaced with the entire matched string, and ``\n`` is
replaced with the contents of subgroup *n* in the matched
string.
To have any effect, the regex object must be used as a fetch
or delivery filter.
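As a rough illustration of the expansion rules described above, the following C sketch substitutes ``\0`` to ``\9`` and ``\&`` in a replacement string. It is not the VMOD's implementation: ``expand_replacement`` is a hypothetical helper, and the offsets are assumed to be passed as start/end pairs in the layout PCRE2 uses for its ovector (pair 0 is the whole match).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand a regsub-style replacement into out (outlen bytes, always
 * NUL-terminated, truncated if too small).  ovector holds ngroups
 * start/end offset pairs into subj.  Hypothetical helper for
 * illustration only.  Returns the number of bytes written, without NUL.
 */
static size_t
expand_replacement(const char *repl, const char *subj,
    const size_t *ovector, unsigned ngroups, char *out, size_t outlen)
{
	size_t o = 0;

	assert(out != NULL && outlen > 0);
	for (const char *p = repl; *p != '\0'; p++) {
		int group = -1;

		if (*p == '\\' && p[1] >= '0' && p[1] <= '9')
			group = p[1] - '0';
		else if (*p == '\\' && p[1] == '&')
			group = 0;	/* \& is another spelling of \0 */

		if (group < 0) {
			/* ordinary byte: copy as is */
			if (o + 1 < outlen)
				out[o++] = *p;
			continue;
		}
		p++;	/* skip the digit or '&' */
		if ((unsigned)group >= ngroups)
			continue;	/* unknown group expands to nothing */
		size_t start = ovector[2 * group];
		size_t end = ovector[2 * group + 1];
		for (size_t i = start; i < end && o + 1 < outlen; i++)
			out[o++] = subj[i];
	}
	out[o] = '\0';
	return (o);
}
```

Called with the subject "reiher", the offsets ``{0, 6, 1, 3, 4, 6}`` (what ``r(ei)h(er)`` would capture) and the replacement ``\1\2``, the helper writes the two groups back to back.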
Example
For occurrences of the string "reiher" in the response body,
replace the first with "czapla", the second with "eier" and
all others with "heron". The response is returned uncompressed
even if the client supported compression because there
currently is no ``gzip`` VDP in Varnish-Cache::
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_deliver {
unset req.http.Accept-Encoding;
set resp.filters += " reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
reiher.substitute_match(0, "heron");
}
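The selection rule used in this example (a specific substitution for the n-th match, otherwise the default, otherwise no change) can be sketched in C as follows. This is a simplified model and not the VMOD's code: it matches a literal string with ``strstr()`` instead of a regular expression, it expects replacements that are already expanded (so ``\1\2`` becomes "eier"), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule: n == 0 is the default, n >= 1 targets the n-th match. */
struct subst_rule {
	unsigned	n;
	const char	*s;
};

/* Replacement for the seen-th match (1-based), or NULL to keep the match. */
static const char *
pick_rule(const struct subst_rule *rules, size_t nrules, unsigned seen)
{
	const char *dflt = NULL;

	for (size_t i = 0; i < nrules; i++) {
		if (rules[i].n == seen)
			return (rules[i].s);
		if (rules[i].n == 0)
			dflt = rules[i].s;
	}
	return (dflt);
}

/* Append len bytes of src to out, truncating to keep out NUL-terminated. */
static void
append(char *out, size_t outlen, size_t *used, const char *src, size_t len)
{
	if (*used + len >= outlen)
		len = outlen - *used - 1;
	memcpy(out + *used, src, len);
	*used += len;
	out[*used] = '\0';
}

/* Rewrite body into out, replacing each occurrence of the literal needle. */
static void
replace_matches(const char *body, const char *needle,
    const struct subst_rule *rules, size_t nrules, char *out, size_t outlen)
{
	size_t nl = strlen(needle), used = 0;
	unsigned seen = 0;
	const char *hit, *r;

	assert(nl > 0 && outlen > 0);
	out[0] = '\0';
	while ((hit = strstr(body, needle)) != NULL) {
		append(out, outlen, &used, body, (size_t)(hit - body));
		r = pick_rule(rules, nrules, ++seen);
		if (r == NULL)
			r = needle;	/* no rule and no default: unchanged */
		append(out, outlen, &used, r, strlen(r));
		body = hit + nl;
	}
	append(out, outlen, &used, body, strlen(body));
}
```

With the rules ``{1, "czapla"}``, ``{2, "eier"}`` and ``{0, "heron"}``, six consecutive copies of "reiher" become "czaplaeierheronheronheronheron". Dropping the default rule leaves the third and later matches untouched.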
.. _xregex.substitute_all():
VOID xregex.substitute_all(STRING)
----------------------------------
Description
This method instructs the named filter object to replace all
matches with the STRING argument.
It is a shorthand for calling::
xregex.clear_substitutions();
xregex.substitute_match(0, STRING);
See `xregex.substitute_match()`_ for when to use this method.
.. _xregex.clear_substitutions():
VOID xregex.clear_substitutions()
---------------------------------
Description
This method clears all previous substitution definitions made through
`xregex.substitute_match()`_ and `xregex.substitute_all()`_.
It is not strictly required, because VCL code can always be written
such that only one code path ever calls
`xregex.substitute_match()`_ and `xregex.substitute_all()`_,
but it is provided to allow for simpler VCL when handling
exceptional cases.
See `xregex.substitute_match()`_ for when to use this method.
.. _re.match_dyn():
BOOL match_dyn(STRING, STRING, INT limit, INT limit_recursion)
......
......@@ -7,7 +7,8 @@ libvmod_re_la_LDFLAGS = -module -export-dynamic -avoid-version -shared
libvmod_re_la_SOURCES = \
vcc_if.c \
vcc_if.h \
vmod_re.c
vmod_re.c \
rvb.h
vcc_if.c: vcc_if.h
......
......@@ -4,7 +4,11 @@
-esym(818, task) // Parameter could be const
-esym(793, pcre2_*)
-emacro(160, container_of)
-emacro(826, container_of)
// must always be included to ensure sanity
-efile(766, config.h)
-efile(537, config.h)
-efile(451, config.h)
-e793
\ No newline at end of file
/*-
* Copyright 2023 UPLEX Nils Goroll Systemoptimierung
* All rights reserved
*
* Author: Nils Goroll <nils.goroll@uplex.de>
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* RVB: Re Vfp Buffer
*
* stay sane managing pointers on buffers
*
*
* +---------------- (p)ointer: allocation start
* | +------------ (r)ead pointer
* | | +-------- (w)rite pointer - new data here
* | | | +-- (l)imit: byte after buffer
* p r w l
* v v v v
* ~~~~~~~~ returned from pull
* ====xxxx------
*/
#include "config.h"
struct rvb {
unsigned magic;
#define RVB_MAGIC 0x1f6f0031
unsigned flags;
#define RVB_F_WR 1 // is writable
#define RVB_F_MALLOC (1<<1)
#define RVB_F_STABLE (1<<2) // remaining across calls like VDP_NULL
#define RVB_F_END (1<<3)
union {
struct {
const char *p;
const char *r;
const char *w;
const char *l;
} ro;
struct {
char *p;
const char *r;
char *w;
const char *l;
} wr;
} u;
};
static inline void
rvb_assert(const struct rvb *b)
{
CHECK_OBJ_NOTNULL(b, RVB_MAGIC);
assert(b->u.ro.p && b->u.ro.l);
assert(b->u.ro.r >= b->u.ro.p);
assert(b->u.ro.r <= b->u.ro.w);
assert(b->u.ro.w >= b->u.ro.r);
assert(b->u.ro.w <= b->u.ro.l);
}
static inline enum vfp_status
rvb_vfp_status(const struct rvb *b)
{
rvb_assert(b);
return ((b->flags & RVB_F_END) ? VFP_END : VFP_OK);
}
#define RVB_INIT(ROWR, b, p, len) do { \
AN(b); \
AN(p); \
\
AZ(b->u.ROWR.p || b->u.ROWR.r || \
b->u.ROWR.w || b->u.ROWR.l); \
AZ(b->magic || b->flags); \
\
b->magic = RVB_MAGIC; \
b->u.ROWR.p = p; \
b->u.ROWR.r = p; \
b->u.ROWR.w = p; \
b->u.ROWR.l = b->u.ROWR.r + len; \
} while(0)
// init for available data
static inline void
rvb_init_ro(struct rvb *b, const void *p, ssize_t len)
{
RVB_INIT(ro, b, p, len);
b->u.ro.w = b->u.ro.l;
}
static inline void
rvb_init_wr(struct rvb *b, void *p, ssize_t len)
{
RVB_INIT(wr, b, p, len);
b->flags |= RVB_F_WR;
}
static inline void
rvb_reset(struct rvb *b)
{
rvb_assert(b);
b->u.ro.r = b->u.ro.p;
b->u.ro.w = b->u.ro.p;
}
static inline int
rvb_is(const struct rvb *b)
{
return (b != NULL && b->magic != 0);
}
static void
rvb_fini(struct rvb *b)
{
rvb_assert(b);
if (b->flags & RVB_F_MALLOC)
free(b->u.wr.p);
memset(b, 0, sizeof *b);
}
static inline const char *
rvb_r(const struct rvb *b)
{
return (b->u.ro.r);
}
/* read length available */
static inline size_t
rvb_rl(const struct rvb *b)
{
return (b->u.ro.w - b->u.ro.r);
}
/* write length available */
static inline size_t
rvb_wl(const struct rvb *b)
{
assert(b->flags & RVB_F_WR);
return (b->u.wr.l - b->u.wr.w);
}
static inline char *
rvb_w(const struct rvb *b)
{
assert(b->flags & RVB_F_WR);
return (b->u.wr.w);
}
/* total size */
static inline size_t
rvb_size(const struct rvb *b)
{
return (b->u.ro.l - b->u.ro.p);
}
static void
rvb_grow(struct rvb *b, size_t l)
{
const char *o;
rvb_assert(b);
assert(b->flags & RVB_F_MALLOC);
assert(b->flags & RVB_F_WR);
if (l == 0)
l = rvb_size(b) << 1;
else
l += rvb_size(b);
o = b->u.ro.p;
b->u.wr.p = realloc(b->u.wr.p, l);
AN(b->u.ro.p);
b->u.ro.l = b->u.ro.p + l;
if (b->u.ro.p == o)
return;
// realloc has relocated
b->u.ro.r = b->u.ro.p + (b->u.ro.r - o);
b->u.wr.w = b->u.wr.p + (b->u.ro.w - o);
}
static void
rvb_alloc(struct rvb *b, size_t l)
{
memset(b, 0, sizeof *b);
rvb_init_wr(b, malloc(l), l);
memset(b->u.wr.p, 0, l);
b->flags |= RVB_F_MALLOC;
}
static inline int
rvb_cangrow(const struct rvb *b)
{
return (b->flags & RVB_F_MALLOC);
}
// mark read up to ->u.ro.w
static inline void
rvb_consume_all(struct rvb *b)
{
rvb_assert(b);
b->u.ro.r = b->u.ro.w;
}
static inline void
rvb_consume(struct rvb *b, size_t n)
{
rvb_assert(b);
b->u.ro.r += n;
assert(b->u.ro.r <= b->u.ro.w);
}
// forget unread: reset w to r
static inline void
rvb_forget(struct rvb *b)
{
rvb_assert(b);
b->u.ro.w = (b->u.ro.p) + (b->u.ro.r - b->u.ro.p);
}
// append s to d
static inline void
rvb_append(struct rvb *d, struct rvb *s)
{
rvb_assert(d);
rvb_assert(s);
AZ(d->flags & RVB_F_END);
//lint -e{666}
size_t l = vmin(rvb_wl(d), rvb_rl(s));
memcpy(rvb_w(d), rvb_r(s), l);
d->u.wr.w += l;
s->u.ro.r += l;
if (rvb_rl(s) == 0 && s->flags & RVB_F_END)
d->flags |= RVB_F_END;
}
// transfer unread and END flag from s to d, grow if needed
// (so d must be MALLOC)
static inline void
rvb_transfer(struct rvb *d, struct rvb *s, size_t hint)
{
size_t rl;
rvb_assert(s);
rl = rvb_rl(s);
if (hint < rl)
hint = rl;
if (! rvb_is(d))
rvb_alloc(d, hint);
else if (rvb_wl(d) < rl)
rvb_grow(d, hint - rvb_wl(d));
else
rvb_assert(d);
AZ(d->flags & RVB_F_END);
AN(d->flags & RVB_F_MALLOC);
if (s->flags & RVB_F_END) {
d->flags |= RVB_F_END;
s->flags &= ~RVB_F_END;
}
assert(rvb_wl(d) >= rl);
memcpy(rvb_w(d), rvb_r(s), rl);
d->u.wr.w += rl;
rvb_forget(s);
}
// d references the first n bytes of s
static inline void
rvb_ref_prefix(struct rvb *d, const struct rvb *s, size_t n)
{
AZ(rvb_is(d));
rvb_assert(s);
*d = *s;
d->flags = s->flags & RVB_F_STABLE;
d->u.ro.w = d->u.ro.r + n;
}
// suck into an rvb
static enum vfp_status
rvb_suck(struct rvb *b, struct vfp_ctx *vc)
{
enum vfp_status r;
ssize_t l;
rvb_assert(b);
l = rvb_wl(b);
AN(l);
r = VFP_Suck(vc, rvb_w(b), &l);
b->u.wr.w += l;
if (r == VFP_END)
b->flags |= RVB_F_END;
rvb_assert(b);
return (r);
}
// return from vfp
static enum vfp_status
rvb_ret(const struct rvb *b, const void *p, ssize_t *lp)
{
rvb_assert(b);
AZ(b->flags & RVB_F_MALLOC);
assert(b->u.ro.p == p);
*lp = b->u.ro.w - b->u.ro.p;
return (rvb_vfp_status(b));
}
/* put as much as possible to d.
* if space on d is insufficient and it is malloc'ed,
* grow it
*/
static inline void
rvb_put(struct rvb *d, const char **p, size_t *lp)
{
size_t l, ll;
rvb_assert(d);
AN(p);
AN(*p);
AN(lp);
l = *lp;
if (l == 0)
return;
ll = rvb_wl(d);
if (l > ll && rvb_cangrow(d)) {
rvb_grow(d, l);
ll = rvb_wl(d);
assert(l <= ll);
}
else if (l > ll)
l = ll;
memcpy(rvb_w(d), *p, l);
d->u.wr.w += l;
rvb_assert(d);
(*p) += l;
*lp -= l;
}
/* put as much as possible to d, and the rest to (o)verrun.
* o will be alloc'ed if necessary, if already alloc'ed, it
* must be malloc
*/
static void
rvb_puto(struct rvb *d, struct rvb *o, const char *p, size_t l)
{
AN(d);
AN(o);
AN(p);
if (l == 0)
return;
rvb_put(d, &p, &l);
if (l == 0)
return;
if (rvb_is(o))
assert(rvb_cangrow(o));
else
rvb_alloc(o, l);
rvb_put(o, &p, &l);
AZ(l);
}
static inline void
rvb_take(struct rvb *d, struct rvb *s)
{
assert(! rvb_is(d));
*d = *s;
memset(s, 0, sizeof *s);
}
/* unclear: would it pay off to only memmove if the p..r distance
* is above a certain value?
*
* my guess is: no, because branch prediction and because caches.
*
* but this would need a benchmark.
*
*/
static void
rvb_compact(struct rvb *b)
{
size_t rl;
rvb_assert(b);
AN(b->flags & RVB_F_WR);
rl = rvb_rl(b);
if (rl == 0)
return;
memmove(b->u.wr.p, rvb_r(b), rl);
b->u.wr.r = b->u.wr.p;
b->u.wr.w = b->u.wr.p + rl;
}
static int
rvb_vdp_bytes(struct rvb *b, struct vdp_ctx *vdx)
{
enum vdp_action act;
int r;
if (b->flags & RVB_F_END)
act = VDP_END;
else if (b->flags & RVB_F_STABLE)
act = VDP_NULL;
else
act = VDP_FLUSH;
r = VDP_bytes(vdx, act, rvb_r(b), rvb_rl(b));
rvb_fini(b);
return (r);
}
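The p/r/w/l pointer layout drawn in the header comment of rvb.h can be illustrated with a much smaller stand-alone sketch. The ``buf4`` type and its functions below are hypothetical and deliberately simpler than ``struct rvb`` (no flags, no read-only variant, no growing); they only show how the read length, write length and compaction follow from the four pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct rvb; invariant: p <= r <= w <= l. */
struct buf4 {
	char		*p;	/* allocation start */
	char		*r;	/* next unread byte */
	char		*w;	/* next byte to be written */
	const char	*l;	/* limit: one past the end of the allocation */
};

static void
buf4_init(struct buf4 *b, char *mem, size_t len)
{
	b->p = b->r = b->w = mem;
	b->l = mem + len;
}

/* bytes written but not yet read */
static size_t
buf4_rl(const struct buf4 *b)
{
	return ((size_t)(b->w - b->r));
}

/* bytes that can still be written */
static size_t
buf4_wl(const struct buf4 *b)
{
	return ((size_t)(b->l - b->w));
}

/* copy up to len bytes from src, return how many were stored */
static size_t
buf4_put(struct buf4 *b, const char *src, size_t len)
{
	size_t room = buf4_wl(b);

	if (len > room)
		len = room;
	memcpy(b->w, src, len);
	b->w += len;
	return (len);
}

/* mark n unread bytes as read */
static void
buf4_consume(struct buf4 *b, size_t n)
{
	assert(n <= buf4_rl(b));
	b->r += n;
}

/* move unread bytes to the start so consumed space becomes writable */
static void
buf4_compact(struct buf4 *b)
{
	size_t rl = buf4_rl(b);

	memmove(b->p, b->r, rl);
	b->r = b->p;
	b->w = b->p + rl;
}
```

For an 8-byte buffer, writing 5 bytes and consuming 3 leaves 2 readable and 3 writable bytes; after compaction the same 2 bytes are readable from the start and 6 bytes are writable.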
varnishtest "beresp.filter (VFP)"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_backend_response {
set beresp.filters = "reiher";
reiher.substitute_match(0, "heron");
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier heron heron heron heron "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
} -run
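The "/chunked" case above splits matches across chunk boundaries, so a filter has to hold back bytes that might be the start of a match until more data arrives. The C sketch below illustrates that idea only: it uses a literal needle in place of PCRE2 partial matching, hard-codes the replacement choice to mirror the VCL in this test, and uses fixed-size buffers. All names are hypothetical and none of this is the VMOD's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HOLD_MAX 64

struct stream_sub {
	const char	*needle;	/* literal stand-in for the regex */
	unsigned	seen;		/* number of matches so far */
	char		hold[HOLD_MAX];	/* possible start of a match */
	size_t		nhold;
	char		out[256];	/* collected output (sketch only) */
	size_t		nout;
};

static void
emit(struct stream_sub *st, const char *src, size_t len)
{
	assert(st->nout + len < sizeof st->out);
	memcpy(st->out + st->nout, src, len);
	st->nout += len;
	st->out[st->nout] = '\0';
}

/* mirrors substitute_match(1, ...), (2, ...) and the default (0, ...) */
static const char *
replacement_for(unsigned seen)
{
	if (seen == 1)
		return ("czapla");
	if (seen == 2)
		return ("eier");
	return ("heron");
}

/* feed one chunk; bytes that may begin a match stay in st->hold */
static void
stream_feed(struct stream_sub *st, const char *chunk, size_t len)
{
	size_t nl = strlen(st->needle);

	assert(nl > 0 && nl <= HOLD_MAX);
	for (size_t i = 0; i < len; i++) {
		st->hold[st->nhold++] = chunk[i];
		/* release leading bytes until hold is a prefix of needle */
		while (st->nhold > 0 &&
		    memcmp(st->hold, st->needle, st->nhold) != 0) {
			emit(st, st->hold, 1);
			st->nhold--;
			memmove(st->hold, st->hold + 1, st->nhold);
		}
		if (st->nhold == nl) {
			/* complete match: emit the replacement instead */
			const char *r = replacement_for(++st->seen);

			emit(st, r, strlen(r));
			st->nhold = 0;
		}
	}
}

/* end of data: a held-back partial match can no longer complete */
static void
stream_end(struct stream_sub *st)
{
	emit(st, st->hold, st->nhold);
	st->nhold = 0;
}
```

Feeding the eight chunks of the "/chunked" response one at a time and then calling ``stream_end()`` produces the body the client expects in this test, "czaplaeierheronheronheronheron".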
varnishtest "beresp.filter (VFP) same as c12 but no default"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_backend_response {
set beresp.filters = "reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
# reiher.substitute_match(0, "heron");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier reiher reiher reiher reiher "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
} -run
varnishtest "resp.filter (VDP)"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_backend_response {
set beresp.do_gzip = true;
}
sub vcl_deliver {
unset req.http.Accept-Encoding;
set resp.filters += " reiher";
reiher.substitute_match(2, "\1\2");
reiher.substitute_match(1, "czapla");
reiher.substitute_match(0, "heron");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier heron heron heron heron "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierheronheronheronheron"
} -run
varnishtest "resp.filter (VDP) same as c14 but no default"
server s1 {
rxreq
expect req.url == "/cl"
txresp -body "reiherreiherreiherreiherreiherreiher"
rxreq
expect req.url == "/cl"
txresp -body " reiher reiher reiher reiher reiher reiher "
rxreq
expect req.url == "/nomatch"
txresp -bodylen 8192
rxreq
expect req.url == "/chunked"
txresp -nolen -hdr "Transfer-encoding: chunked"
# 1
chunked "rei"
chunked "he"
chunked "r"
# 2
chunked "reihe"
# 2/3
chunked "rrei"
# 3/4
chunked "herrei"
chunked "her"
#5/6
chunked "reiherreiher"
chunkedlen 0
} -start
varnish v1 -arg "-p debug=+processors" -vcl+backend {
import re from "${vmod_topbuild}/src/.libs/libvmod_re.so";
import std;
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_recv {
return (pass);
}
sub vcl_deliver {
set resp.filters = "reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
# reiher.substitute_match(0, "heron");
}
} -start
client c1 {
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
txreq -url "/cl"
rxresp
expect resp.status == 200
expect resp.body == " czapla eier reiher reiher reiher reiher "
txreq -url "/nomatch"
rxresp
expect resp.status == 200
expect resp.bodylen == 8192
txreq -url "/chunked"
rxresp
expect resp.status == 200
expect resp.body == "czaplaeierreiherreiherreiherreiher"
} -run
......@@ -35,12 +35,14 @@
#include <string.h>
#include "cache/cache.h"
#include "cache/cache_filter.h"
#include "vcl.h"
#include "vre.h"
#include "vre_pcre2.h"
#include "vsb.h"
#include "vcc_if.h"
#include "rvb.h"
#if !HAVE_PCRE2_SET_DEPTH_LIMIT
# define pcre2_set_depth_limit(r, d) pcre2_set_recursion_limit(r, d)
......@@ -49,11 +51,26 @@
#define MAX_MATCHES 11
#define MAX_OV ((MAX_MATCHES) * 2)
struct vmod_re_regex;
// lack of priv pointer
// https://github.com/varnishcache/varnish-cache/pull/3912
struct vdp_container {
unsigned magic;
#define VMOD_RE_VDP_MAGIC 0xa16a677f
struct vdp vdp;
struct vmod_re_regex *re;
// lack of VRT_CTX argument to _fini
struct vcl *vcl;
};
struct vmod_re_regex {
unsigned magic;
#define VMOD_RE_REGEX_MAGIC 0x955706ee
vre_t *vre;
struct vre_limits vre_limits;
struct vfp *vfp;
struct vdp_container *vdpc;
};
typedef struct ov_s {
......@@ -63,6 +80,14 @@ typedef struct ov_s {
int ovector[MAX_OV];
} ov_t;
static vfp_init_f re_vfp_init;
static vfp_pull_f re_vfp_pull;
static vfp_fini_f re_vfp_fini;
static vdp_init_f re_vdp_init;
static vdp_bytes_f re_vdp_bytes;
static vdp_fini_f re_vdp_fini;
// varnish-cache pre 7f28888779fd14f99eb34e50f6fb07ea6bbff999
#ifndef NO_VXID
#define NO_VXID (0U)
......@@ -104,7 +129,7 @@ re_compile(const char *pattern, unsigned options, char *errbuf,
VCL_VOID
vmod_regex__init(VRT_CTX, struct vmod_re_regex **rep, const char *vcl_name,
VCL_STRING pattern, VCL_INT limit, VCL_INT limit_recursion,
VCL_BOOL forbody)
VCL_BOOL forbody, VCL_BOOL asfilter)
{
struct vmod_re_regex *re;
vre_t *vre;
......@@ -130,6 +155,8 @@ vmod_regex__init(VRT_CTX, struct vmod_re_regex **rep, const char *vcl_name,
return;
}
forbody |= asfilter;
if (forbody)
options |= PCRE2_PARTIAL_HARD;
vre = re_compile(pattern, options, errbuf, sizeof errbuf, &erroffset);
......@@ -151,6 +178,43 @@ vmod_regex__init(VRT_CTX, struct vmod_re_regex **rep, const char *vcl_name,
re->vre_limits.match = limit;
re->vre_limits.depth = limit_recursion;
*rep = re;
if (asfilter == 0)
return;
struct vfp *vfp = malloc(sizeof *vfp);
AN(vfp);
struct vdp_container *vdpc;
ALLOC_OBJ(vdpc, VMOD_RE_VDP_MAGIC);
AN(vdpc);
vfp->name = vcl_name;
vfp->init = re_vfp_init;
vfp->pull = re_vfp_pull;
vfp->fini = re_vfp_fini;
vfp->priv1 = re;
vdpc->vdp.name = vcl_name;
vdpc->vdp.init = re_vdp_init;
vdpc->vdp.bytes = re_vdp_bytes;
vdpc->vdp.fini = re_vdp_fini;
vdpc->re = re;
vdpc->vcl = ctx->vcl;
re->vfp = vfp;
re->vdpc = vdpc;
if (! VRT_AddFilter(ctx, vfp, &vdpc->vdp))
return;
re->vfp = NULL;
re->vdpc = NULL;
free(vfp);
free(vdpc);
vmod_regex__fini(rep);
// VRT_fail() from VRT_AddFilter()
}
VCL_VOID
......@@ -160,11 +224,22 @@ vmod_regex__fini(struct vmod_re_regex **rep)
if (rep == NULL || *rep == NULL)
return;
re = *rep;
*rep = NULL;
CHECK_OBJ_NOTNULL(re, VMOD_RE_REGEX_MAGIC);
TAKE_OBJ_NOTNULL(re, rep, VMOD_RE_REGEX_MAGIC);
if (re->vre != NULL)
VRE_free(&re->vre);
if (re->vfp) {
struct vrt_ctx ctx[1];
AN(re->vdpc);
INIT_OBJ(ctx, VRT_CTX_MAGIC);
ctx->vcl = re->vdpc->vcl;
VRT_RemoveFilter(ctx, re->vfp, &re->vdpc->vdp);
free(re->vfp);
FREE_OBJ(re->vdpc);
}
FREE_OBJ(re);
}
......@@ -670,3 +745,873 @@ vmod_version(VRT_CTX __attribute__((unused)))
(void) ctx;
return VERSION;
}
/*
* ============================================================
* filter VCL interface
*
* our id for the priv_task is the vfp pointer
*/
struct re_filter_subst {
uint16_t magic;
#define RE_FILTER_SUBST_MAGIC 0x6559
uint16_t flags;
#define REFS_F_FIXED 1 // no \n in s
unsigned n;
VSLIST_ENTRY(re_filter_subst) list;
VCL_STRING s;
};
VSLIST_HEAD(re_filter_substhead, re_filter_subst);
static struct re_filter_subst *
re_filter_subst_insert(struct re_filter_substhead *head,
struct re_filter_subst *sub)
{
struct re_filter_subst *e, *prev;
if (VSLIST_EMPTY(head)) {
VSLIST_FIRST(head) = sub;
return (NULL);
}
e = VSLIST_FIRST(head);
AN(e);
if (sub->n < e->n) {
VSLIST_INSERT_HEAD(head, sub, list);
return (NULL);
}
prev = NULL;
VSLIST_FOREACH_FROM(e, head, list) {
if (sub->n > e->n) {
prev = e;
continue;
}
if (sub->n == e->n)
return (e);
break;
}
AN(prev);
VSLIST_INSERT_AFTER(prev, sub, list);
return (NULL);
}
VCL_VOID
vmod_regex_substitute_match(VRT_CTX,
struct VPFX(re_regex) *re, VCL_INT n, VCL_STRING s)
{
struct vmod_priv *task;
struct re_filter_substhead *head;
struct re_filter_subst *sub;
CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
CHECK_OBJ_NOTNULL(re, VMOD_RE_REGEX_MAGIC);
if (! re->vfp) {
VRT_fail(ctx, "vmod re: .substitute*() methods require "
"construction with asfilter=true");
return;
}
task = VRT_priv_task(ctx, re->vfp);
WS_TASK_ALLOC_OBJ(ctx, sub, RE_FILTER_SUBST_MAGIC);
if (task == NULL || sub == NULL) {
VRT_fail(ctx, "vmod re: out of workspace?");
return;
}
//lint -e{740} the head is just a single pointer, as is the task priv
head = (struct re_filter_substhead *)&task->priv;
if (n < 0) {
VRT_fail(ctx, "vmod re: substitute number "
"must not be negative");
return;
}
if (n > UINT_MAX) {
VRT_fail(ctx, "vmod re: substitute number "
"too big");
return;
}
sub->n = (unsigned)n;
sub->s = s;
if (strchr(sub->s, '\\') == NULL)
sub->flags |= REFS_F_FIXED;
if (re_filter_subst_insert(head, sub))
VRT_fail(ctx, "vmod re: substitute n=%jd already defined. "
"use .clear_substitutions() ?", (intmax_t)n);
}
VCL_VOID vmod_regex_substitute_all(VRT_CTX,
struct VPFX(re_regex) *re, VCL_STRING s)
{
vmod_regex_clear_substitutions(ctx, re);
vmod_regex_substitute_match(ctx, re, 0, s);
}
VCL_VOID vmod_regex_clear_substitutions(VRT_CTX,
struct VPFX(re_regex) *re)
{
struct vmod_priv *task;
CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
CHECK_OBJ_NOTNULL(re, VMOD_RE_REGEX_MAGIC);
if (! re->vfp) {
VRT_fail(ctx, "vmod re: .clear_substitutions() requires "
"construction with asfilter=true");
return;
}
task = VRT_priv_task_get(ctx, re->vfp);
if (task == NULL)
return;
task->priv = NULL;
}
/*
* ============================================================
* filter
*
*/
struct re_flt_state {
unsigned magic;
#define RE_FLT_STATE_MAGIC 0x4624f390
unsigned seen;
uint32_t re_options;
struct vsl_log *vsl;
struct re_filter_substhead *head;
pcre2_code *re;
pcre2_match_context *re_ctx;
pcre2_match_data *re_data;
PCRE2_SIZE re_min_length;
struct rvb i[1], o[1];
};
// common for vfp and vdp
static struct re_flt_state *
re_flt_init(VRT_CTX, const struct vmod_re_regex *re,
struct re_filter_substhead *head)
{
struct re_flt_state *res;
WS_TASK_ALLOC_OBJ(ctx, res, RE_FLT_STATE_MAGIC);
if (res == NULL) {
VRT_fail(ctx, "vmod re: out of workspace?");
return (NULL);
}
res->re_options = PCRE2_PARTIAL_HARD;
res->vsl = ctx->vsl;
res->head = head;
res->re = VRE_unpack(re->vre);
AN(res->re);
res->re_ctx = pcre2_match_context_create(NULL);
if (res->re_ctx == NULL) {
VRT_fail(ctx, "vmod re_ctx create failed");
return (NULL);
}
res->re_data = pcre2_match_data_create_from_pattern(res->re, NULL);
if (res->re_data == NULL) {
pcre2_match_context_free(res->re_ctx);
VRT_fail(ctx, "vmod re_data create failed");
return (NULL);
}
AZ(pcre2_pattern_info(res->re, PCRE2_INFO_MINLENGTH,
&res->re_min_length));
AZ(pcre2_set_depth_limit(res->re_ctx, re->vre_limits.depth));
AZ(pcre2_set_match_limit(res->re_ctx, re->vre_limits.match));
return (res);
}
static void
re_flt_fini(struct re_flt_state **resp)
{
struct re_flt_state *res;
TAKE_OBJ_NOTNULL(res, resp, RE_FLT_STATE_MAGIC);
AN(res->re_ctx);
AN(res->re_data);
pcre2_match_context_free(res->re_ctx);
pcre2_match_data_free(res->re_data);
if (rvb_is(res->i))
rvb_fini(res->i);
if (rvb_is(res->o))
rvb_fini(res->o);
}
static enum vfp_status v_matchproto_(vfp_init_f)
re_vfp_init(VRT_CTX, struct vfp_ctx *vc, struct vfp_entry *vfe)
{
const struct vmod_re_regex *re;
struct re_flt_state *res;
struct vmod_priv *task;
CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
CHECK_OBJ_NOTNULL(vc, VFP_CTX_MAGIC);
CHECK_OBJ_NOTNULL(vfe, VFP_ENTRY_MAGIC);
CAST_OBJ_NOTNULL(re, vfe->vfp->priv1, VMOD_RE_REGEX_MAGIC);
assert(re->vfp == vfe->vfp);
task = VRT_priv_task_get(ctx, re->vfp);
if (task == NULL || task->priv == NULL)
return (VFP_NULL);
//lint -e{740} the head is just a single pointer, as is the task priv
res = re_flt_init(ctx, re, (struct re_filter_substhead *)&task->priv);
if (res == NULL)
return (VFP_ERROR);
vfe->priv1 = res;
http_Unset(vc->resp, H_Content_Length);
http_Unset(vc->resp, H_ETag);
return (VFP_OK);
}
//#define RE_FLT_DEBUG 1
#ifdef RE_FLT_DEBUG
static inline void
rvb_dbg(struct vsl_log *vsl, const char *pfx, struct rvb *b)
{
if (rvb_is(b))
VSLb(vsl, SLT_Debug,
"%s: %.*s>%.*s< f=%zd fl=0x%02x", pfx,
(int)(b->u.ro.r - b->u.ro.p), b->u.ro.p,
(int)(b->u.ro.w - b->u.ro.r), b->u.ro.r,
b->u.ro.l - b->u.ro.w, b->flags);
else
VSLb(vsl, SLT_Debug, "%s: (nil)", pfx);
}
#define RVB_DBG(vsl, x) rvb_dbg(vsl, #x, x)
#define VFP_DBG(vsl, ...) VSLb(vsl, __VA_ARGS__)
#else
#define RVB_DBG(vsl, x) (void)0
#define VFP_DBG(vsl, ...) (void)0
#endif
// realign ovector so that ovector[0] == 0
static void
realign_ovector(PCRE2_SIZE *ovector, int matches)
{
PCRE2_SIZE off;
int i;
off = ovector[0];
if (off == 0)
return;
for (i = 0; i < matches * 2; i++) {
assert(ovector[i] >= off);
ovector[i] -= off;
}
}
#define xisdigit(x) ((x) >= '0' && (x) <= '9')
// rvb_puto() the replacement with \n substituted
// basically VRE_sub()
static void
re_vfp_subst(struct rvb *out, struct rvb *overflow,
const char *replacement, PCRE2_SIZE *ovector,
const char *subj, unsigned matches)
{
const char *s, *e;
unsigned x;
for (s = e = replacement; *e != '\0'; e++ ) {
if (*e != '\\' || e[1] == '\0')
continue;
rvb_puto(out, overflow, s, pdiff(s, e));
s = ++e;
if (xisdigit(*e)) {
s++;
x = *e - '0';
if (x >= matches)
continue;
x *= 2;
rvb_puto(out, overflow, subj + ovector[x],
ovector[x + 1] - ovector[x]);
continue;
}
}
rvb_puto(out, overflow, s, pdiff(s, e));
}
/*
* return the right sub for the next match, if any
*
* there must be subs left (see "no replacements left" code)
*
*/
static const struct re_filter_subst *
re_flt_sub(struct re_flt_state *res)
{
struct re_filter_subst *sub0, *sub;
res->seen++;
sub = VSLIST_FIRST(res->head);
CHECK_OBJ_NOTNULL(sub, RE_FILTER_SUBST_MAGIC);
if (sub->n == 0) {
sub0 = sub;
sub = VSLIST_NEXT(sub, list);
}
else
sub0 = NULL;
if (sub != NULL) {
CHECK_OBJ(sub, RE_FILTER_SUBST_MAGIC);
if (sub->n != res->seen)
sub = NULL;
}
if (sub != NULL) {
if (sub0)
VSLIST_REMOVE_AFTER(sub0, list);
else
VSLIST_REMOVE_HEAD(res->head, list);
return (sub);
}
if (sub0 != NULL)
return (sub0);
return (NULL);
}
/*
* (set +xe ; for b in {1..40} ; do echo $b ; rm -f src/vmod_re.*o ; make CFLAGS='-Wall -Werror -g -DTEST_REDUCED_BUFFER='$b -j 20 check ; done; echo TEST_REDUCED_BUFFER done)
*/
//#define TEST_REDUCED_BUFFER 3
static enum vfp_status v_matchproto_(vfp_pull_f)
re_vfp_pull(struct vfp_ctx *vc, struct vfp_entry *vfe,
void *p, ssize_t *lp)
{
const struct re_filter_subst *sub;
struct re_flt_state *res;
struct rvb out[1] = {0}, *b;
PCRE2_SIZE *ovector;
enum vfp_status r;
int i;
CHECK_OBJ_NOTNULL(vc, VFP_CTX_MAGIC);
CHECK_OBJ_NOTNULL(vfe, VFP_ENTRY_MAGIC);
CAST_OBJ_NOTNULL(res, vfe->priv1, RE_FLT_STATE_MAGIC);
AN(p);
AN(lp);
#ifdef TEST_REDUCED_BUFFER
*lp = TEST_REDUCED_BUFFER;
VSLb(res->vsl, SLT_Debug, "TEST_REDUCED_BUFFER %u",
TEST_REDUCED_BUFFER);
#endif
rvb_init_wr(out, p, *lp);
*lp = 0;
if (rvb_is(res->i) && rvb_rl(res->i) > 0)
b = res->i;
else
b = out;
//lint --e{801} could replace goto with while/continue
suck:
VFP_DBG(res->vsl, SLT_Debug, "---");
RVB_DBG(res->vsl, res->i);
RVB_DBG(res->vsl, res->o);
// append any remaining output
if (rvb_is(res->o)) {
rvb_append(out, res->o);
if (rvb_rl(res->o) == 0)
rvb_fini(res->o); // XXX reuse?
// output data not to be matched again
rvb_consume_all(out);
}
// return to caller if buffer full or end of data
if (rvb_wl(out) == 0 || (out->flags & RVB_F_END)) {
AZ(rvb_rl(out));
RVB_DBG(res->vsl, out);
VFP_DBG(res->vsl, SLT_Debug, "=== ret fl 0x%02x",
out->flags);
return (rvb_ret(out, p, lp));
}
/* no replacements left == noop - copy from in buffer */
if (VSLIST_EMPTY(res->head) && rvb_is(res->i)) {
rvb_take(res->o, res->i);
b = out;
goto suck;
}
/* no replacements left == noop - suck remaining data */
if (VSLIST_EMPTY(res->head)) {
assert(b == out);
r = rvb_vfp_status(out);
while (r == VFP_OK && rvb_wl(out) > 0)
r = rvb_suck(out, vc);
VFP_DBG(res->vsl, SLT_Debug, "=== ret noop");
return (rvb_ret(out, p, lp));
}
// if the input buffer is consumed, reset it and read to out
if (b == res->i && rvb_rl(res->i) == 0) {
out->flags |= (res->i->flags & RVB_F_END);
rvb_reset(res->i);
rvb_consume_all(out);
b = out;
}
r = rvb_vfp_status(b);
if (r != VFP_END && rvb_wl(b) > 0) {
VFP_DBG(res->vsl, SLT_Debug, "suck");
r = rvb_suck(b, vc);
if (r == VFP_ERROR) {
VFP_DBG(res->vsl, SLT_Debug, "ERROR");
return (r);
}
if (rvb_rl(b) == 0) {
VFP_DBG(res->vsl, SLT_Debug, "=== END");
return (rvb_ret(out, p, lp));
}
}
RVB_DBG(res->vsl, b);
if (r == VFP_END)
res->re_options &= ~PCRE2_PARTIAL_HARD;
/* XXX lookbehind issue: we would need to keep some
* context around to fully support lookbehind
*
* make it work without first and maybe revisit later
*/
i = pcre2_match(res->re, (PCRE2_SPTR)rvb_r(b), rvb_rl(b), 0,
res->re_options, res->re_data, res->re_ctx);
if (i < PCRE2_ERROR_PARTIAL) {
VFP_DBG(res->vsl, SLT_VCL_Error,
"vmod re: regex match returned %d", i);
return (VFP_ERROR);
}
if (i == PCRE2_ERROR_PARTIAL) {
assert(r != VFP_END);
if (rvb_wl(b) > 0)
goto suck;
if (b == res->i) {
rvb_compact(b);
rvb_grow(b, 0);
goto suck;
}
assert(b == out);
if (rvb_is(res->i))
rvb_reset(res->i);
rvb_transfer(res->i, b, res->re_min_length << 1);
b = res->i;
goto suck;
}
res->re_options |= PCRE2_NOTBOL;
AZ(rvb_is(res->o));
if (i == PCRE2_ERROR_NOMATCH) {
if (b == out)
rvb_consume_all(b);
else if (b == res->i) {
rvb_take(res->o, res->i);
b = out;
}
else
WRONG("b must point to out or i");
goto suck;
}
assert(i >= 0);
ovector = pcre2_get_ovector_pointer(res->re_data);
assert ((unsigned)i <= pcre2_get_ovector_count(res->re_data));
VFP_DBG(res->vsl, SLT_Debug, "match %s %zd %zd", rvb_r(b),
ovector[0], ovector[1]);
AZ(rvb_is(res->o));
sub = re_flt_sub(res);
if (sub == NULL) {
if (b == res->i)
// leave copy to rvb_append() at suck:
rvb_ref_prefix(res->o, b, ovector[1]);
else
assert(b == out);
rvb_consume(b, ovector[1]);
goto suck;
}
/* keep bit before match
* consume match
* output replacement
*/
AZ(rvb_is(res->o));
if (b == out) {
rvb_consume(b, ovector[0]);
if (rvb_is(res->i))
AZ(rvb_rl(res->i));
rvb_transfer(res->i, b, res->re_min_length << 1);
b = res->i;
}
else if (b == res->i) {
rvb_puto(out, res->o, rvb_r(b), ovector[0]);
rvb_consume(b, ovector[0]);
}
else
WRONG("b");
// caller buffer is in output mode
assert(b == res->i);
realign_ovector(ovector, i);
VFP_DBG(res->vsl, SLT_Debug, "%d subj %.*s repl %s",
i, (int)ovector[1], rvb_r(b), sub->s);
if (sub->flags & REFS_F_FIXED)
rvb_puto(out, res->o, sub->s, strlen(sub->s));
else
re_vfp_subst(out, res->o, sub->s, ovector,
rvb_r(b), (unsigned)i);
AZ(ovector[0]);
rvb_consume(b, ovector[1]);
if (rvb_rl(b) == 0 && b->flags & RVB_F_END) {
if (rvb_is(res->o) && rvb_rl(res->o))
res->o->flags |= RVB_F_END;
else
out->flags |= RVB_F_END;
}
rvb_consume_all(out);
goto suck;
}
static void v_matchproto_(vfp_fini_f)
re_vfp_fini(struct vfp_ctx *vc, struct vfp_entry *vfe)
{
struct re_flt_state *res;
CHECK_OBJ_NOTNULL(vc, VFP_CTX_MAGIC);
CHECK_OBJ_NOTNULL(vfe, VFP_ENTRY_MAGIC);
if (vfe->priv1 == NULL)
return;
TAKE_OBJ_NOTNULL(res, &vfe->priv1, RE_FLT_STATE_MAGIC);
re_flt_fini(&res);
}
// from linux
#define container_of(ptr, type, member) ({ \
const typeof(((type *)0)->member) *__mptr = (ptr); \
(type *)((char *)__mptr - offsetof(type, member)) ; })
static int v_matchproto_(vdp_init_f)
re_vdp_init(VRT_CTX, struct vdp_ctx *vdx, void **priv, struct objcore *oc)
{
const struct vmod_re_regex *re;
struct vdp_container *vdpc;
struct re_flt_state *res;
struct vdp_entry *vde;
struct vmod_priv *task;
struct req *req;
CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
CHECK_OBJ_NOTNULL(vdx, VDP_CTX_MAGIC);
CHECK_OBJ_ORNULL(oc, OBJCORE_MAGIC);
req = vdx->req;
CHECK_OBJ_NOTNULL(req, REQ_MAGIC);
/*
* making a mountain out of a molehill because we do not
* have a private pointer in the vdp... And not even a
* pointer to the vdp
*/
vde = container_of(priv, struct vdp_entry, priv);
CHECK_OBJ_NOTNULL(vde, VDP_ENTRY_MAGIC);
vdpc = container_of(vde->vdp, struct vdp_container, vdp);
CHECK_OBJ_NOTNULL(vdpc, VMOD_RE_VDP_MAGIC);
re = vdpc->re;
CHECK_OBJ_NOTNULL(re, VMOD_RE_REGEX_MAGIC);
task = VRT_priv_task_get(ctx, re->vfp);
if (task == NULL || task->priv == NULL)
return (1);
//lint -e{740} the head is just a single pointer, as is the task priv
res = re_flt_init(ctx, re, (struct re_filter_substhead *)&task->priv);
if (res == NULL)
return (-1);
AN(priv);
AZ(*priv);
*priv = res;
req->resp_len = -1;
return (0);
}
// we do not want to push the individual micro-matches:
// use the vfp subst function to write to a buffer/overflow and
// push those
static int
re_vdp_subst(struct vdp_ctx *vdx, enum vdp_action act,
const char *replacement, PCRE2_SIZE *ovector,
const char *subj, unsigned matches)
{
// should have PCRE2_JIT_STACK_DEFAULT == 32k available
char buf[16 * 1024];
struct rvb out[1] = {0};
struct rvb overflow[1] = {0};
int r;
rvb_init_wr(out, buf, sizeof buf);
re_vfp_subst(out, overflow, replacement, ovector, subj, matches);
if (act == VDP_END) {
if (rvb_is(overflow))
overflow->flags |= RVB_F_END;
else
out->flags |= RVB_F_END;
}
r = rvb_vdp_bytes(out, vdx);
if (rvb_is(overflow)) {
if (r)
rvb_fini(overflow);
else
r = rvb_vdp_bytes(overflow, vdx);
}
return (r);
}
static int v_matchproto_(vdp_bytes_f)
re_vdp_bytes(struct vdp_ctx *vdx, enum vdp_action act, void **priv,
const void *ptr, ssize_t len)
{
const struct re_filter_subst *sub;
struct rvb in[1] = {0}, *b;
struct re_flt_state *res;
PCRE2_SIZE *ovector;
int i, r;
CHECK_OBJ_NOTNULL(vdx, VDP_CTX_MAGIC);
AN(priv);
CAST_OBJ_NOTNULL(res, *priv, RE_FLT_STATE_MAGIC);
#ifdef TEST_REDUCED_BUFFER
const char *p = ptr;
const ssize_t n = TEST_REDUCED_BUFFER;
VSLb(res->vsl, SLT_Debug, "TEST_REDUCED_BUFFER %u",
TEST_REDUCED_BUFFER);
while (len > n) {
if (re_vdp_bytes(vdx, VDP_NULL, priv, p, n))
return (vdx->retval);
p += n;
len -= n;
}
ptr = p;
#endif
/*
* cut short for VDP_FLUSH, VDP_END etc -
* process only if we have at least one byte
*/
if (len == 0 && (! rvb_is(res->i) || rvb_rl(res->i) == 0))
return (VDP_bytes(vdx, act, ptr, len));
if (len == 0 && ptr == NULL)
ptr = "";
rvb_init_ro(in, ptr, len);
if (act == VDP_END)
in->flags |= RVB_F_END;
else if (act == VDP_NULL)
in->flags |= RVB_F_STABLE;
VFP_DBG(res->vsl, SLT_Debug, "---");
RVB_DBG(res->vsl, res->i);
RVB_DBG(res->vsl, in);
// we never buffer output in the VDP
AZ(rvb_is(res->o));
// if we have an input buffer, transfer to it
if (rvb_is(res->i)) {
rvb_transfer(res->i, in, res->re_min_length << 1);
b = res->i;
}
else
b = in;
RVB_DBG(res->vsl, b);
if (b->flags & RVB_F_END)
res->re_options &= ~PCRE2_PARTIAL_HARD;
//lint --e{801} goto
more:
AN(rvb_rl(b));
// no replacements left == noop, push buffers
if (VSLIST_EMPTY(res->head))
return (rvb_vdp_bytes(b, vdx));
/* XXX lookbehind issue: we would need to keep some
* context around to fully support lookbehind
*
* make it work without first and maybe revisit later
*/
i = pcre2_match(res->re, (PCRE2_SPTR)rvb_r(b), rvb_rl(b), 0,
res->re_options, res->re_data, res->re_ctx);
if (i < PCRE2_ERROR_PARTIAL) {
VFP_DBG(res->vsl, SLT_VCL_Error,
"vmod re: regex match returned %d", i);
return (-1);
}
if (i == PCRE2_ERROR_PARTIAL) {
AZ(b->flags & RVB_F_END);
// save input if not already done above
if (b == in)
rvb_transfer(res->i, in, res->re_min_length << 1);
else if (b == res->i)
rvb_compact(res->i);
else
WRONG("b for partial match");
// we should be called again...
return (0);
}
res->re_options |= PCRE2_NOTBOL;
if (i == PCRE2_ERROR_NOMATCH)
return (rvb_vdp_bytes(b, vdx));
assert(i >= 0);
ovector = pcre2_get_ovector_pointer(res->re_data);
assert (i <= (int)pcre2_get_ovector_count(res->re_data));
VFP_DBG(res->vsl, SLT_Debug, "match %s %zd %zd", rvb_r(b),
ovector[0], ovector[1]);
sub = re_flt_sub(res);
if (sub == NULL) {
AZ(ovector[0]);
assert(rvb_rl(b) >= ovector[1]);
if (rvb_rl(b) == ovector[1] && b->flags & RVB_F_END)
act = VDP_END;
else
act = VDP_NULL;
r = VDP_bytes(vdx, act, rvb_r(b), ovector[1]);
rvb_consume(b, ovector[1]);
if (r || act == VDP_END || rvb_rl(b) == 0)
return (r);
goto more;
}
/* keep bit before match
* consume match
* output replacement
*/
if (ovector[0] > 0) {
rvb_ref_prefix(res->o, b, ovector[0]);
rvb_consume(b, ovector[0]);
if (rvb_vdp_bytes(res->o, vdx))
return (vdx->retval);
}
realign_ovector(ovector, i);
VFP_DBG(res->vsl, SLT_Debug, "%d subj %.*s repl %s",
i, (int)ovector[1], rvb_r(b), sub->s);
assert(rvb_rl(b) >= ovector[1]);
// more to come?
if (rvb_rl(b) == ovector[1] && b->flags & RVB_F_END)
act = VDP_END;
else
act = VDP_NULL;
if (sub->flags & REFS_F_FIXED)
r = VDP_bytes(vdx, act, sub->s, strlen(sub->s));
else
r = re_vdp_subst(vdx, act, sub->s, ovector, rvb_r(b), i);
AZ(ovector[0]);
rvb_consume(b, ovector[1]);
if (r || act == VDP_END || rvb_rl(b) == 0)
return (r);
goto more;
}
static int v_matchproto_(vdp_fini_f)
re_vdp_fini(struct vdp_ctx *vdx, void **priv)
{
struct re_flt_state *res;
CHECK_OBJ_NOTNULL(vdx, VDP_CTX_MAGIC);
AN(priv);
if (*priv == NULL)
return (0);
TAKE_OBJ_NOTNULL(res, priv, RE_FLT_STATE_MAGIC);
re_flt_fini(&res);
return (0);
}
......@@ -27,6 +27,12 @@ SYNOPSIS
BOOL <obj>.match_body(req_body | bereq_body | resp_body
[, INT limit] [, INT limit_recursion])
# filter interface (includes all of the above)
new <obj> = re.regex(STRING [, INT limit] [, INT limit_recursion]
, asfilter=true)
<obj>.substitute_match(INT, STRING)
set [be]resp.filters = "<obj>"
# function interface
BOOL re.match_dyn(STRING [, INT limit] [, INT limit_recursion])
STRING re.backref_dyn(INT [, STRING fallback])
......@@ -40,6 +46,10 @@ DESCRIPTION
.. _regsuball(): https://varnish-cache.org/docs/trunk/reference/vcl.html#regsuball-str-regex-sub
.. _beresp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#beresp-filters
.. _resp.filters: https://varnish-cache.org/docs/trunk/reference/vcl-var.html#resp-filters
Varnish Module (VMOD) for matching strings against regular expressions,
and for extracting captured substrings after matches.
......@@ -92,6 +102,10 @@ the ``vcl_backend_*`` subroutines and returns ``true``, then
subsequent calls to ``backref`` in the same backend scope extract
substrings from the matched substring.
By setting the ``asfilter`` parameter to true, a regex object can also
be configured to add a filter for performing substitutions on
bodies. See `xregex.substitute_match()`_ for details and examples.
The VMOD also supports dynamic regex matching with the ``match_dyn``
and ``backref_dyn`` functions::
......@@ -119,7 +133,8 @@ since it re-uses the compiled expression obtained at VCL
initialization. So if you are matching against a fixed pattern that
never changes during the lifetime of VCL, use ``match``.
$Object regex(STRING, INT limit=1000, INT limit_recursion=1000, BOOL forbody=0)
$Object regex(STRING, INT limit=1000, INT limit_recursion=1000, BOOL
forbody=0, BOOL asfilter=0)
Description
Create a regex object with the given regular expression. The
......@@ -138,6 +153,14 @@ Description
`xregex.match_body()`_ method is to be called on the
object.
If the optional ``asfilter`` parameter is true, the vmod
registers itself as a Varnish Fetch Processor (VFP) for use in
`beresp.filters`_ and as a Varnish Delivery Processor (VDP)
for use in `resp.filters`_. In this setup, the
`xregex.substitute_match()`_ and `xregex.substitute_all()`_
methods can be used to define replacements for matches on the
body.
Example
``new myregex = re.regex("\bmax-age\s*=\s*(\d+)");``
......@@ -194,6 +217,8 @@ Description
should first be cached by calling
``std.cache_req_body(<size>)``.
Lookarounds are not supported.
Example::
sub vcl_init {
......@@ -245,9 +270,85 @@ Description
is emitted to the Varnish log using the ``VCL_Error`` tag, and
the fallback string is returned.
Lookarounds are not supported.
Example
``set beresp.ttl = std.duration(myregex.backref(1, "120"), 120s);``
$Method VOID .substitute_match(INT, STRING)
Description
This method defines substitutions for regular expression
replacement ("regsub") operations on HTTP bodies.
It can only be used on `re.regex()`_ objects created with
the ``asfilter`` argument set to ``true``; otherwise a VCL
failure is triggered.
The INT argument defines to which match the substitution is to
be applied: For ``1``, it applies to the first match, for
``2`` to the second etc. A value of ``0`` defines the default
substitution which is applied if a specific substitution is
not defined. Negative values trigger a VCL failure.
If no substitution is defined for a match (and there is no
default), the matched sub-string is left unchanged.
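The selection rule described above can be sketched in C as follows. This is a simplified illustration, not the vmod's code: ``struct subst`` and ``pick_subst()`` are hypothetical names, and an array stands in for the linked list that the filter uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for this sketch: one substitution definition. */
struct subst {
	int		n;	/* 0 = default, k > 0 = k-th match only */
	const char	*s;	/* replacement string */
};

/*
 * Return the replacement for match number `seen` (counted from 1),
 * the default (n == 0) if no specific one is defined, or NULL if the
 * match is to be left unchanged.
 */
static const char *
pick_subst(const struct subst *subs, size_t nsubs, int seen)
{
	const char *dflt = NULL;
	size_t i;

	assert(seen > 0);
	for (i = 0; i < nsubs; i++) {
		if (subs[i].n == seen)
			return (subs[i].s);	/* specific entry wins */
		if (subs[i].n == 0)
			dflt = subs[i].s;	/* remember the default */
	}
	return (dflt);
}
```

With the entries ``{1, "czapla"}``, ``{2, "\\1\\2"}`` and ``{0, "heron"}``, the first and second matches get their own replacements and every later match falls back to the default.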
The STRING argument defines the substitution to apply, exactly
like the ``sub`` (third) argument of the `regsub()`_ built-in
VCL function: ``\0`` (which can also be spelled ``\&``) is
replaced with the entire matched string, and ``\n`` is
replaced with the contents of subgroup *n* in the matched
string.
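As an illustration of these rules, the following C sketch expands a replacement string from match offsets. It is a minimal sketch under stated assumptions, not the vmod's implementation: ``expand_repl()`` is a hypothetical name, only single-digit group references are handled, every referenced group is assumed to have participated in the match, and a backslash followed by any other character is copied literally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand `repl` into `dst` (capacity `cap`, including the NUL byte).
 * `ov` holds start/end offset pairs into `subj`, laid out like a PCRE2
 * ovector: ov[2*g] and ov[2*g+1] delimit group g, group 0 being the
 * whole match. `ngroups` is the number of pairs available.
 * Returns the expanded length, or -1 if `dst` is too small.
 */
static long
expand_repl(char *dst, size_t cap, const char *repl,
    const char *subj, const size_t *ov, unsigned ngroups)
{
	size_t len = 0, l;
	unsigned g;

	assert(dst != NULL && cap > 0);
	for (; *repl != '\0'; repl++) {
		if (*repl == '\\' &&
		    (repl[1] == '&' || (repl[1] >= '0' && repl[1] <= '9'))) {
			repl++;		/* step onto '&' or the digit */
			g = (*repl == '&') ? 0 : (unsigned)(*repl - '0');
			if (g >= ngroups)
				continue;	/* unknown group: empty */
			l = ov[2 * g + 1] - ov[2 * g];
			if (len + l >= cap)
				return (-1);
			memcpy(dst + len, subj + ov[2 * g], l);
			len += l;
			continue;
		}
		if (len + 1 >= cap)
			return (-1);
		dst[len++] = *repl;	/* ordinary character */
	}
	dst[len] = '\0';
	return ((long)len);
}
```

For the subject "reiher" matched by ``r(ei)h(er)``, the offsets are ``{0, 6, 1, 3, 4, 6}``, so the replacement ``\1\2`` expands to "eier" and ``\0`` or ``\&`` expands to "reiher".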
To have any effect, the regex object must be used as a fetch
or delivery filter.
Example
For occurrences of the string "reiher" in the response body,
replace the first with "czapla", the second with "eier" and
all others with "heron". The response is returned uncompressed
even if the client supported compression because there
currently is no ``gzip`` VDP in Varnish-Cache::
sub vcl_init {
new reiher = re.regex("r(ei)h(er)", asfilter = true);
}
sub vcl_deliver {
unset req.http.Accept-Encoding;
set resp.filters += " reiher";
reiher.substitute_match(1, "czapla");
reiher.substitute_match(2, "\1\2");
reiher.substitute_match(0, "heron");
}
$Method VOID .substitute_all(STRING)
Description
This method instructs the named filter object to replace all
matches with the STRING argument.
It is a shorthand for calling::
xregex.clear_substitutions();
xregex.substitute_match(0, STRING);
See `xregex.substitute_match()`_ for when to use this method.
$Method VOID .clear_substitutions()
Description
This method clears all previous substitution definitions made through
`xregex.substitute_match()`_ and `xregex.substitute_all()`_.
It is not strictly required, because VCL code can always be written
such that only one code path ever calls
`xregex.substitute_match()`_ and `xregex.substitute_all()`_,
but it is provided to allow for simpler VCL for handling
exceptional cases.
See `xregex.substitute_match()`_ for when to use this method.
$Function BOOL match_dyn(PRIV_TASK, STRING, STRING,
INT limit=1000, INT limit_recursion=1000)
......