Commit 06610ae9 authored by Geoff Simmons

Add the set.matched() and set.nmatches() methods, with a bit of refactoring.

parent 0324e8cf
......@@ -48,6 +48,8 @@ import re2 [from "path"] ;
VOID <obj>.add(STRING)
VOID <obj>.compile()
BOOL <obj>.match(STRING)
INT <obj>.nmatches()
BOOL <obj>.matched(INT)
DESCRIPTION
===========
......@@ -116,14 +118,35 @@ example::
sub vcl_init {
new myset = re2.set();
myset.add("foo");
myset.add("bar");
myset.add("baz");
myset.add("foo"); # Pattern 1
myset.add("bar"); # Pattern 2
myset.add("baz"); # Pattern 3
myset.compile();
}
``myset.match(<string>)`` can now be used to match a string against the
pattern ``foo|bar|baz``.
``myset.match(<string>)`` can now be used to match a string against
the pattern ``foo|bar|baz``. When a match is successful, the matcher
has determined all of the patterns that matched. These can then be
retrieved with the method ``.nmatches()`` for the number of matched
patterns, and with ``.matched(n)``, which returns ``true`` if the
``nth`` pattern matched, where the patterns are numbered in the order
in which they were added::
if (myset.match("foobar")) {
std.log("Matched " + myset.nmatches() + " patterns");
if (myset.matched(1)) {
# Pattern /foo/ matched
call do_foo;
}
if (myset.matched(2)) {
# Pattern /bar/ matched
call do_bar;
}
if (myset.matched(3)) {
# Pattern /baz/ matched
call do_baz;
}
}
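The numbering in the example above can be checked with a small stand-alone model. This is only a sketch: the names `toy_set`, `toy_match` and `toy_matched` are hypothetical, and `strstr()` stands in for an unanchored match of a literal pattern, so it does not reproduce RE2 behavior for real regular expressions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a compiled set: literal patterns only. */
struct toy_set {
	const char **patterns;	/* in the order they were added */
	size_t npatterns;
};

/*
 * Record the zero-based index of every pattern found in subject.
 * strstr() models an unanchored match of a literal pattern; it is
 * not a regex engine. Returns the number of matching patterns.
 * The caller must provide room for set->npatterns ints in matches.
 */
static size_t
toy_match(const struct toy_set *set, const char *subject, int *matches)
{
	size_t n = 0;

	for (size_t i = 0; i < set->npatterns; i++)
		if (strstr(subject, set->patterns[i]) != NULL)
			matches[n++] = (int)i;
	return n;
}

/* One-based query, following the documented numbering of .matched(n). */
static int
toy_matched(const int *matches, size_t nmatches, int n)
{
	for (size_t i = 0; i < nmatches; i++)
		if (matches[i] == n - 1)
			return 1;
	return 0;
}
```

With the patterns `foo`, `bar` and `baz` added in that order, matching the subject `foobar` in this model records two indices, so the query for patterns 1 and 2 succeeds and the query for pattern 3 does not.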
regex options
-------------
......@@ -717,6 +740,9 @@ VCL load will fail with an error message.
In other words, add all patterns to the set in ``vcl_init``, and
finally call ``.compile()`` when you're done.
When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.
Example::
sub vcl_init {
......@@ -742,10 +768,10 @@ set.compile
Compile the compound pattern represented by the set -- an alternation
of all patterns added by ``.add()``.
``.compile()`` may fail if the ``max_mem`` setting is not large enough
for the composed pattern. In that case, the VCL load will fail with an
error message (then consider a larger value for ``max_mem`` in the set
constructor).
``.compile()`` fails if no patterns were added to the set. It may also
fail if the ``max_mem`` setting is not large enough for the composed
pattern. In that case, the VCL load will fail with an error message
(then consider a larger value for ``max_mem`` in the set constructor).
``.compile()`` MUST be called in ``vcl_init``, and MAY NOT be called
more than once for a set object. If it is called in any other
......@@ -768,6 +794,11 @@ Returns ``true`` if the given string matches the compound pattern
represented by the set, i.e. if it matches any of the patterns that
were added to the set.
The matcher identifies every pattern in the set that matches the given
string. After a successful match, these can be queried with the
``.matched(INT)`` and ``.nmatches()`` methods described below.
``.match()`` MUST be called after ``.compile()``; otherwise the match
always fails.
......@@ -777,6 +808,83 @@ Example::
call do_when_a_host_matched;
}
.. _func_set.matched:
set.matched
-----------
::
BOOL set.matched(INT)
Returns ``true`` after a successful match if the ``nth`` pattern that
was added to the set is among the patterns that matched, ``false``
otherwise. The numbering of the patterns corresponds to the order in
which patterns were added in ``vcl_init``, counting from 1.
The method refers back to the most recent invocation of ``.match()``
for the same object in the same client or backend context. It always
returns ``false``, for every value of the parameter, if it is called
after an unsuccessful match (``.match()`` returned ``false``).
``.matched()`` fails and returns ``false`` if:
* The ``.match()`` method was not called for this object in the same
client or backend scope.
* The integer parameter is out of range; that is, if it is less than 1
or greater than the number of patterns added to the set.
On failure, the method writes an error message to the log with the tag
``VCL_Error``; if it fails during ``vcl_init``, then the VCL load
fails with the error message. In any other VCL subroutine, the method
returns ``false`` on failure and processing continues; since ``false``
is a legitimate return value, you should consider monitoring the log
for the error messages.
Example::
if (hostmatcher.match(req.http.Host)) {
if (hostmatcher.matched(1)) {
call do_domain1;
}
if (hostmatcher.matched(2)) {
call do_domain2;
}
if (hostmatcher.matched(3)) {
call do_domain3;
}
}
.. _func_set.nmatches:
set.nmatches
------------
::
INT set.nmatches()
Returns the number of patterns that were matched by the most recent
invocation of ``.match()`` for the same object in the same client or
backend context. The method always returns 0 after an unsuccessful
match (``.match()`` returned ``false``).
If ``.match()`` was not called for this object in the same client or
backend scope, ``.nmatches()`` fails and returns 0, writing an error
message with the tag ``VCL_Error`` to the log. If this happens in
``vcl_init``, the VCL load fails with the error message. In any other
subroutine, VCL processing continues after the failure; since 0 is
also a legitimate return value, you should monitor the log for the
error messages.
Example::
if (myset.match(req.url)) {
std.log("URL matched " + myset.nmatches()
+ " patterns from the set");
}
.. _func_version:
version
......
......@@ -85,33 +85,6 @@ AC_PATH_PROG([VARNISHD], [varnishd], [],
PKG_CHECK_MODULES([RE2], [re2])
# RE2 versions up to 2016-03-01 require a pointer to vector<int> in
# Set::Match(), to identify the regex that was matched. Since commit
# df7a2dc in re2, the pointer may be NULL, if we just want to know
# whether there was a match. This check tests for that feature.
# Note: the test may cause a core dump if it fails.
AC_LANG_PUSH(C++)
SAVE_CXXFLAGS="$CXXFLAGS"
SAVE_LDFLAGS="$LDFLAGS"
CXXFLAGS+=" -std=c++11"
LDFLAGS+=" -lre2"
AC_RUN_IFELSE(
[AC_LANG_SOURCE([[
#include "re2/set.h"
main() {
re2::RE2::Set s(re2::RE2::DefaultOptions, re2::RE2::UNANCHORED);
s.Add("", NULL);
s.Compile();
s.Match("", NULL);
}
]])],
[AC_DEFINE([HAVE_SET_MATCH_NULL_VECTOR], [1],
[Define to 1 if RE2 Set::Match() permits a NULL vector])]
)
CXXFLAGS="$SAVE_CXXFLAGS"
LDFLAGS="$SAVE_LDFLAGS"
AC_LANG_POP()
# --enable-stack-protector
AC_ARG_ENABLE(stack-protector,
AS_HELP_STRING([--enable-stack-protector],[enable stack protector (default is YES)]),
......
......@@ -68,6 +68,7 @@ struct vmod_re2_set {
vre2set *set;
char *vcl_name;
unsigned compiled;
int npatterns;
};
typedef struct task_match_t {
......@@ -79,6 +80,13 @@ typedef struct task_match_t {
unsigned never_capture;
} task_match_t;
struct task_set_match {
unsigned magic;
#define TASK_SET_MATCH_MAGIC 0x7a24a90b
int *matches;
size_t nmatches;
};
static char c;
static const void *match_failed = (void *) &c;
......@@ -106,6 +114,7 @@ errmsg(VRT_CTX, const char *fmt, ...)
AN(ctx->msg);
va_start(args, fmt);
VSB_vprintf(ctx->msg, fmt, args);
VSB_putc(ctx->msg, '\n');
va_end(args);
VRT_handling(ctx, VCL_RET_FAIL);
}
......@@ -570,6 +579,7 @@ vmod_set__init(VRT_CTX, struct vmod_re2_set **setp, const char *vcl_name,
return;
}
set->vcl_name = strdup(vcl_name);
AZ(set->npatterns);
}
VCL_VOID
......@@ -615,6 +625,7 @@ vmod_set_add(VRT_CTX, struct vmod_re2_set *set, VCL_STRING pattern)
set->vcl_name, pattern, pattern, err);
return;
}
set->npatterns++;
}
#undef ERR_PREFIX
......@@ -633,6 +644,10 @@ vmod_set_compile(VRT_CTX, struct vmod_re2_set *set)
"vcl_init", set->vcl_name);
return;
}
if (set->npatterns == 0) {
VERR(ctx, ERR_PREFIX "no patterns were added", set->vcl_name);
return;
}
if (set->compiled) {
VERR(ctx, ERR_PREFIX "%s has already been compiled",
......@@ -655,6 +670,10 @@ VCL_BOOL
vmod_set_match(VRT_CTX, struct vmod_re2_set *set, VCL_STRING subject)
{
int match = 0;
struct vmod_priv *priv;
struct task_set_match *task;
char *buf;
size_t buflen;
const char *err;
CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
......@@ -667,15 +686,98 @@ vmod_set_match(VRT_CTX, struct vmod_re2_set *set, VCL_STRING subject)
subject, set->vcl_name);
return 0;
}
if ((err = vre2set_match(set->set, subject, &match)) != NULL) {
priv = VRT_priv_task(ctx, set);
AN(priv);
if (priv->priv == NULL) {
if ((priv->priv = WS_Alloc(ctx->ws, sizeof(*task))) == NULL) {
VERRNOMEM(ctx, ERR_PREFIX "allocating match data",
set->vcl_name, subject);
return 0;
}
priv->len = sizeof(*task);
priv->free = NULL;
task = priv->priv;
task->magic = TASK_SET_MATCH_MAGIC;
}
else {
WS_Contains(ctx->ws, priv->priv, sizeof(*task));
CAST_OBJ(task, priv->priv, TASK_SET_MATCH_MAGIC);
}
buf = WS_Snapshot(ctx->ws);
buflen = WS_Reserve(ctx->ws, 0);
if ((err = vre2set_match(set->set, subject, &match, buf, buflen,
&task->nmatches)) != NULL) {
VERR(ctx, ERR_PREFIX "%s", set->vcl_name, subject, err);
WS_Release(ctx->ws, 0);
return 0;
}
if (match) {
task->matches = (int *)buf;
WS_Release(ctx->ws, task->nmatches * sizeof(int));
}
else
WS_Release(ctx->ws, 0);
return match;
}
#undef ERR_PREFIX
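As shown in the hunk, `vmod_set_match()` reserves the free workspace, lets the matcher write into it, and then releases everything except the bytes actually used (or everything, on error or no match). The reserve/release idea can be sketched with a toy arena. This is not Varnish's workspace implementation: `toy_ws`, `toy_ws_reserve` and `toy_ws_release` are hypothetical names, and the sketch ignores alignment and overflow handling.

```c
#include <stddef.h>

#define TOY_WS_SIZE 64

/* Hypothetical fixed-size arena with a single open reservation. */
struct toy_ws {
	char space[TOY_WS_SIZE];
	size_t used;		/* bytes committed so far */
	int reserved;		/* nonzero while a reservation is open */
};

/* Reserve all remaining space; report its size and return its start. */
static char *
toy_ws_reserve(struct toy_ws *ws, size_t *avail)
{
	ws->reserved = 1;
	*avail = TOY_WS_SIZE - ws->used;
	return ws->space + ws->used;
}

/*
 * Close the reservation, keeping only the first `bytes` bytes.
 * Releasing 0 bytes gives the whole reservation back.
 */
static void
toy_ws_release(struct toy_ws *ws, size_t bytes)
{
	ws->used += bytes;
	ws->reserved = 0;
}
```

Reserving the whole free region first suits a caller that cannot know the size of the result in advance, as with the number of matched patterns here; the cost is that nothing else can allocate from the arena until the release.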
VCL_BOOL
vmod_set_matched(VRT_CTX, struct vmod_re2_set *set, VCL_INT n)
{
struct vmod_priv *priv;
struct task_set_match *task;
CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
CHECK_OBJ_NOTNULL(set, VMOD_RE2_SET_MAGIC);
if (n < 1 || n > set->npatterns) {
VERR(ctx, "n=%d out of range in %s.matched() (%d patterns)", n,
set->vcl_name, set->npatterns);
return 0;
}
priv = VRT_priv_task(ctx, set);
AN(priv);
if (priv->priv == NULL) {
VERR(ctx, "%s.matched(%d) called without prior match",
set->vcl_name, n);
return 0;
}
WS_Contains(ctx->ws, priv->priv, sizeof(*task));
CAST_OBJ(task, priv->priv, TASK_SET_MATCH_MAGIC);
if (task->nmatches == 0)
return 0;
WS_Contains(ctx->ws, task->matches, task->nmatches * sizeof(int));
n--;
for (unsigned i = 0; i < task->nmatches; i++)
if (task->matches[i] == n)
return 1;
return 0;
}
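`vmod_set_matched()` above returns 0 in three different situations: the index is out of range, no match was attempted in this task, or the pattern simply did not match. A stand-alone model makes the distinction explicit. It is only a sketch: `toy_task`, `toy_status` and `toy_set_matched` are hypothetical names, and the status code stands in for the `VCL_Error` log message that the real function writes.

```c
#include <stddef.h>

/* Hypothetical per-task state, loosely modeled on struct task_set_match. */
struct toy_task {
	int has_match_data;	/* 0 until a match has been attempted */
	const int *matches;	/* zero-based indices of matched patterns */
	size_t nmatches;
};

/* Stand-in for "an error was logged": the real code calls VERR(). */
enum toy_status { TOY_OK, TOY_ERR_RANGE, TOY_ERR_NO_MATCH_CALL };

static int
toy_set_matched(const struct toy_task *task, int npatterns, long n,
    enum toy_status *status)
{
	*status = TOY_OK;
	if (n < 1 || n > npatterns) {
		*status = TOY_ERR_RANGE;
		return 0;
	}
	if (!task->has_match_data) {
		*status = TOY_ERR_NO_MATCH_CALL;
		return 0;
	}
	/* Callers number patterns from 1; stored indices start at 0. */
	for (size_t i = 0; i < task->nmatches; i++)
		if (task->matches[i] == n - 1)
			return 1;
	return 0;
}
```

In the model a caller can tell a failure from a plain `false` by inspecting `status`; in VCL the only such signal is the log, which is why the documentation recommends monitoring it.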
VCL_INT
vmod_set_nmatches(VRT_CTX, struct vmod_re2_set *set)
{
struct vmod_priv *priv;
struct task_set_match *task;
CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
CHECK_OBJ_NOTNULL(set, VMOD_RE2_SET_MAGIC);
priv = VRT_priv_task(ctx, set);
AN(priv);
if (priv->priv == NULL) {
VERR(ctx, "%s.nmatches() called without prior match",
set->vcl_name);
return 0;
}
WS_Contains(ctx->ws, priv->priv, sizeof(*task));
CAST_OBJ(task, priv->priv, TASK_SET_MATCH_MAGIC);
return task->nmatches;
}
/* Regex function interface */
#define ERR_PREFIX "re2.match(pattern=\"%.40s\", text=\"%.40s\"): "
......
......@@ -33,6 +33,8 @@ $Module re2 3 Varnish Module for access to the Google RE2 regular expression eng
VOID <obj>.add(STRING)
VOID <obj>.compile()
BOOL <obj>.match(STRING)
INT <obj>.nmatches()
BOOL <obj>.matched(INT)
DESCRIPTION
===========
......@@ -101,14 +103,35 @@ example::
sub vcl_init {
new myset = re2.set();
myset.add("foo");
myset.add("bar");
myset.add("baz");
myset.add("foo"); # Pattern 1
myset.add("bar"); # Pattern 2
myset.add("baz"); # Pattern 3
myset.compile();
}
``myset.match(<string>)`` can now be used to match a string against the
pattern ``foo|bar|baz``.
``myset.match(<string>)`` can now be used to match a string against
the pattern ``foo|bar|baz``. When a match is successful, the matcher
has determined all of the patterns that matched. These can then be
retrieved with the method ``.nmatches()`` for the number of matched
patterns, and with ``.matched(n)``, which returns ``true`` if the
``nth`` pattern matched, where the patterns are numbered in the order
in which they were added::
if (myset.match("foobar")) {
std.log("Matched " + myset.nmatches() + " patterns");
if (myset.matched(1)) {
# Pattern /foo/ matched
call do_foo;
}
if (myset.matched(2)) {
# Pattern /bar/ matched
call do_bar;
}
if (myset.matched(3)) {
# Pattern /baz/ matched
call do_baz;
}
}
regex options
-------------
......@@ -620,6 +643,9 @@ VCL load will fail with an error message.
In other words, add all patterns to the set in ``vcl_init``, and
finally call ``.compile()`` when you're done.
When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.
Example::
sub vcl_init {
......@@ -638,10 +664,10 @@ $Method VOID .compile()
Compile the compound pattern represented by the set -- an alternation
of all patterns added by ``.add()``.
``.compile()`` may fail if the ``max_mem`` setting is not large enough
for the composed pattern. In that case, the VCL load will fail with an
error message (then consider a larger value for ``max_mem`` in the set
constructor).
``.compile()`` fails if no patterns were added to the set. It may also
fail if the ``max_mem`` setting is not large enough for the composed
pattern. In that case, the VCL load will fail with an error message
(then consider a larger value for ``max_mem`` in the set constructor).
``.compile()`` MUST be called in ``vcl_init``, and MAY NOT be called
more than once for a set object. If it is called in any other
......@@ -657,6 +683,11 @@ Returns ``true`` if the given string matches the compound pattern
represented by the set, i.e. if it matches any of the patterns that
were added to the set.
The matcher identifies every pattern in the set that matches the given
string. After a successful match, these can be queried with the
``.matched(INT)`` and ``.nmatches()`` methods described below.
``.match()`` MUST be called after ``.compile()``; otherwise the match
always fails.
......@@ -666,6 +697,69 @@ Example::
call do_when_a_host_matched;
}
$Method BOOL .matched(INT)
Returns ``true`` after a successful match if the ``nth`` pattern that
was added to the set is among the patterns that matched, ``false``
otherwise. The numbering of the patterns corresponds to the order in
which patterns were added in ``vcl_init``, counting from 1.
The method refers back to the most recent invocation of ``.match()``
for the same object in the same client or backend context. It always
returns ``false``, for every value of the parameter, if it is called
after an unsuccessful match (``.match()`` returned ``false``).
``.matched()`` fails and returns ``false`` if:
* The ``.match()`` method was not called for this object in the same
client or backend scope.
* The integer parameter is out of range; that is, if it is less than 1
or greater than the number of patterns added to the set.
On failure, the method writes an error message to the log with the tag
``VCL_Error``; if it fails during ``vcl_init``, then the VCL load
fails with the error message. In any other VCL subroutine, the method
returns ``false`` on failure and processing continues; since ``false``
is a legitimate return value, you should consider monitoring the log
for the error messages.
Example::
if (hostmatcher.match(req.http.Host)) {
if (hostmatcher.matched(1)) {
call do_domain1;
}
if (hostmatcher.matched(2)) {
call do_domain2;
}
if (hostmatcher.matched(3)) {
call do_domain3;
}
}
$Method INT .nmatches()
Returns the number of patterns that were matched by the most recent
invocation of ``.match()`` for the same object in the same client or
backend context. The method always returns 0 after an unsuccessful
match (``.match()`` returned ``false``).
If ``.match()`` was not called for this object in the same client or
backend scope, ``.nmatches()`` fails and returns 0, writing an error
message with the tag ``VCL_Error`` to the log. If this happens in
``vcl_init``, the VCL load fails with the error message. In any other
subroutine, VCL processing continues after the failure; since 0 is
also a legitimate return value, you should monitor the log for the
error messages.
Example::
if (myset.match(req.url)) {
std.log("URL matched " + myset.nmatches()
+ " patterns from the set");
}
$Function STRING version()
Return the version string for this VMOD.
......
......@@ -69,14 +69,9 @@ vre2set::compile() const
}
inline bool
vre2set::match(const char* subject) const
vre2set::match(const char* subject, vector<int>* m) const
{
#ifdef HAVE_SET_MATCH_NULL_VECTOR
return set_->Match(subject, NULL);
#else
vector<int> v;
return set_->Match(subject, &v);
#endif
return set_->Match(subject, m);
}
const char *
......@@ -151,10 +146,20 @@ vre2set_compile(vre2set *set)
}
const char *
vre2set_match(vre2set *set, const char * const subject, int * const match)
vre2set_match(vre2set *set, const char * const subject, int * const match,
void *buf, const size_t buflen, size_t * const nmatches)
{
try {
*match = set->match(subject);
vector<int> m;
*nmatches = 0;
*match = set->match(subject, &m);
if (*match) {
if (m.size() * sizeof(int) > buflen)
return "insufficient space to copy match data";
*nmatches = m.size();
memcpy(buf, m.data(), *nmatches * sizeof(int));
}
return NULL;
}
CATCHALL
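The copy step in `vre2set_match()` above hands the matched indices to the C caller through a buffer of limited size. A plain C model of that bounds-checked copy might look as follows. It is a sketch under assumptions: `copy_matches` is a hypothetical helper, and the size check is written as a division rather than the multiplication used in the hunk, a deliberate variation so that the comparison cannot overflow.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy count ints from src into buf if they fit in buflen bytes.
 * Returns NULL on success, or an error string, in the style of the
 * vre2set_* functions. *nmatches is 0 unless the copy succeeded.
 */
static const char *
copy_matches(const int *src, size_t count, void *buf, size_t buflen,
    size_t *nmatches)
{
	*nmatches = 0;
	/* Compare element counts so that no multiplication can overflow. */
	if (count > buflen / sizeof(int))
		return "insufficient space to copy match data";
	if (count > 0)
		memcpy(buf, src, count * sizeof(int));
	*nmatches = count;
	return NULL;
}
```

Setting `*nmatches` to 0 before the check means a caller that ignores the error string still sees an empty result rather than stale data.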
......
......@@ -46,7 +46,7 @@ public:
virtual ~vre2set();
int add(const char* pattern, string* error) const;
bool compile() const;
bool match(const char* subject) const;
bool match(const char* subject, std::vector<int>* m) const;
};
#else
typedef struct vre2set vre2set;
......@@ -72,7 +72,8 @@ extern "C" {
const char *vre2set_add(vre2set *set, const char *pattern);
const char *vre2set_compile(vre2set *set);
const char *vre2set_match(vre2set *set, const char *subject,
int * const match);
int * const match, void *buf,
const size_t buflen, size_t * const nmatches);
#ifdef __cplusplus
}
......