Add filters: regex objects to perform substitutions on bodies #8
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "filter"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
If the optional
asfilterparameter tore.regex()is true, thevmod registers itself as a Varnish Fetch Processor (VFP) for use in
beresp.filtersand as a Varnish Delivery Processor (VDP) for use inresp.filters. In this setup, thexregex.substitute_match()andxregex.substitute_all()methods can be used to define replacementsfor matches on the body.
Example:
added 1 commit
4e493c55- Update READMECompare with previous version
added 2 commits
cae4eaab- Add filters: regex objects to perform substitutions on bodies4420ca08- Update READMECompare with previous version
assigned to @geoff
added 5 commits
uplex-varnish:masterbffebc04- Add filters: regex objects to perform substitutions on bodiesf06afb07- Update READMEd0bceb53- Remove assertion on buffer length > 0Compare with previous version
added 5 commits
uplex-varnish:masterf8fb9d5c- Add filters: regex objects to perform substitutions on bodiesa25e5b63- Update READMECompare with previous version
Sorry, the previous comment was invalid, I was looking at the wrong branch.
@slink I think the code is fine. I'll press the green button, and that's all that needs to be said for an MR review.
A few thoughts that might be the subject of further documentation, or at least to consider moving forward.
Do I understand correctly that the memory management that the filters might do (malloc, realloc, memmove, etc) could happen because a partial match might extend across discontinuous buffers? And that the bet is that this will be relatively uncommon, especially since a matched text is unlikely to extend across several buffers at a time?
It would be interesting to know how well the filters perform when the memory management becomes substantial. And to just see what happens on a pathological case, say when a match extends across more than 32K.
This and match_body have reminded me of why the re2 library was originally developed -- they were going to try Google code search with PCRE, but bad cases of backtracking became very severe when matches were run against whole files.
PCRE(2) is optimized for strings about as long as header contents, or a URL, or a line in a text file. Even if the regex is not optimally efficient, it still doesn't (usually) get that bad on a short string. When users of the VMOD start running matches against request and response bodies, performance issues might become noticeable, especially for users who aren't skilled at writing efficient regexen (which is unfortunately much more common than I like).
Should we add some kind of note about that subject in the documentation?
AFAICT the re2 library doesn't have something like PCRE2_PARTIAL, so I don't see a way to do something similar in VMOD re2.
merged
mentioned in commit
c963694419@geoff thank you for the review and merge.
I actually did ponder this a bit: This is similar to the partial match with
match_iter_f(), but there we can reserve the workspace (because we are inside a VMOD function/method call from VCL), which, IIUC, we can not for the filter, because other filters could also request workspace.So with
match_iter_f(), one could run into out-of-workspace conditions (and I wonder if also usingmalloc()there would be the better option), while with the filter, we could request arbitrarily big buffers on the heap.At least the code uses
memmove()to avoid alloc/free cycles except for one case.Yes. When pcre tells us there is a partial match at the end of the buffer, we need to save that and call pcre again when we have more data or "EOF" at the next filter invocation.
I did spend quite some time with these pathological cases, but by artificially decreasing the buffer length rather than increasing the span of the pcre match. See
uplex-varnish/libvmod-re@c963694419/src/vmod_re.c (L1096-1100)IIUC for now, the RE2 devs dismissed this use case.
Feel free!