1. 03 Aug, 2021 5 commits
  2. 02 Aug, 2021 2 commits
    • Nils Goroll's avatar
      VRE: bounds check back references in VRE_sub() · 3fdce6cf
      Nils Goroll authored
      Before 6014912e, VRE_sub() used an ovector of size 30, which always
      contained sufficient space to store the 10 possible back references
      \0 through \9.
      
      Now that we use pcre2_match_data_create_from_pattern() and later
      pcre2_get_ovector_pointer(), we only get space for the number of
      substrings in the pattern, see pcre2api(3):
      
      	The ovector is created to be exactly the right size to hold
      	all the substrings a pattern might capture.
      
      Consequently, we need to check that back references do not exceed the
      size of the ovector.
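
The check described above can be sketched in plain C without linking to pcre2. This is only an illustration, not the actual VRE_sub() code: `expand_backrefs()` and its parameters are hypothetical, the ovector is modelled as an array of (start, end) offset pairs, every group is assumed to have matched, and expanding an out-of-range reference to nothing is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: expand the replacement template `tmpl` into `out`,
 * copying \0..\9 from `subject`. `ovector` holds `ngroups` (start, end)
 * offset pairs, i.e. the whole match plus ngroups - 1 capture groups.
 * Returns the expanded length, or -1 if `out` is too small.
 */
static long
expand_backrefs(const char *tmpl, const char *subject,
    const size_t *ovector, size_t ngroups, char *out, size_t outlen)
{
	size_t n = 0, g, len;

	assert(tmpl != NULL && subject != NULL && ovector != NULL);
	assert(out != NULL && outlen > 0);
	for (; *tmpl != '\0'; tmpl++) {
		if (tmpl[0] == '\\' && tmpl[1] >= '0' && tmpl[1] <= '9') {
			g = (size_t)(tmpl[1] - '0');
			tmpl++;
			/* The bounds check: never read past the ovector. */
			if (g >= ngroups)
				continue;	/* assumed: expands to nothing */
			len = ovector[2 * g + 1] - ovector[2 * g];
			if (n + len >= outlen)
				return (-1);
			memcpy(out + n, subject + ovector[2 * g], len);
			n += len;
			continue;
		}
		if (n + 1 >= outlen)
			return (-1);
		out[n++] = *tmpl;
	}
	out[n] = '\0';
	return ((long)n);
}
```

With a pattern that has a single capture group, `ngroups` is 2, so a template using \2 no longer indexes memory the ovector does not own.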
    • Poul-Henning Kamp's avatar
      069b78a4
  3. 19 Jul, 2021 5 commits
  4. 15 Jul, 2021 1 commit
  5. 13 Jul, 2021 4 commits
  6. 07 Jul, 2021 2 commits
  7. 06 Jul, 2021 3 commits
    • Dridi Boukelmoune's avatar
      vre: Migrate to pcre2 · 6014912e
      Dridi Boukelmoune authored
      Now that VRE is the only regular expression API we use, we can migrate
      its backend to pcre2. The existing 'pcre_*' parameters are also renamed
      to reflect this migration, and 'pcre_match_limit_recursion' gets special
      treatment and is renamed to pcre2_depth_limit.
      
      This creates an additional API breakage in VRE: the `match_recursion`
      field in `struct vre_limits` is also renamed. One last breakage is the
      removal of `VRE_has_jit` used by only one undocumented varnishtest
      feature, and the pcre_jit feature is only used by one test case that no
      longer fails.
      
      The pcre jit compilation feature was broken anyway: sealing it at
      compile time will not reflect what VRE actually links to. Once we have
      a test case needing the jit feature, we can introduce a better API for
      that check.
      
      There is one outstanding performance problem: the ovector that was
      previously allocated on the stack now needs to be allocated from the
      heap. It might be possible to implement a pcre2 context to fix that or
      maybe pool them, but for now we have heap allocations on the critical
      path. The VRE_sub() function makes sure to make a single ovector
      allocation (technically a pcre2_match_data allocation) since it's the
      only one guaranteed to loop on a single regular expression for the
      `regsuball()` use case.
      
      On the documentation front, the SmartOS installation instructions are
      hidden for lack of a pcre2 package.
      
      Closes #3616
      Closes #3559
    • Dridi Boukelmoune's avatar
      Revert "circleci: Disable developer warnings for alpine" · e34dbe1c
      Dridi Boukelmoune authored
      This reverts commit f254fff7.
      
      Conflicts:
      	.circleci/config.yml
    • Dridi Boukelmoune's avatar
      build: Don't warn on system headers · f633c8e5
      Dridi Boukelmoune authored
      Since we turn warnings into errors, warning on system headers means
      failing the build because something we have no control over in
      /usr/include does not have a strict prototype or some other shenanigan.
      
      Just in case it might be added by Wall or Wextra in a future clang or
      GCC release, we may disable it explicitly too.
      
      Refs #3565
  8. 05 Jul, 2021 16 commits
    • Dridi Boukelmoune's avatar
      ban: Migrate to VRE · 9219b0dc
      Dridi Boukelmoune authored
      Bans using regular expressions will consume slightly more space, but
      more importantly that breaks persistence binary compatibility. That's
      not a concern because we are both planning for a major release where
      that kind of breakage is acceptable, and in the context of a pcre2
      migration we would also break ban persistence.
      
      And now, VRE is the sole pcre consumer.
    • Dridi Boukelmoune's avatar
      vre: New VRE_export() function · a635df72
      Dridi Boukelmoune authored
      It packs a vre_t and a pcre in a single allocation that can be used by
      both VRE_match() and VRE_sub().
    • Dridi Boukelmoune's avatar
      vre: Add a jit argument to VRE_compile() · 8cdba637
      Dridi Boukelmoune authored
      We always use it internally except:
      
      - where the pcre2_jit_compilation parameter applies
      - when libvcc verifies that a regular expression compiles
      
      For the latter, the verification would attempt a jit compilation as well
      until now, but we no longer need to waste cycles on that.
    • Dridi Boukelmoune's avatar
      6715fb68
    • Dridi Boukelmoune's avatar
      vre: Extract VRE_error() from VRE_compile() · 3e254078
      Dridi Boukelmoune authored
      This is a major step back in terms of error reporting, but I haven't
      found anything in libpcre to translate an error code to an error
      message. "pcre error %d" should still be a good enough hint.
      
      Looking forward, we are going to need this for libpcre2 but again the
      API leaves something to be desired and only works by writing error messages
      to buffers.
      
      The return value implies that we got a valid error code, which will be
      verified with pcre2. I couldn't find how to verify error codes with pcre.
      Writing to the VSB is however fail-safe; it's the caller's job to deal
      with a VSB error.
    • Dridi Boukelmoune's avatar
      vre: Replace VRE_exec() with a simpler VRE_match() · 520cd8af
      Dridi Boukelmoune authored
      With the introduction of VRE_sub() many VRE_exec() use cases went away:
      
      - VRE_NOTEMPTY flag
      - start offset
      - ovector for capture groups
      
      The subject string length is now optional and zero can be passed to signal
      that the subject string is null-terminated.
      
      There are no options for VRE_match() but an option argument is kept in
      case we start exposing some in the future.
    • Dridi Boukelmoune's avatar
      vre: Extract a VRE_sub() function from VRT_regsub() · 0e3667c5
      Dridi Boukelmoune authored
      This gives us a clean separation of VCL and pcre interactions.
    • Dridi Boukelmoune's avatar
      varnishtest: New ${string,<action>[,<args>...]} macro · 5c248efc
      Dridi Boukelmoune authored
      Its first action ${string,repeat,<uint>,<string>} helps simplify many
      unwieldy test cases that will hopefully be easier to edit from now on.
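
As a rough illustration of what the repeat action computes, here is a minimal C sketch. The function name `string_repeat()` is hypothetical and this is not the varnishtest implementation; it only shows the expansion of `count` copies of a string, with no overflow check on the allocation size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the expansion of
 * ${string,repeat,<uint>,<string>}: `count` copies of `s`.
 * The caller frees the result. Returns NULL if allocation fails.
 */
static char *
string_repeat(unsigned count, const char *s)
{
	size_t len, i;
	char *res, *p;

	assert(s != NULL);
	len = strlen(s);
	res = malloc(len * count + 1);
	if (res == NULL)
		return (NULL);
	p = res;
	for (i = 0; i < count; i++) {
		memcpy(p, s, len);
		p += len;
	}
	*p = '\0';
	return (res);
}
```

Under these assumptions, a test case that used to spell out a long run of identical bytes could write something like ${string,repeat,3,ab} where it previously wrote ababab.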
    • Dridi Boukelmoune's avatar
      1824ba01
    • Dridi Boukelmoune's avatar
      varnishtest: Allow macros to be backed by functions · 12cd341e
      Dridi Boukelmoune authored
      Instead of having a mere value, these would be able to compute a macro
      expansion. We parse the contents inside the ${...} delimiters as a VAV,
      but there can't be (yet?) nested curly {braces}, even quoted.
      
      The first argument inside the delimiters is the macro name, and other
      VAV arguments are treated as arguments to the macro's function.
      
      For example ${foo,bar,baz} would call the macro "foo"'s function with
      arguments "bar" and "baz". Simple macros don't take arguments and work
      as usual.
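
The dispatch described above might look like the following C sketch. Everything here is hypothetical and only meant to illustrate the idea: the `macro_f` signature, `struct macro`, the `demo_macros` table and its values are made up, arguments are assumed to be already split, and unknown macros simply return an error rather than being handled the way varnishtest does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature for a function backing a macro. */
typedef int macro_f(int argc, const char *const *argv, char *out,
    size_t outlen);

struct macro {
	const char	*name;
	const char	*value;	/* used by simple macros */
	macro_f		*func;	/* used by function-backed macros */
};

/* Copy `s` into `out`, returning -1 if it does not fit. */
static int
put(char *out, size_t outlen, const char *s)
{
	int l = snprintf(out, outlen, "%s", s);

	return (l < 0 || (size_t)l >= outlen ? -1 : 0);
}

/* Example function: joins its arguments with '+'. */
static int
macro_join(int argc, const char *const *argv, char *out, size_t outlen)
{
	size_t n = 0;
	int i;

	assert(outlen > 0);
	out[0] = '\0';
	for (i = 0; i < argc; i++) {
		if (i > 0 && put(out + n, outlen - n, "+") != 0)
			return (-1);
		n += strlen(out + n);
		if (put(out + n, outlen - n, argv[i]) != 0)
			return (-1);
		n += strlen(out + n);
	}
	return (0);
}

/* Made-up table: one function-backed macro, one simple macro. */
static const struct macro demo_macros[] = {
	{ "foo", NULL, macro_join },
	{ "tmpdir", "/tmp/vtc", NULL },
};

/* argv[0] is the macro name, the rest are the function's arguments. */
static int
macro_expand(const struct macro *tbl, size_t ntbl, int argc,
    const char *const *argv, char *out, size_t outlen)
{
	size_t i;

	assert(argc >= 1);
	for (i = 0; i < ntbl; i++) {
		if (strcmp(tbl[i].name, argv[0]) != 0)
			continue;
		if (tbl[i].func != NULL)
			return (tbl[i].func(argc - 1, argv + 1, out, outlen));
		if (argc != 1)
			return (-1);	/* simple macros take no arguments */
		return (put(out, outlen, tbl[i].value));
	}
	return (-1);	/* unknown macro */
}
```

In this sketch, expanding the fields "foo", "bar", "baz" calls `macro_join()` with the two arguments "bar" and "baz", while "tmpdir" alone expands to its stored value.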
    • Dridi Boukelmoune's avatar
      varnishtest: Replace macro_get() with macro_cat() · eb71eae7
      Dridi Boukelmoune authored
      The latter operates on a VSB, which is always what call sites are doing
      anyway. It also takes the responsibility of ignoring unknown macros, in
      preparation for more responsibilities that will also require the ability
      to fail a test case.
    • Dridi Boukelmoune's avatar
      vav: Tighten arguments parsing · a2ab44de
      Dridi Boukelmoune authored
      Commas, unlike spaces, are hard separators, so a trailing comma leads to a
      last empty parameter.
      
      A comma may appear after an argument's "trailing" spaces and should not
      result in an additional parameter. In <foo  , bar> we should expect two
      fields, not three.
      
      Comments are only treated as such at argument boundaries: <foo #bar>
      parses one field <foo> and <foo#bar> parses one field <foo#bar>, taking
      the shell word splitting as the model, cementing what was the existing
      VAV behavior in the first place.
      
      Unlike the shell, we don't expect quotes to start in the middle of a
      token so <foo"bar> is invalid unless escaping was disabled.
      
      Fields that are quoted need a separator: <"foo""bar"> is therefore
      invalid unless escaping was disabled.
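
The separator rules above can be sketched as a small field counter in C. This is a simplified model, not the VAV parser: `count_fields()` is a hypothetical name, and quoting, escaping and comments are deliberately left out, so it only covers how spaces and commas delimit fields.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical field counter: runs of spaces form one soft separator,
 * each comma is a hard separator, and a comma following a field's
 * trailing spaces closes that field instead of opening an empty one.
 */
static size_t
count_fields(const char *s)
{
	size_t n = 0;
	int after_comma = 0;

	assert(s != NULL);
	for (;;) {
		while (isspace((unsigned char)*s))
			s++;
		if (*s == '\0')
			break;
		if (*s != ',') {
			/* A regular token runs up to the next separator. */
			while (*s != '\0' && *s != ',' &&
			    !isspace((unsigned char)*s))
				s++;
			/* Trailing spaces belong to this field. */
			while (isspace((unsigned char)*s))
				s++;
		}
		/* Either a token or an empty field ends here. */
		n++;
		after_comma = (*s == ',');
		if (after_comma)
			s++;
	}
	/* A trailing comma leads to a last empty field. */
	return (n + (after_comma ? 1 : 0));
}
```

With this model, the string "foo  , bar" counts as two fields, and a trailing comma adds one last empty field.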
    • Dridi Boukelmoune's avatar
      vav: Apparently we can't trust sscanf(3) · 4f1fd412
      Dridi Boukelmoune authored
      At least not on my system, where "x%02x" doesn't strictly require 2
      hexadecimal digits.
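
The width in a scanf(3) conversion is a maximum field width, so "x%02x" can also accept a single hexadecimal digit. One way to require exactly two digits is to check them by hand, as in the sketch below. The function names are hypothetical and this is not necessarily how VAV solved it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of a hexadecimal digit; the caller has already validated `c`. */
static int
hex_value(int c)
{
	static const char digits[] = "0123456789abcdef";
	const char *p = strchr(digits, tolower((unsigned char)c));

	assert(p != NULL && *p != '\0');
	return ((int)(p - digits));
}

/*
 * Hypothetical strict parser: `s` must start with 'x' followed by
 * exactly two hexadecimal digits. Returns the byte value (0..255),
 * or -1 on malformed input. The short-circuit evaluation stops at
 * the first mismatch, so we never read past the terminating NUL.
 */
static int
parse_x2(const char *s)
{
	assert(s != NULL);
	if (s[0] != 'x' || !isxdigit((unsigned char)s[1]) ||
	    !isxdigit((unsigned char)s[2]))
		return (-1);
	return ((hex_value(s[1]) << 4) | hex_value(s[2]));
}
```

Unlike the sscanf(3) format, this rejects inputs such as "x4" or "x 4" instead of quietly accepting fewer digits or skipping white space.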
    • Dridi Boukelmoune's avatar
      97e6818a
    • Dridi Boukelmoune's avatar
      git: Ignore vav_test · ec2c30ac
      Dridi Boukelmoune authored
  9. 02 Jul, 2021 2 commits
    • Dridi Boukelmoune's avatar
      Revert "vav: Treat a trailing comma as a separator" · 17d773c3
      Dridi Boukelmoune authored
      This reverts commit d3df1f64.
      
      I didn't mean to push it, it predates VAV's test driver.
    • Dridi Boukelmoune's avatar
      vav: Treat a trailing comma as a separator · d3df1f64
      Dridi Boukelmoune authored
      Let's consider the following VAV strings:
      
          "foo bar baz"
          "foo,bar,baz"
          " foo bar baz "
          " foo,bar,baz "
          "  foo  bar  baz  "
      
      They are all equivalent because consecutive spaces are considered to
      form a single separator. However, consecutive commas aren't:
      
          "foo,bar,baz"
          "foo,,bar,,baz"
      
      In the example above the first string has 3 arguments while the second
      has 5 of them. This behavior was however inconsistent with trailing
      commas:
      
          "foo,bar,baz"
          "foo,bar,baz,"
          "foo,bar,baz,,"
      
      When it comes to trailing commas the first two strings above would
      contain 3 arguments, and the last string would contain 4 arguments.
      
      With this change, they respectively contain 3, 4 and 5 arguments.