1. 17 Jan, 2020 3 commits
    • TAKE_OBJ_NOTNULLification: cases with the first AN(objp) missing · 65cd2e05
      Nils Goroll authored
      in these cases, the initial AN(objp) was missing, so we would
      potentially dereference a NULL pointer without an explicit assertion
      failure.
      
      in VCL_Rel(), AN(*vcc) was likely just a typo resulting in a semantic
      noop (the check was a duplication of the NULL check in
      CHECK_OBJ_NOTNULL).
      
      ... more examples of why using the macro is a good idea
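
      For reference, the macro encapsulates roughly the following pattern (an
      approximation of the definition in include/miniobj.h, not the verbatim
      code; AN() and CHECK_OBJ_NOTNULL() are the usual varnish assert macros):

      /* take ownership of *objp, clear the source pointer, assert both */
      #define TAKE_OBJ_NOTNULL(obj, objp, magic)              \
              do {                                            \
                      AN(objp);                               \
                      (obj) = *(objp);                        \
                      *(objp) = NULL;                         \
                      CHECK_OBJ_NOTNULL((obj), (magic));      \
              } while (0)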
    • more TAKE_OBJ_NOTNULLification · 7ce983ec
      Nils Goroll authored
      command:
      
      spatch -I include/ -I bin/varnishd/ --dir . --in-place \
      	--sp-file tools/coccinelle/take_obj_notnull.cocci
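
      The effect of the semantic patch is roughly the following rewrite
      (illustrative C example; the variable names are made up):

      /* before: open-coded take-and-clear */
      AN(reqp);
      req = *reqp;
      *reqp = NULL;
      CHECK_OBJ_NOTNULL(req, REQ_MAGIC);

      /* after: one macro call with the same semantics */
      TAKE_OBJ_NOTNULL(req, reqp, REQ_MAGIC);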
    • Stabilize s10, again · 4fe88185
      Dridi Boukelmoune authored
      The previous stabilization turned out not to work reliably with FreeBSD
      on aarch64 hardware. It was still an improvement overall, but the test
      case was a bit cryptic, and when [idle_]send_timeout support landed in
      VCL it wasn't obvious how to add coverage.
      
      This attempt bites the bullet and defines one pair of client/logexpect
      instances per use case, and (I hope) clearly indicates why it does or
      does not do certain things.
      
      Since we now have two clients that aren't expected to complete before
      the test case itself, the server runs in dispatch mode instead of
      repeating its scenario. Using barriers in dispatch mode should raise a
      red flag for any reviewer, but in this case the barriers outside the
      server are properly serialized and systematically used by every single
      client, and as a result they should be safe.
      
      As usual, personal testing limited to x86_64 Linux and FreeBSD.
  2. 16 Jan, 2020 8 commits
    • s00010: try a slight variation · 55ef8e00
      Nils Goroll authored
      when we stall sending on the backend side, we _have_ to run into a
      timeout on the client side, don't we?
    • even more session overflow test tweaking · f5985938
      Nils Goroll authored
      and enable debug output for better failure analysis
    • even more s00010.vtc tweaking · 66b27ae1
      Nils Goroll authored
      Further lighten the load so that the test hopefully succeeds on the
      FreeBSD vtest machines.

      Also, when the stars align, we do not hit the busy object (streaming)
      and do not see chunked encoding.
    • more numbers fiddling · 3d4bb424
      Nils Goroll authored
      we need a way for vtc to reach into session setup, I guess
    • s00010.vtc: lower pressure from timeout handling · 5b90890f
      Nils Goroll authored
      The FreeBSD vtest boxes cannot complete the c2 response within the 20s
      extended send_timeout, apparently because of too much overhead from
      timeout handling.

      Try to fix this by applying less pressure.
    • 83b238ed
      Nils Goroll authored
    • panic handling can take a couple seconds · e733fce8
      Nils Goroll authored
    • bump default workspace_session · 0f3bb354
      Nils Goroll authored
      02bc0a68 requires 24 bytes more on
      64bit, which brought o00005.vtc just over the top on smartos 64bit
      machines (the vtc tests PROXY TLVs, which get stored on the session
      workspace).
      
      Increase the default workspace_session by 256 bytes (50%) to
      accommodate sane proxy TLV lengths by default.
      
      This is about 10x more than the minimum increase needed to accommodate
      the additional workspace usage. I went for 768 bytes (the new total)
      because

      - this gives us the next least common multiple (12k) with the common 4K
        page size, short of going all the way up to 1k

      - we were already quite tight with the default and could not accommodate
        common proxy TLV use (see for example #3131)
      
      As a ballpark reality check, I do not think that spending an additional
      256MB should be an issue for users supporting 1M sessions.
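
      For reference, the arithmetic behind these numbers (the 50% figure
      implies a previous default of 512 bytes):

      512\,\mathrm{B} + 256\,\mathrm{B} = 768\,\mathrm{B}, \qquad
      \operatorname{lcm}(768, 4096) = 12288\,\mathrm{B} = 12\,\mathrm{KiB}, \qquad
      10^{6} \times 256\,\mathrm{B} = 256\,\mathrm{MB}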
  3. 15 Jan, 2020 19 commits
    • Add vcl control over timeout_linger, send_timeout, idle_send_timeout · 02bc0a68
      Nils Goroll authored
      and test that they work for H1
      
      To @Dridi: I would appreciate it if you could have another look at
      s10.vtc. I *tried* to keep your good ideas intact, and hope you can
      live with these changes.
    • A request body now might have up to two references · 4e6e4eed
      Nils Goroll authored
      since d4b6228e the busyobj might also
      gain one, so depending on which deref happens first, there may be one
      or zero references left.
    • 7a1f7b2a
      Nils Goroll authored
    • simplify as suggested by @Dridi · 8dd0a73f
      Nils Goroll authored
      thank you!
    • Add information about vcl object instances to the panic output · 5df27a08
      Nils Goroll authored
      In the absence of a core dump, the panic output so far contains no
      information about vcl object instances, which would help, for example,
      to find out which object a priv belongs to when the instance address is
      used for per-instance priv state.
      
      To make this information available at the time of a panic, we add the
      following (a rough sketch follows the list):
      
      * A struct vrt_ii (for instance info); VCC fills in a static instance
        of it at compile time with pointers to the C global variables holding
        the object instance pointers
      
      * A pointer to this struct from the VCL_conf to make it available to
        the varnishd worker
      
      * Dumps of the instance info in the panic output
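
      A minimal sketch of the plumbing described above; the names below are
      illustrative, the real declarations live in include/vrt.h and in the
      VCC-generated C code:

      /* one entry per vcl object instance (illustrative field names) */
      struct instance_info {
              const char      *name;  /* the VCL name of the instance */
              const void      **p;    /* address of the C global instance pointer */
      };

      struct vrt_ii {
              unsigned                        magic;
              const struct instance_info      *p;     /* table, NULL-name terminated */
      };

      /*
       * VCC emits a static struct vrt_ii into the generated C source and
       * stores a pointer to it in the VCL_conf; on a panic the worker walks
       * the table and prints a name/address pair for each instance.
       */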
    • Add capability to send the authority TLV in the PROXY header · 99ec7796
      Geoff Simmons authored
      This gives the receiver of the PROXY header (usually the ssl-onloader)
      the opportunity to set the SNI (HostName field) from the TLV value, for
      the TLS handshake with the remote backend.
      
      From
      https://github.com/nigoroll/varnish-cache/commit/e0eb7d0a9c65cdc3c58978656b4c71f4ab8aabca
      edited by @nigoroll to split out the proxy header functionality.
      
      Add vmod_debug access to the proxy header formatting and test it
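
      A rough illustration of what emitting such a TLV involves
      (PP2_TYPE_AUTHORITY is 0x02 per the PROXY protocol v2 spec; the helper
      below is a sketch, not the actual VPX/vmod_debug code):

      #include <stdint.h>
      #include <string.h>

      #define PP2_TYPE_AUTHORITY      0x02

      /*
       * Append an authority (host name) TLV at the end of a PROXYv2 header
       * buffer; the header's overall length field must be grown accordingly.
       * Returns the number of bytes written, or 0 if it does not fit.
       */
      static size_t
      append_authority_tlv(uint8_t *buf, size_t space, const char *host)
      {
              size_t l = strlen(host);

              if (space < 3 + l)
                      return (0);
              buf[0] = PP2_TYPE_AUTHORITY;
              buf[1] = (uint8_t)(l >> 8);     /* TLV length, network order */
              buf[2] = (uint8_t)(l & 0xff);
              memcpy(buf + 3, host, l);
              return (3 + l);
      }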
    • format proxy header on the stack · a74315bc
      Nils Goroll authored
    • wrap VPX_Format_Proxy for VRT · de50fefc
      Nils Goroll authored
    • Add VPX_Format_Proxy · 461a22a4
      Nils Goroll authored
    • split out proxyv2 formatting · 519737ac
      Nils Goroll authored
    • split out proxyv1 formatting · 6a08beba
      Nils Goroll authored
      Note: this partially reverts cf14a0fd
      to prepare for bytes accounting in a later patch
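
      For context, a PROXYv1 header is a single CRLF-terminated ASCII line;
      a hedged sketch of such a formatter (not the actual VPX code):

      #include <stdio.h>

      /*
       * Produces e.g. "PROXY TCP4 192.0.2.1 198.51.100.2 56324 443\r\n".
       * proto is "TCP4" or "TCP6"; returns the snprintf() result, so a
       * value >= len means the buffer was too small.
       */
      static int
      format_proxy_v1(char *buf, size_t len, const char *proto,
          const char *client_ip, const char *server_ip,
          unsigned client_port, unsigned server_port)
      {
              return (snprintf(buf, len, "PROXY %s %s %s %u %u\r\n",
                  proto, client_ip, server_ip, client_port, server_port));
      }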
    • remove a now pointless vtc · 5fe2a46d
      Nils Goroll authored
      This test was there to detect a deadlock which does not exist any more.
      IMHO, the only sensible way to test for its absence now would be a load
      test, which is not what we want in vtc.
    • generalize the worker pool reserve to avoid deadlocks · 3bb8b84c
      Nils Goroll authored
      Previously, we used a minimum number of idle threads (the reserve) to
      ensure that we do not assign all threads to client requests, leaving
      none for backend requests.
      
      This was actually only a special case of the more general issue
      exposed by h2: Lower priority tasks depend on higher priority tasks
      (for h2, sessions need streams, which need requests, which may need
      backend requests).
      
      To solve this problem, we divide the reserve by the number of priority
      classes and schedule lower priority tasks only if there are enough
      idle threads to run higher priority tasks eventually.
      
      This change does not guarantee any upper limit on the amount of time
      it can take for a task to be scheduled (e.g. backend requests could be
      blocking on arbitrarily long timeouts), so the thread pool watchdog is
      still warranted. But this change should guarantee that we do make
      progress eventually.
      
      With the reserves, thread_pool_min needs to be no smaller than the
      number of priority classes (TASK_QUEUE__END). Ideally, we should have
      an even higher minimum (@Dridi rightly suggested to make it 2 *
      TASK_QUEUE__END), but that would prevent the very useful test
      t02011.vtc.
      
      For now, the value of TASK_QUEUE__END (5) is hardcoded as such for the
      parameter configuration and documentation because auto-generating it
      would require include/macro dances which I consider over the top for
      now. Instead, the respective places are marked and an assert is in
      place to ensure we do not start the worker with too small a number of
      worker threads. I decided against checks in the manager to avoid
      include pollution from the worker (cache.h) into the manager.
      
      Fixes #2418 for real
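
      A hedged sketch of the scheduling rule described above; the names and
      the exact arithmetic are illustrative, the real logic lives in the
      worker pool code:

      #define TASK_QUEUE__END 5       /* number of task priority classes */

      /*
       * A task of priority class prio (0 = highest) only takes an idle
       * thread if enough of the reserve remains for all higher-priority
       * classes; the highest class is never refused.
       */
      static int
      may_schedule(unsigned nidle, unsigned reserve, unsigned prio)
      {
              unsigned held_back = reserve * prio / TASK_QUEUE__END;

              return (nidle > held_back);
      }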
    • Remove varnishd -C coverage · 75cca3cd
      Dridi Boukelmoune authored
      This check is not deterministic because vmod_std may indeed be found in
      the default vmod_path defined at configure time.
    • Whitespace OCD · 1833d7dd
      Dridi Boukelmoune authored
    • Fail fetch retries when uncached request body has been released · f88b4795
      Martin Blix Grydeland authored
      Currently we allow fetch retries with body even after we have released the
      request that initiated the fetch, and the request body with it. The
      attached test case demonstrates this, where s2 on the retry attempt gets
      stuck waiting for 3 bytes of body data that is never sent.
      
      Fix this by keeping track of what the initial request body status was, and
      failing the retry attempt if the request was already released
      (BOS_REQ_DONE) and the request body was not cached.
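
      A simplified sketch of the check; the struct and most names are
      illustrative, BOS_REQ_DONE is the state referred to above:

      /* simplified bookkeeping, not the actual cache structs */
      enum bos_state { BOS_BUSY, BOS_REQ_DONE, BOS_FINISHED };

      struct fetch_sketch {
              enum bos_state  state;
              unsigned        req_body_cached;        /* recorded at fetch start */
      };

      /*
       * Once the client request has been released (BOS_REQ_DONE), an
       * uncached request body is gone and cannot be replayed, so a retry
       * must fail instead of leaving the backend waiting for body bytes
       * that will never arrive.
       */
      static int
      retry_allowed(const struct fetch_sketch *f)
      {
              if (f->state >= BOS_REQ_DONE && !f->req_body_cached)
                      return (0);
              return (1);
      }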
    • Fetch thread reference count and keep cached request bodies · d4b6228e
      Martin Blix Grydeland authored
      With this patch, fetch threads keep a reference to a completely cached
      request body for the entire duration of the fetch. This extends the
      retry window of backend requests with a request body beyond the
      BOS_REQ_DONE point.
      
      Patch by: Poul-Henning Kamp
    • Assert · eb14a0b6
      Dridi Boukelmoune authored
  4. 14 Jan, 2020 5 commits
    • avoid the STV_close() race for now · 4df4d2a4
      Nils Goroll authored
      See #3190
    • stop the expiry thread before closing stevedores · 34b687e6
      Nils Goroll authored
      This should fix the panic mentioned in
      309e807d
    • 4c7108b1
      Nils Goroll authored
    • try to narrow down a umem panic observed in vtest b00035.vtc · 309e807d
      Nils Goroll authored
      is it a race with _close?
      
      ***  v1   debug|Child (369) Panic at: Tue, 14 Jan 2020 12:06:12 GMT
      ***  v1   debug|Wrong turn at
      ../../../bin/varnishd/cache/cache_main.c:284:
      ***  v1   debug|Signal 11 (Segmentation Fault) received at b4 si_code 1
      ***  v1   debug|version = varnish-trunk revision
      b8b798a0, vrt api = 10.0
      ***  v1   debug|ident = -jsolaris,-sdefault,-sdefault,-hcritbit,ports
      ***  v1   debug|now = 2786648.903965 (mono), 1579003571.310573 (real)
      ***  v1   debug|Backtrace:
      ***  v1   debug|  80e1bd8: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'pan_backtrace+0x18 [0x80e1bd8]
      ***  v1   debug|  80e2147: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'pan_ic+0x2c7 [0x80e2147]
      ***  v1   debug|  81b9a6f: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'VAS_Fail+0x4f [0x81b9a6f]
      ***  v1   debug|  80d7fba: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'child_signal_handler+0x27a [0x80d7fba]
      ***  v1   debug|  fed92695: /lib/libc.so.1'__sighndlr+0x15 [0xfed92695]
      ***  v1   debug|  fed86c8b: /lib/libc.so.1'call_user_handler+0x298 [0xfed86c8b]
      ***  v1   debug|  fda8a93e: /lib/libumem.so.1'umem_cache_free
      ***  v1   debug|+0x23 [0xfda8a93e]
      ***  v1   debug|  817f3bc: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'smu_free+0x35c [0x817f3bc]
      ***  v1   debug|  817aa21: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'sml_stv_free+0x101 [0x817aa21]
      ***  v1   debug|  817b4eb: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'sml_slim+0x2cb [0x817b4eb]
      ***  v1   debug|thread = (cache-exp)
      ***  v1   debug|thr.req = 0 {
      ***  v1   debug|},
      ***  v1   debug|thr.busyobj = 0 {
      ***  v1   debug|},
      ***  v1   debug|vmods = {
      ***  v1   debug|},
      ***  v1   debug|
      ***  v1   debug|
      ***  v1   debug|Info: Child (369) said Child dies
      ***  v1   debug|Debug:
      ***  v1   debug| Child cleanup complete
      ***  v1   debug|
    • Revert "does the umem backend affect the amount of malloc NULL returns in vtest?" · b8b798a0
      Nils Goroll authored
      This reverts commit 8ea006ee.
      
      does not seem to make a difference; trying to narrow it down using
      other means (different platforms)
  5. 13 Jan, 2020 5 commits