1. 11 Feb, 2019 1 commit
  2. 03 Jan, 2019 1 commit
  3. 02 Jan, 2019 1 commit
    • Fix a panic in VRB_Cache · 49ad1b21
      Pål Hermunn Johansen authored
      This adds error handling for STV_NewObject(.., TRANSIENT) in VRB_Cache,
      which would fail when transient is full.
      
      This is a back port of 6045eaaa.
      
      Fixes: #2831
      
      Conflicts:
      	bin/varnishd/cache/cache_req_body.c
      	bin/varnishtest/tests/r02831.vtc
      49ad1b21
  4. 05 Dec, 2018 1 commit
  5. 05 Oct, 2018 2 commits
  6. 26 Sep, 2018 2 commits
  7. 18 Sep, 2018 1 commit
  8. 11 Sep, 2018 1 commit
  9. 10 Sep, 2018 2 commits
  10. 07 Sep, 2018 1 commit
    • Reintroduce the req.grace variable, change keep behavior · d2d09318
      Pål Hermunn Johansen authored
      This is a back port of ff38535a
      
      The req.grace variable can be set in vcl_recv to cap the grace
      of objects in the cache, in the same way as in 3.0.x
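
      The capping described above can be pictured with a minimal sketch.
      The helper name and the "negative means unset" convention are
      assumptions made for this illustration and are not taken from the
      varnishd sources.

```c
#include <assert.h>

/*
 * Hypothetical helper for illustration; not a function from varnishd.
 * Assumption: a negative req_grace means "req.grace was not set".
 */
static double
sketch_effective_grace(double obj_grace, double req_grace)
{
	assert(obj_grace >= 0.0);
	if (req_grace < 0.0)
		return (obj_grace);	/* no cap requested */
	/* req.grace can only shorten the grace period, never extend it */
	return (req_grace < obj_grace ? req_grace : obj_grace);
}
```

      With an object grace of one hour and req.grace set to 10 seconds,
      this sketch yields 10 seconds; a req.grace larger than the object
      grace leaves the object grace unchanged.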
      
      The "keep" behavior changes with this patch. We now always go
      to vcl_miss when the expired object is out of grace, or we go
      to the waiting list. The result is that it is no longer
      possible to deliver a "keep" object in vcl_hit.
      
      Note that when we get to vcl_miss, we will still have the 304
      candidate, but without the detour by vcl_hit.
      
      This commit changes VCL, but only slightly, so we aim to back
      port this to earlier versions of Varnish Cache.
      
      Refs: #1799 and #2519
      
      Conflicts:
      	bin/varnishd/cache/cache_hash.c
      	bin/varnishd/cache/cache_req_fsm.c
      	bin/varnishd/cache/cache_varnishd.h
      	bin/varnishd/cache/cache_vrt_var.c
      	doc/sphinx/reference/vcl_var.rst
      d2d09318
  11. 05 Jun, 2018 1 commit
  12. 01 Jun, 2018 1 commit
    • ban lurker should back off on seeing a busy object · 7906a417
      Nils Goroll authored
      HSH_Unbusy() calls BAN_NewObjCore() not holding the objhead
      lock, so the ban lurker may race and grab the ban mtx just
      after the new oc has been inserted, but the busy flag not
      yet cleared.
      
      While it would be correct to call BAN_NewObjCore() with the
      objhead mtx held, doing so would increase the pressure on the
      combined ban & objhead mtx.
      
      If the ban lurker encounters a busy object, we know that there
      must be an unbusy in progress and it would be wiser to rather
      back off in favor of it.
      
      Fixes #2681
      7906a417
  13. 25 Apr, 2018 2 commits
    • Prepare for 4.1.10 release · 1d090c5a
      Pål Hermunn Johansen authored
      1d090c5a
    • Pass delivery abandoned does not qualify as an error · f609870a
      Nils Goroll authored
      ... so log it under the Debug tag.
      
      FetchErrors should be actual errors which can be addressed. In this case,
      nothing is wrong in any way; the fact that we abort a fetch if we don't
      need the body is a Varnish-internal optimization (which makes sense, but
      comes at the cost of closing a connection).
      
      Merges #2450
      
      Conflicts:
      	bin/varnishd/cache/cache_fetch.c
      f609870a
  14. 24 Apr, 2018 2 commits
  15. 19 Apr, 2018 1 commit
  16. 12 Apr, 2018 1 commit
  17. 11 Apr, 2018 1 commit
    • Do not possibly underflow rlen · 50bfa24e
      Nils Goroll authored
      for i < 0, rlen could underflow. We are safe because of the check for
      i < 0 further down, so this change is just a minor cleanup.
      
      Fixes #2444
      50bfa24e
  18. 25 Feb, 2018 1 commit
  19. 24 Feb, 2018 1 commit
  20. 23 Feb, 2018 3 commits
    • EPIPE is a documented errno in tcp(7) on linux · 24ac14da
      Poul-Henning Kamp authored
      Fixes: #2582
      24ac14da
    • Fix crash under MacOS while investigating #2332 · ef430670
      Federico G. Schwindt authored
      MacOS will return EINVAL under e.g. setsockopt if the connection was
      reset.
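
      One way to picture this kind of fix is a small classifier that
      treats "the peer went away" errno values as benign instead of
      asserting on them. The sketch below is an illustration only: the
      function name is invented, and the exact set of errno values and
      platform guards used by varnishd may differ.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the varnishd implementation.
 * Returns true when errno value `e` only tells us that the
 * connection is gone, so the caller can close the socket quietly.
 */
static bool
sketch_errno_is_conn_gone(int e)
{
	switch (e) {
	case ECONNRESET:
	case ENOTCONN:
	case EPIPE:
		return (true);
#if defined(__APPLE__)
	case EINVAL:	/* seen on MacOS after a reset, per the text above */
		return (true);
#endif
	default:
		return (false);
	}
}
```

      A caller would check the return value of e.g. setsockopt(2) and
      consult such a helper before deciding that a failure is fatal.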
      ef430670
    • Don't test gunzip for partial responses · 94a7f427
      Dridi Boukelmoune authored
      Some user agents like Safari may "probe" specific resources, such as
      media, before getting the full resource, usually asking for the first 2
      or 11 bytes, probably to peek at magic numbers and figure out early
      whether a potentially large resource (read: video) is supported.
      
      If the user agent also advertises gzip support, and the transaction is
      known beforehand to not be cacheable, varnishd will forward the Range
      header to the backend:
      
          Accept-Encoding: gzip (when http_gzip_support is on)
          Range: bytes=0-1
      
      If the response happens to be both encoded and partial, the gunzip test
      cannot be performed. Otherwise we systematically end up with a broken
      transaction closed prematurely:
      
          FetchError b tGunzip failed
          Gzip b u F - 2 0 0 0 0
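
      The rule can be sketched as a predicate that decides whether the
      gunzip test should run. This is an illustration under stated
      assumptions: the function and parameter names are invented, and
      detecting a partial response by status 206 is an assumption of the
      sketch, not necessarily the check the actual change performs.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate, not taken from varnishd.
 * body_is_gzip: the backend response is gzip encoded.
 * will_gunzip:  the body is going to be decompressed anyway.
 * status:       backend response status (206 assumed to mean partial).
 */
static bool
sketch_want_gunzip_test(bool body_is_gzip, bool will_gunzip, int status)
{
	if (!body_is_gzip || will_gunzip)
		return (false);		/* nothing to test */
	/* A slice of a gzip stream cannot be validated on its own. */
	return (status != 206);
}
```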
      
      Refs #2530
      Refs #2554
      94a7f427
  21. 22 Feb, 2018 2 commits
    • Stabilize r01764.vtc · df906159
      Dag Haavi Finstad authored
      The non-fatal ("non_fatal" in master) was omitted when this was first
      backported.
      df906159
    • Avoid leaking an OH ref on reembark failure · bf752eff
      Dag Haavi Finstad authored
      This is a backport of 5cc47eaa.
      
      With this commit hsh_rush has been split into hsh_rush and
      hsh_rush_clean. The former needs to be called while holding the OH lock,
      and the latter needs to be called without holding the lock.
      
      The reason for this added complexity is that we can't hold the lock
      while calling HSH_DerefObjHead.
      
      Fixes: #2495
      bf752eff
  22. 21 Feb, 2018 1 commit
    • Fix issue #1799 for keep · 5c24e1f8
      Pål Hermunn Johansen authored
      This fixes the long-standing #1799 for "keep" objects, and this commit
      message suggests a way of working around #1799 in the remaining
      cases. The following is a (long) explanation of how grace and keep
      work at the moment, how this relates to #1799, and how this commit
      changes things.
      
      1. How does it work now, before this commit?
      
      Objects in cache can outlive their TTL, and the typical reason for
      this is grace. Objects in cache can also linger because of obj.keep or
      in the (rare but observed) case where the expiry thread has not yet
      evicted an object. Grace and keep are here to minimize backend load,
      but #1799 shows that we are not successful in doing this in some
      important cases.
      
      Whenever sub vcl_recv has ended with return (lookup) (which is the
      default action), we arrive at HSH_Lookup, where varnish sometimes only
      finds an expired object (that matches Vary logic, is not banned,
      etc). When this happens, we will initiate a background fetch (by
      adding a "busy object") if and only if there is no busy object on the
      oh already. Then the expired object is returned with HSH_EXP or
      HSH_EXPBUSY, depending on whether a busy object was inserted.
      
      2. What makes us run into #1799?
      
      When we have gotten an expired object, we generally hope that it is in
      grace, and that sub vcl_hit will return(deliver). However, if grace
      has expired, then the default action (ie the action from builtin.vcl)
      is return (miss). It is also possible that the user vcl, for some
      reason, decides that the stale object should not be delivered, and
      does return (miss) explicitly. In these cases it is common that the
      current request is not the one to insert a busy object, and then we
      run into the issue with a message "vcl_hit{} returns miss without busy
      object. Doing pass.".
      
      Note that normally, if a resource is very popular and has a positive
      grace, it is unlikely that #1799 will happen. Then a new version will
      always be available before the grace has run out, and everybody gets
      the latest fetched version with no #1799 problems.
      
      However, if a resource is very popular (like a manifest file in a live
      streaming setup) and has 0s grace, and the expiry thread lags a little
      bit behind, then vcl_hit can get an expired object even when obj.keep
      is zero. In these circumstances we can get a surge of requests to the
      backend, and this is especially bad on a very busy server.
      
      Another real world example is where grace is initially set high (48h
      or similar) and vcl_hit considers the health of the backend, and, if
      the backend is healthy, explicitly does a return(miss) to ensure that the
      client gets a fresh object. This has been a recommended use of
      vcl_hit, but, because of #1799, can cause considerable load on the
      backend.
      
      Similarly, we can get #1799 if we use "keep" to facilitate IMS
      requests to the backend, and we have a stale object for which several
      requests arrive before the first completes.
      
      3. How do we fix this?
      
      The main idea is to teach varnish to consider grace during lookup.
      
      To be specific, the following changes with this commit: If an expired
      object is found, the ttl+grace has expired and there already is an
      ongoing request for the object (ie. there exists a busy object), then
      the request is put on the waiting list instead of simply returning the
      object ("without a busy object") to vcl_hit. This choice is made
      because we anticipate that vcl_hit will do (the default) return (miss)
      and that it is better to wait for the ongoing request than to initiate
      a new one with "pass" behavior.
      
      The result is that when the ongoing request finishes, we will either
      be able to go to vcl_hit, start a new request (can happen if there was
      a Vary mismatch) by inserting a new "busy object", or we lose the race
      and have to go back to the waiting list (typically unlikely).
      
      When grace is in effect we go to vcl_hit even when we did not insert a
      busy object, anticipating that vcl_hit will return (deliver).
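
      The lookup behavior described in this section can be condensed
      into a small decision function. This is a sketch only: the enum
      and function names are invented here, the outcomes loosely mirror
      the HSH_EXP / HSH_EXPBUSY / waiting list cases from the text, and
      Vary matching, bans and locking are left out entirely.

```c
#include <stdbool.h>

/* Outcome names are invented for this sketch; they are not varnishd's. */
enum sketch_lookup {
	SKETCH_HIT,	/* fresh object: go to vcl_hit */
	SKETCH_EXP,	/* stale object to vcl_hit, another request fetches */
	SKETCH_EXPBUSY,	/* stale object to vcl_hit, we inserted the busy object */
	SKETCH_WAIT	/* park the request on the waiting list */
};

static enum sketch_lookup
sketch_lookup_decide(double now, double t_origin, double ttl, double grace,
    bool busy_exists)
{
	if (now < t_origin + ttl)
		return (SKETCH_HIT);
	if (!busy_exists)
		return (SKETCH_EXPBUSY);	/* we start the fetch */
	if (now < t_origin + ttl + grace)
		return (SKETCH_EXP);		/* in grace: expect deliver */
	/* Out of grace with a fetch in flight: wait instead of passing. */
	return (SKETCH_WAIT);
}
```

      The last branch is the one this commit adds: before it, the stale
      object would have been handed to vcl_hit without a busy object.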
      
      This will fix the cases where the user does not explicitly do a
      return(miss) in vcl_hit for objects where ttl+grace has not
      expired. However, since this is not an uncommon practice, we also have
      to change our recommendation on how to use grace and keep. The new
      recommendation will be:
      
      * Set grace to the "normal value" for a working varnish+backend.
      
      * Set keep to a high value if the backend is not 100% reliable and you
        want to use stale objects as a fallback.
      
      * Do not explicitly return(miss) in sub vcl_hit{}. The exception is
        when this only can happen now and then and you are really sure that
        this is the right thing to do.
      
      * In vcl_hit, check if the backend is sick, and then explicitly
        return(deliver) when appropriate (ie you want a stale object
        delivered instead of an error message).
      
      A test case is included.
      5c24e1f8
  23. 20 Feb, 2018 1 commit
    • Introduce ttl_now and the new way of calculating TTLs in VCL · a02e4f27
      Pål Hermunn Johansen authored
      This is a back port of 33143e05 in
      master, and for this reason it is a little strange.
      
      The strangeness is due to the fact that obj.ttl is not available
      in vcl_deliver here, but it is in master. This commit could have
      been much simpler without ttl_now, but the function is taken to
      4.1 regardless. The reason is that introducing obj.ttl in
      vcl_deliver is straightforward, and if someone is to do that in
      the future, the code in ttl_now(VRT_CTX) makes sure that obj.ttl
      will behave as in master, also in vcl_deliver.
      
      If obj.ttl is introduced in vcl_deliver, then the two test
      cases s00008.vtc and s00009.vtc should also be brought in, to make sure
      that obj.ttl works as expected.
      
      The following is the text from the commit in master:
      
      A new function, ttl_now(VRT_CTX), defines what "now" is when ttl
      and age are calculated in various VCL subs. To sum up,
      
      * Before a backend fetch on the client side (vcl_recv, vcl_hit,
        vcl_miss) we use t_req from the request. This is the significant
        change in this commit, and fixes the bug demonstrated by r02555.vtc.
      * On the backend side, most notably vcl_backend_response, we keep
        the old "now" by simply using ctx->now.
      * In vcl_deliver we use ctx->now, as before.
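
      The three rules above can be sketched as follows. The struct, the
      enum and the function names are invented for this illustration
      (the real ttl_now takes a VRT_CTX); only t_req and ctx->now
      correspond to terms used in this message.

```c
/* All names are invented for the sketch; this is not varnishd code. */
enum sketch_sub {
	SUB_RECV, SUB_HIT, SUB_MISS,	/* client side, before a fetch */
	SUB_BACKEND_RESPONSE,		/* backend side */
	SUB_DELIVER
};

struct sketch_ctx {
	enum sketch_sub	sub;
	double		now;	/* stands in for ctx->now */
	double		t_req;	/* stands in for the request's t_req */
};

/* Pick the "now" that ttl and age calculations should use. */
static double
sketch_ttl_now(const struct sketch_ctx *ctx)
{
	switch (ctx->sub) {
	case SUB_RECV:
	case SUB_HIT:
	case SUB_MISS:
		return (ctx->t_req);
	default:
		return (ctx->now);
	}
}

/* Remaining TTL of an object created at t_origin with lifetime ttl. */
static double
sketch_remaining_ttl(const struct sketch_ctx *ctx, double t_origin, double ttl)
{
	return (t_origin + ttl - sketch_ttl_now(ctx));
}
```

      With t_req = 100 and ctx->now = 105, an object with t_origin = 50
      and ttl = 60 has 10 seconds left in vcl_hit but 5 seconds left in
      vcl_deliver under this sketch.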
      
      It was necessary to make all purges use t_req as their base time.
      Then, to not break c00041.vtc it was necessary to change from ">="
      to ">" in HSH_Lookup.
      
      All VMODs that currently use HSH_purge must change to using
      VRT_purge.
      
      Conflicts:
      	bin/varnishd/cache/cache_hash.c
      	bin/varnishd/cache/cache_objhead.h
      	bin/varnishd/cache/cache_req_fsm.c
      	bin/varnishd/cache/cache_vrt.c
      	bin/varnishd/cache/cache_vrt_var.c
      a02e4f27
  24. 14 Feb, 2018 1 commit
  25. 18 Dec, 2017 2 commits
  26. 14 Dec, 2017 2 commits
  27. 30 Nov, 2017 1 commit
    • Add cache_hit_grace counter · a33bf527
      Pål Hermunn Johansen authored
      The counter cache_hit_grace counts the number of grace hits. To be
      precise, it counts the number of times lookup returns an expired
      object, but vcl_hit is called and decides to return(deliver).
      
      Every time cache_hit_grace is incremented, cache_hit is also
      incremented (so this commit does not change the cache_hit counter).
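
      The relationship between the two counters can be sketched like
      this. The struct and function are invented for illustration and
      only the counter names come from this message; the real counters
      live in varnishd's statistics and are updated elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant statistics fields. */
struct sketch_stats {
	uint64_t	cache_hit;
	uint64_t	cache_hit_grace;
};

/*
 * Called when vcl_hit decides to return(deliver).
 * obj_expired: lookup handed vcl_hit an object past its TTL.
 */
static void
sketch_count_deliver_from_hit(struct sketch_stats *st, bool obj_expired)
{
	st->cache_hit++;		/* every delivered hit counts */
	if (obj_expired)
		st->cache_hit_grace++;	/* grace hits are a subset */
}
```

      Under this sketch cache_hit_grace can never exceed cache_hit, so
      the share of grace hits is cache_hit_grace / cache_hit.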
      
      This is a back port of 1d62f5da.
      a33bf527
  28. 27 Nov, 2017 3 commits