- 02 Jan, 2019 1 commit
-
-
Pål Hermunn Johansen authored
This adds error handling for STV_NewObject(.., TRANSIENT) in VRB_Cache, which would fail when the transient storage is full. This is a back port of 6045eaaa. Fixes: #2831. Conflicts: bin/varnishd/cache/cache_req_body.c, bin/varnishtest/tests/r02831.vtc
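The fix is the classic allocation-failure pattern: check the allocator's return value and propagate the error instead of assuming transient storage always succeeds. A generic sketch under assumed names (`transient_alloc` and `cache_req_body` are illustrative stand-ins, not the varnishd API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transient storage allocator: returns
 * NULL when the (simulated) transient storage is full. */
static void *
transient_alloc(size_t want, size_t *space_left)
{
	if (want > *space_left)
		return (NULL);		/* transient storage full */
	*space_left -= want;
	return (malloc(want));
}

/* Caller-side pattern the fix introduces: handle allocation failure
 * instead of assuming the transient stevedore never runs out. */
static int
cache_req_body(size_t len, size_t *space_left)
{
	void *buf = transient_alloc(len, space_left);

	if (buf == NULL)
		return (-1);		/* propagate the failure upward */
	/* ... copy the request body into buf ... */
	free(buf);
	return (0);
}
```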
-
- 05 Dec, 2018 1 commit
-
-
Pål Hermunn Johansen authored
Fixes: #2753 Conflicts: bin/varnishtest/tests/b00064.vtc
-
- 05 Oct, 2018 2 commits
-
-
Federico G. Schwindt authored
I've been torturing varnish with this change for some time and was not able to reproduce the problem. Should fix #2719.
-
Poul-Henning Kamp authored
Fixes #2684 Reported by: ernestojpg@github
-
- 26 Sep, 2018 2 commits
-
-
Nils Goroll authored
This is not a semantic change: rather than indirectly checking via the return value, we can also check the reason for keeping a reference (or rather, for not keeping one).
-
Poul-Henning Kamp authored
-
- 18 Sep, 2018 1 commit
-
-
Federico G. Schwindt authored
Fixes #2661. Conflicts: bin/varnishd/mgt/mgt_main.c
-
- 11 Sep, 2018 1 commit
-
-
Nils Goroll authored
This is a back port of 4a370dc4. Conflicts: bin/varnishd/cache/cache_req_fsm.c
-
- 10 Sep, 2018 2 commits
-
-
Pål Hermunn Johansen authored
-
Pål Hermunn Johansen authored
This is a back port of 7494d6ad, whose main point is to clearly recommend using req.grace for the most common use case: using a different grace time when the backend is healthy. To simplify things, the vcl-grace.rst file is simply copied from master. It should be accurate for the 4.1 branch as well.
-
- 07 Sep, 2018 1 commit
-
-
Pål Hermunn Johansen authored
This is a back port of ff38535a.

The req.grace variable can be set in vcl_recv to cap the grace of objects in the cache, in the same way as in 3.0.x.

The "keep" behavior changes with this patch. We now always go to vcl_miss when the expired object is out of grace, or we go to the waiting list. The result is that it is no longer possible to deliver a "keep" object in vcl_hit. Note that when we get to vcl_miss, we will still have the 304 candidate, but without the detour via vcl_hit.

This commit changes VCL, but only slightly, so we aim to back port this to earlier versions of Varnish Cache.

Refs: #1799 and #2519. Conflicts: bin/varnishd/cache/cache_hash.c, bin/varnishd/cache/cache_req_fsm.c, bin/varnishd/cache/cache_varnishd.h, bin/varnishd/cache/cache_vrt_var.c, doc/sphinx/reference/vcl_var.rst
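The capping semantics described above can be modeled as taking the smaller of the object's grace and req.grace when the latter is set. A sketch of the intended semantics only (the helper name and the "negative means unset" convention are assumptions for illustration, not the varnishd implementation):

```c
#include <assert.h>

/* Illustrative model: req.grace, when set in vcl_recv, caps the
 * grace an object may use during lookup; the object's own grace
 * applies otherwise. A negative req_grace stands in for "unset". */
static double
effective_grace(double obj_grace, double req_grace)
{
	if (req_grace >= 0.0 && req_grace < obj_grace)
		return (req_grace);	/* capped by req.grace */
	return (obj_grace);
}
```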
-
- 05 Jun, 2018 1 commit
-
-
Dag Haavi Finstad authored
Fixes: #2700
-
- 01 Jun, 2018 1 commit
-
-
Nils Goroll authored
HSH_Unbusy() calls BAN_NewObjCore() without holding the objhead lock, so the ban lurker may race and grab the ban mtx just after the new oc has been inserted but before the busy flag is cleared. While it would be correct to call BAN_NewObjCore() with the objhead mtx held, doing so would increase the pressure on the combined ban & objhead mtx. If the ban lurker encounters a busy object, we know that an unbusy must be in progress, and it is wiser to back off in favor of it. Fixes #2681
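The back-off amounts to a simple check in the lurker's walk. A miniature sketch (the flag name mirrors varnishd's OC_F_BUSY, but this is illustrative code, not the actual lurker):

```c
#include <assert.h>

/* Illustrative flag and struct, not the varnishd definitions. */
#define OC_F_BUSY	(1 << 1)

struct objcore {
	unsigned	flags;
};

/* A set busy flag means an unbusy operation is in progress on this
 * object; rather than racing it for the ban mtx, the lurker skips
 * the object and revisits it on a later pass. */
static int
lurker_should_skip(const struct objcore *oc)
{
	return ((oc->flags & OC_F_BUSY) != 0);
}
```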
-
- 25 Apr, 2018 2 commits
-
-
Pål Hermunn Johansen authored
-
Nils Goroll authored
... so log it under the Debug tag. FetchErrors should be actual errors which can be addressed. In this case, nothing is wrong in any way: the fact that we abort a fetch when we don't need the body is a varnish-internal optimization (which makes sense, but comes at the cost of closing a connection). Merges #2450. Conflicts: bin/varnishd/cache/cache_fetch.c
-
- 24 Apr, 2018 2 commits
-
-
Pål Hermunn Johansen authored
-
Martin Blix Grydeland authored
If the stevedore failed the object creation, we would leak the temporary VSB holding the computed vary string. This patch frees it. Problem exists in 4.1 and later.
-
- 19 Apr, 2018 1 commit
-
-
Pål Hermunn Johansen authored
What can possibly go wrong?
-
- 12 Apr, 2018 1 commit
-
-
Pål Hermunn Johansen authored
Back port two VCC test cases from the master branch. The latter is from 4b48f886 by Nils Goroll <nils.goroll@uplex.de>
-
- 11 Apr, 2018 1 commit
-
-
Nils Goroll authored
For i < 0, rlen could underflow. We are safe because of the check for i < 0 further down, so this change is just a minor cleanup. Fixes #2444
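The underflow in question is standard unsigned wraparound: folding a negative count into a `size_t` wraps to a huge value instead of shrinking the sum. A minimal illustration of the bug class and the guard (illustrative helper, not the actual varnishd code):

```c
#include <assert.h>
#include <stddef.h>

/* Guarded accumulation: never fold a negative read count into an
 * unsigned running length. */
static size_t
safe_add(size_t rlen, long i)
{
	if (i < 0)		/* the cleanup: check before adding */
		return (rlen);
	return (rlen + (size_t)i);
}
```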
-
- 25 Feb, 2018 1 commit
-
-
Pål Hermunn Johansen authored
-
- 24 Feb, 2018 1 commit
-
-
Shohei Tanaka (@xcir) authored
This is a back port of the commits submitted in #2569 and merged in dc6c6520.
-
- 23 Feb, 2018 3 commits
-
-
Poul-Henning Kamp authored
Fixes: #2582
-
Federico G. Schwindt authored
MacOS will return EINVAL under e.g. setsockopt if the connection was reset.
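The workaround boils down to classifying EINVAL, on macOS only, alongside the usual connection-gone errno values. A sketch of that classification (an assumed helper for illustration, not varnishd code):

```c
#include <assert.h>
#include <errno.h>

/* On macOS, calls such as setsockopt can fail with EINVAL when the
 * peer has already reset the connection, so treat EINVAL there like
 * the usual connection-gone errors. */
static int
is_conn_gone(int err)
{
#if defined(__APPLE__)
	if (err == EINVAL)
		return (1);
#endif
	return (err == ECONNRESET || err == EPIPE || err == ENOTCONN);
}
```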
-
Dridi Boukelmoune authored
Some user agents, like Safari, may "probe" specific resources such as media before fetching the full resource, usually asking for the first 2 or 11 bytes, probably to peek at magic numbers to figure out early whether a potentially large resource may not be supported (read: video). If the user agent also advertises gzip support, and the transaction is known beforehand to not be cacheable, varnishd will forward the Range header to the backend:

Accept-Encoding: gzip (when http_gzip_support is on)
Range: bytes=0-1

If the response happens to be both encoded and partial, the gunzip test cannot be performed. Otherwise we systematically end up with a broken transaction, closed prematurely:

FetchError b tGunzip failed
Gzip b u F - 2 0 0 0 0

Refs #2530. Refs #2554
-
- 22 Feb, 2018 2 commits
-
-
Dag Haavi Finstad authored
The non-fatal flag ("non_fatal" in master) was omitted when this was first backported.
-
Dag Haavi Finstad authored
This is a backport of 5cc47eaa. With this commit hsh_rush has been split into hsh_rush and hsh_rush_clean. The former needs to be called while holding the OH lock, and the latter needs to be called without holding the lock. The reason for this added complexity is that we can't hold the lock while calling HSH_DerefObjHead. Fixes: #2495
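The split follows a common two-phase pattern: collect the work while holding the lock, then drop the lock before the cleanup that must not hold it. A generic sketch (names and counters are illustrative; in varnishd the lock-free phase is the one calling HSH_DerefObjHead):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t oh_mtx = PTHREAD_MUTEX_INITIALIZER;
static int nwaiting = 3;
static int nwoken, ncleaned;

/* Phase 1: must be called with oh_mtx held. */
static int
rush_locked(void)
{
	int n = nwaiting;	/* pick waiters to wake while protected */

	nwaiting = 0;
	nwoken += n;
	return (n);
}

/* Phase 2: must be called with oh_mtx NOT held. */
static void
rush_clean(int n)
{
	ncleaned += n;		/* e.g. drop references collected above */
}

static void
rush(void)
{
	int n;

	pthread_mutex_lock(&oh_mtx);
	n = rush_locked();
	pthread_mutex_unlock(&oh_mtx);
	rush_clean(n);		/* lock released before the cleanup */
}
```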
-
- 21 Feb, 2018 1 commit
-
-
Pål Hermunn Johansen authored
This fixes the long-standing #1799 for "keep" objects, and this commit message suggests a way of working around #1799 in the remaining cases. The following is a (long) explanation of how grace and keep work at the moment, how this relates to #1799, and how this commit changes things.

1. How does it work now, before this commit?

Objects in cache can outlive their TTL, and the typical reason for this is grace. Objects in cache can also linger because of obj.keep, or in the (rare but observed) case where the expiry thread has not yet evicted an object. Grace and keep are here to minimize backend load, but #1799 shows that we do not succeed in doing this in some important cases.

Whenever sub vcl_recv has ended with return (lookup) (which is the default action), we arrive at HSH_Lookup, where varnish sometimes only finds an expired object (one that matches the Vary logic, is not banned, etc.). When this happens, we will initiate a background fetch (by adding a "busy object") if and only if there is no busy object on the oh already. Then the expired object is returned with HSH_EXP or HSH_EXPBUSY, depending on whether a busy object was inserted.

2. What makes us run into #1799?

When we have gotten an expired object, we generally hope that it is in grace, and that sub vcl_hit will return (deliver). However, if grace has expired, then the default action (i.e. the action from builtin.vcl) is return (miss). It is also possible that the user VCL, for some reason, decides that the stale object should not be delivered, and does return (miss) explicitly. In these cases it is common that the current request is not the one that inserted a busy object, and then we run into the issue with the message "vcl_hit{} returns miss without busy object. Doing pass.".

Note that normally, if a resource is very popular and has a positive grace, it is unlikely that #1799 will happen. A new version will then always be available before the grace has run out, and everybody gets the latest fetched version with no #1799 problems. However, if a resource is very popular (like a manifest file in a live streaming setup) and has 0s grace, and the expiry thread lags a little bit behind, then vcl_hit can get an expired object even when obj.keep is zero. In these circumstances we can get a surge of requests to the backend, and this is especially bad on a very busy server.

Another real-world example is where grace is initially set high (48h or similar) and vcl_hit considers the health of the backend and, if the backend is healthy, explicitly does a return (miss) to ensure that the client gets a fresh object. This has been a recommended use of vcl_hit but, because of #1799, can cause considerable load on the backend. Similarly, we can get #1799 if we use "keep" to facilitate IMS requests to the backend, and we have a stale object for which several requests arrive before the first completes.

3. How do we fix this?

The main idea is to teach varnish to consider grace during lookup. To be specific, the following changes with this commit:

If an expired object is found, ttl+grace has expired, and there already is an ongoing request for the object (i.e. there exists a busy object), then the request is put on the waiting list instead of simply returning the object ("without a busy object") to vcl_hit. This choice is made because we anticipate that vcl_hit will do (the default) return (miss), and it is better to wait for the ongoing request than to initiate a new one with "pass" behavior. The result is that when the ongoing request finishes, we will either be able to go to vcl_hit, start a new request (can happen if there was a Vary mismatch) by inserting a new "busy object", or lose the race and have to go back to the waiting list (typically unlikely).

When grace is in effect, we go to vcl_hit even when we did not insert a busy object, anticipating that vcl_hit will return (deliver). This fixes the cases where the user does not explicitly do a return (miss) in vcl_hit for objects where ttl+grace has not expired. However, since this is not an uncommon practice, we also have to change our recommendation on how to use grace and keep. The new recommendation is:

* Set grace to the "normal value" for a working varnish+backend.
* Set keep to a high value if the backend is not 100% reliable and you want to use stale objects as a fallback.
* Do not explicitly return (miss) in sub vcl_hit{}. The exception is when this can only happen now and then and you are really sure that it is the right thing to do.
* In vcl_hit, check if the backend is sick, and then explicitly return (deliver) when appropriate (i.e. when you want a stale object delivered instead of an error message).

A test case is included.
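The new lookup behavior for an expired object can be summarized as a three-way decision. A heavily simplified model, for illustration only (the names and the reduction to three inputs are assumptions, not the actual HSH_Lookup code):

```c
#include <assert.h>

enum action { GO_VCL_HIT, WAITINGLIST, START_FETCH };

/* Simplified decision for a request that found only an expired
 * object during lookup. */
static enum action
lookup_expired(double age_past_ttl, double grace, int busy_present)
{
	if (age_past_ttl <= grace)
		return (GO_VCL_HIT);	/* in grace: anticipate deliver */
	if (busy_present)
		return (WAITINGLIST);	/* out of grace, fetch in flight */
	return (START_FETCH);		/* out of grace, no fetch yet */
}
```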
-
- 20 Feb, 2018 1 commit
-
-
Pål Hermunn Johansen authored
This is a back port of 33143e05 in master, and for this reason it is a little strange. The strangeness is due to the fact that obj.ttl is not available in vcl_deliver here, but it is in master. This commit could have been much simpler without ttl_now, but the function is taken to 4.1 regardless. The reason is that introducing obj.ttl in vcl_deliver is straightforward, and if someone is to do that in the future, the code in ttl_now(VRT_CTX) makes sure that obj.ttl will behave as in master, also in vcl_deliver. If obj.ttl is introduced in vcl_deliver, then the two test cases s00008.vtc and s00009.vtc should also be brought in, to make sure that obj.ttl works as expected.

The following is the text from the commit in master:

A new function, ttl_now(VRT_CTX), defines what "now" is when ttl and age are calculated in various VCL subs. To sum up:

* Before a backend fetch on the client side (vcl_recv, vcl_hit, vcl_miss) we use t_req from the request. This is the significance of this commit, and fixes the bug demonstrated by r02555.vtc.
* On the backend side, most notably vcl_backend_response, we keep the old "now" by simply using ctx->now.
* In vcl_deliver we use ctx->now, as before.

It was necessary to make all purges use t_req as their base time. Then, to not break c00041.vtc, it was necessary to change from ">=" to ">" in HSH_Lookup. All VMODs that currently use HSH_purge must change to using VRT_purge.

Conflicts: bin/varnishd/cache/cache_hash.c, bin/varnishd/cache/cache_objhead.h, bin/varnishd/cache/cache_req_fsm.c, bin/varnishd/cache/cache_vrt.c, bin/varnishd/cache/cache_vrt_var.c
-
- 14 Feb, 2018 1 commit
-
-
Martin Blix Grydeland authored
Since the last_lru tracks epoch time, it needs the double precision floating point type to accurately track the time. This is simply the test case from 2261dcfd.
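The precision argument is concrete: at Unix-epoch magnitude (~1.5e9 seconds), a single-precision float's 24-bit mantissa can only resolve steps of roughly 128 seconds, so sub-second LRU timestamps collapse. A small demonstration (variable names are illustrative):

```c
#include <assert.h>

static float  last_lru_f;
static double last_lru_d;

/* Record an LRU touch time in both precisions. */
static void
touch(double now)
{
	last_lru_f = (float)now;	/* fractional second discarded */
	last_lru_d = now;		/* preserved exactly */
}
```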
-
- 18 Dec, 2017 2 commits
-
-
Pål Hermunn Johansen authored
-
Martin Blix Grydeland authored
Also add asserts for the references held in req->objcore and req->stale_oc. The test case for #1807 catches this bug after adding the asserts. Fixes: #2502
-
- 14 Dec, 2017 2 commits
-
-
Federico G. Schwindt authored
Addresses #2456 in a different way.
-
Martin Blix Grydeland authored
I'm guessing this is due to rounding. All test cases involving the file stevedore have a minimum 10m file; it was silly to attempt a smaller one in this test. Fixes: #2496
-
- 30 Nov, 2017 1 commit
-
-
Pål Hermunn Johansen authored
The counter cache_hit_grace counts the number of grace hits. To be precise, it counts the number of times lookup returns an expired object, but vcl_hit is called and decides to return(deliver). Every time cache_hit_grace is incremented, cache_hit is also incremented (so this commit does not change the cache_hit counter). This is a back port of 1d62f5da.
-
- 27 Nov, 2017 5 commits
-
-
Dag Haavi Finstad authored
Fixes: #1772
-
Dag Haavi Finstad authored
With VBT_Close now being capable of dealing with STOLEN connections, we no longer need to VBT_Wait for them prior to close.
-
Dag Haavi Finstad authored
The change from shutdown(.., SHUT_WR) to shutdown(.., SHUT_RDWR) is required to make it trigger a waiter event.
-
Dag Haavi Finstad authored
This test case should not rely on first_byte_timeout/between_bytes_timeout.
-
Dag Haavi Finstad authored
The second time around, we force a fresh connection. The VCL user may choose to do 'return (retry);' in vcl_backend_error{} if further attempts are deemed warranted. Fixes: #2135
-