1. 15 Jan, 2020 15 commits
    • Add information about vcl object instances to the panic output · 5df27a08
      Nils Goroll authored
      In the absence of a core dump, the panic output so far contains no
      information about vcl object instances, for example to find out which
      object a priv belongs to when the instance address is used for
      per-instance priv state.
      
      To make this information available at the time of a panic, we add the
      following:
      
      * A struct vrt_ii (for instance info); at VCC time, VCC fills in a
        static instance of it with pointers to the C global variables that
        hold the object instance pointers
      
      * A pointer to this struct from the VCL_conf to make it available to
        the varnishd worker
      
      * A dump of the instance info in the panic output (sketched below)
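
      As a rough illustration of the mechanism (the struct, field and "myobj"
      names below are hypothetical; the actual struct vrt_ii declaration
      lives in the varnish headers and may differ):

        #include <stdio.h>

        struct instance_info {          /* hypothetical stand-in for struct vrt_ii */
                const char      *name;  /* VCL name of the object instance */
                void            **ptr;  /* address of the VCC-emitted C global */
        };

        static void *vgc_myobj;         /* would be set by the object constructor */

        static const struct instance_info instance_info_tbl[] = {
                { "myobj", &vgc_myobj },
                { NULL, NULL }
        };

        /* At panic time, a table like this (reached via VCL_conf in the
         * real code) can be walked to print one line per instance. */
        static void
        panic_dump_instances(void)
        {
                const struct instance_info *ii;

                for (ii = instance_info_tbl; ii->name != NULL; ii++)
                        printf("instance \"%s\" = %p\n", ii->name, *ii->ptr);
        }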
    • Add capability to send the authority TLV in the PROXY header · 99ec7796
      Geoff Simmons authored
      This gives the receiver of the PROXY header (usually the ssl-onloader)
      the opportunity to set the SNI (HostName field) from the TLV value, for
      the TLS handshake with the remote backend.
      
      From
      https://github.com/nigoroll/varnish-cache/commit/e0eb7d0a9c65cdc3c58978656b4c71f4ab8aabca
      edited by @nigoroll to split out the proxy header functionality.
      
      Add vmod_debug access to the proxy header formatting and test it
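
      For reference, the PROXY v2 protocol encodes each TLV as one type byte,
      a two-byte length in network byte order, and the value, with
      PP2_TYPE_AUTHORITY defined as 0x02. A minimal sketch of appending such
      a TLV (the function name and buffer handling are illustrative, not the
      actual VPX code):

        #include <stdint.h>
        #include <string.h>

        #define PP2_TYPE_AUTHORITY 0x02 /* per the PROXY protocol v2 spec */

        /* Append an authority (host name) TLV to a PROXY v2 header buffer;
         * returns the number of bytes written, or 0 if it does not fit. */
        static size_t
        append_authority_tlv(uint8_t *buf, size_t space, const char *host)
        {
                size_t len = strlen(host);

                if (len > UINT16_MAX || space < 3 + len)
                        return (0);
                buf[0] = PP2_TYPE_AUTHORITY;
                buf[1] = (uint8_t)(len >> 8);   /* length, network byte order */
                buf[2] = (uint8_t)(len & 0xff);
                memcpy(buf + 3, host, len);
                return (3 + len);
        }

      The ssl-onloader receiving the header can then pick this value up as
      the SNI for its TLS handshake with the remote backend.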
    • format proxy header on the stack · a74315bc
      Nils Goroll authored
    • wrap VPX_Format_Proxy for VRT · de50fefc
      Nils Goroll authored
    • Add VPX_Format_Proxy · 461a22a4
      Nils Goroll authored
    • split out proxyv2 formatting · 519737ac
      Nils Goroll authored
    • split out proxyv1 formatting · 6a08beba
      Nils Goroll authored
      Note: this partially reverts cf14a0fd
      to prepare for bytes accounting in a later patch
    • remove a now pointless vtc · 5fe2a46d
      Nils Goroll authored
      This test was meant to detect a deadlock which no longer exists. IMHO,
      the only sensible way to test for its absence now is a load test, which
      is not what we want in vtc.
    • generalize the worker pool reserve to avoid deadlocks · 3bb8b84c
      Nils Goroll authored
      Previously, we used a minimum number of idle threads (the reserve) to
      ensure that we did not assign all threads to client requests, leaving
      no threads for backend requests.
      
      This was actually only a special case of the more general issue
      exposed by h2: Lower priority tasks depend on higher priority tasks
      (for h2, sessions need streams, which need requests, which may need
      backend requests).
      
      To solve this problem, we divide the reserve by the number of priority
      classes and schedule lower priority tasks only if there are enough
      idle threads to run higher priority tasks eventually.
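
      As a minimal sketch of that rule, assuming priority class 0 is the most
      important and TASK_QUEUE__END is the number of classes (the names and
      arithmetic here are illustrative, not the actual cache_wrk.c code):

        #define TASK_QUEUE__END 5       /* number of task priority classes */

        /* A task of priority class 'prio' (0 = highest) may only take an
         * idle thread if enough idle threads remain to cover the share of
         * the reserve held back for all higher-priority classes. */
        static int
        may_dispatch(unsigned n_idle, unsigned reserve, unsigned prio)
        {
                return (n_idle > reserve * prio / TASK_QUEUE__END);
        }

      With such a rule the highest-priority class can always use an idle
      thread, while the lowest-priority class only runs when most of the
      reserve is idle, so higher-priority tasks can keep making progress.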
      
      This change does not guarantee any upper limit on the amount of time
      it can take for a task to be scheduled (e.g. backend requests could be
      blocking on arbitrarily long timeouts), so the thread pool watchdog is
      still warranted. But this change should guarantee that we do make
      progress eventually.
      
      With the reserves, thread_pool_min needs to be no smaller than the
      number of priority classes (TASK_QUEUE__END). Ideally, we should have
      an even higher minimum (@Dridi rightly suggested to make it 2 *
      TASK_QUEUE__END), but that would prevent the very useful test
      t02011.vtc.
      
      For now, the value of TASK_QUEUE__END (5) is hardcoded as such for the
      parameter configuration and documentation because auto-generating it
      would require include/macro dances which I consider over the top for
      now. Instead, the respective places are marked and an assert is in
      place to ensure we do not start the worker with too small a number of
      worker threads. I decided against checks in the manager to avoid
      include pollution from the worker (cache.h) into the manager.
      
      Fixes #2418 for real
    • Remove varnishd -C coverage · 75cca3cd
      Dridi Boukelmoune authored
      This check is not deterministic because vmod_std may indeed be found in
      the default vmod_path defined at configure time.
    • Whitespace OCD · 1833d7dd
      Dridi Boukelmoune authored
    • Fail fetch retries when uncached request body has been released · f88b4795
      Martin Blix Grydeland authored
      Currently we allow fetch retries with a request body even after we have released the
      request that initiated the fetch, and the request body with it. The
      attached test case demonstrates this, where s2 on the retry attempt gets
      stuck waiting for 3 bytes of body data that is never sent.
      
      Fix this by keeping track of what the initial request body status was, and
      failing the retry attempt if the request was already released
      (BOS_REQ_DONE) and the request body was not cached.
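
      A minimal sketch of the guard this describes; the flag names are
      hypothetical, only BOS_REQ_DONE is taken from the message itself:

        /* Refuse a retry once the client request has been released
         * (BOS_REQ_DONE) if its body was never cached, because the body
         * can no longer be replayed to the backend. */
        static int
        retry_allowed(int req_released, int req_body_cached)
        {
                if (req_released && !req_body_cached)
                        return (0);     /* s2 would wait forever for body bytes */
                return (1);
        }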
    • Fetch thread reference count and keep cached request bodies · d4b6228e
      Martin Blix Grydeland authored
      With this patch, fetch threads keep a reference to completely cached
      request bodies for the entire duration of the fetch. This extends the
      retry window of backend requests with a request body beyond the
      BOS_REQ_DONE point.
      
      Patch by: Poul-Henning Kamp
    • Assert · eb14a0b6
      Dridi Boukelmoune authored
  2. 14 Jan, 2020 5 commits
    • avoid the STV_close() race for now · 4df4d2a4
      Nils Goroll authored
      See #3190
    • stop the expiry thread before closing stevedores · 34b687e6
      Nils Goroll authored
      This should fix the panic mentioned in
      309e807d
    • 4c7108b1
    • try to narrow down a umem panic observed in vtest b00035.vtc · 309e807d
      Nils Goroll authored
      is it a race with _close ?
      
      ***  v1   debug|Child (369) Panic at: Tue, 14 Jan 2020 12:06:12 GMT
      ***  v1   debug|Wrong turn at
      ../../../bin/varnishd/cache/cache_main.c:284:
      ***  v1   debug|Signal 11 (Segmentation Fault) received at b4 si_code 1
      ***  v1   debug|version = varnish-trunk revision
      b8b798a0, vrt api = 10.0
      ***  v1   debug|ident = -jsolaris,-sdefault,-sdefault,-hcritbit,ports
      ***  v1   debug|now = 2786648.903965 (mono), 1579003571.310573 (real)
      ***  v1   debug|Backtrace:
      ***  v1   debug|  80e1bd8: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'pan_backtrace+0x18 [0x80e1bd8]
      ***  v1   debug|  80e2147: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'pan_ic+0x2c7 [0x80e2147]
      ***  v1   debug|  81b9a6f: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'VAS_Fail+0x4f [0x81b9a6f]
      ***  v1   debug|  80d7fba: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'child_signal_handler+0x27a [0x80d7fba]
      ***  v1   debug|  fed92695: /lib/libc.so.1'__sighndlr+0x15 [0xfed92695]
      ***  v1   debug|  fed86c8b: /lib/libc.so.1'call_user_handler+0x298 [0xfed86c8b]
      ***  v1   debug|  fda8a93e: /lib/libumem.so.1'umem_cache_free
      ***  v1   debug|+0x23 [0xfda8a93e]
      ***  v1   debug|  817f3bc: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'smu_free+0x35c [0x817f3bc]
      ***  v1   debug|  817aa21: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'sml_stv_free+0x101 [0x817aa21]
      ***  v1   debug|  817b4eb: /tmp/vtest.o32_su12.4/varnish-cache/varnish-trunk/_build/bin/varnishd/varnishd'sml_slim+0x2cb [0x817b4eb]
      ***  v1   debug|thread = (cache-exp)
      ***  v1   debug|thr.req = 0 {
      ***  v1   debug|},
      ***  v1   debug|thr.busyobj = 0 {
      ***  v1   debug|},
      ***  v1   debug|vmods = {
      ***  v1   debug|},
      ***  v1   debug|
      ***  v1   debug|
      ***  v1   debug|Info: Child (369) said Child dies
      ***  v1   debug|Debug:
      ***  v1   debug| Child cleanup complete
      ***  v1   debug|
    • Revert "does the umem backend affect the amount of malloc NULL returns in vtest?" · b8b798a0
      Nils Goroll authored
      This reverts commit 8ea006ee.
      
      It does not seem to make a difference; trying to narrow this down
      using other means (different platforms).
  3. 13 Jan, 2020 7 commits
  4. 10 Jan, 2020 1 commit
  5. 09 Jan, 2020 3 commits
  6. 08 Jan, 2020 3 commits
    • Linux documents SO_SNDTIMEO in socket(7) · 9a7dc49b
      Dridi Boukelmoune authored
      Closes #3178
    • Stabilize s10 · bcba9649
      Dridi Boukelmoune authored
      Contrary to previous attempts, this one takes a different route that
      is much more reliable and faster.
      
      First, it sets things up so that we can predictably lock varnish when
      it's trying to send the first (and only) part of the body. Instead of
      assuming a delay that is sometimes not enough under load, we wait for
      the timeout to show up in the log.
      
      We can't put the barrier in l1 or l2 because logexpect spec evaluation
      is eager, in order to cope with the VSL API.
      
      Because we bypass the cache, we can afford to let c1 bail out before
      completing the transaction. This is necessary because otherwise the
      second c1 run would take forever on FreeBSD, which takes our request
      to limit the send buffer to 128 octets very seriously (on Linux we get
      around 4k).
      
      Because we use barriers, the send and receive buffers were bumped to
      256 to ensure c1 doesn't fail (on FreeBSD) before it reaches barrier
      statements.
    • Polish · 8f38a64f
      Dridi Boukelmoune authored
  7. 05 Jan, 2020 2 commits
  8. 03 Jan, 2020 2 commits
  9. 02 Jan, 2020 2 commits
    • Attempt at stabilizing s10 · df9f3489
      Dridi Boukelmoune authored
      This test has been relying on SLT_Debug records from day one. Now that
      we have SLT_Notice, we can perpetuate this information and at the same
      time grant ourselves the freedom to explain each case and which
      parameters may be used to try to improve the situation.
    • Attempt at stabilizing e19 · 895810cb
      Dridi Boukelmoune authored
      I'm no longer able to time it out under load.