Commit 20362bf8 authored by Nils Goroll's avatar Nils Goroll

Add a regression test for #1762

Further investigating into root cause scenarios resulted in the
following insights:

* the bad vxid must have got into vtx->key.vxid by way of
  `vtx_parse_link`

* which is only called for `SLT_Begin` (`vtx_scan_begin()`) and
  `SLT_Link` (`vtx_scan_link()`)

(actually this was known before, but I am now confident that these are
the only cases)

There is no case in the code as of 4.0.3 release where `SLT_Begin` is
emitted with an unmasked vxid, so the issue must be root casue in an
`SLT_Link` link record.

In both cases where unmasked vxids are emitted for `SLT_Link`, the id
comes directly from `VXID_Get()`:

* `cache_fetch.c`

  wid = VXID_Get(&wrk->vxid_pool);
  VSLb(bo->vsl, SLT_Link, "bereq %u retry", wid);

* `cache_req_fsm.c`

  wid = VXID_Get(&wrk->vxid_pool);
  // XXX: ReqEnd + ReqAcct ?
  VSLb_ts_req(req, "Restart", W_TIM_real(wrk));
  VSLb(req->vsl, SLT_Link, "req %u restart", wid);

So unless I have overseen anything significant, the root cause must
have been a vxid spill, which was fixed with
0dd8c0b8 (master) /
171f3ac5 (4.0)

`VXID()` masking would have avoided the issue to surface.

This insight is consistent with two observations:

* the issue only surfaced after `varnishd` running for longer periods
  of time

* the issue didn't go away after a restart of the vsl client, a
  `varnishd` restart was required

This gives confidence that the issue has really been understood
completely and that the root cause has been fixed.
parent 65f9cb7a
varnishtest "test vsl api handling of incomplete vtxes combined with bad vxids"
server s1 {
rxreq
txresp
accept
rxreq
delay 5
txresp
} -start
varnish v1 -vcl+backend {
sub vcl_backend_response {
if (bereq.url == "/retry" &&
bereq.retries == 0) {
return (retry);
}
}
sub vcl_deliver {
if (req.url == "/restart" &&
req.restarts == 0) {
return (restart);
}
}
} -start
# xid spill into client marker emitting a bad SLT_Link for bereq retry
# triggering vtx_force by way of a timeout
#
# VSLb(bo->vsl, SLT_Link, "bereq %u retry", wid);
varnish v1 -cliok "param.set debug +syncvsl"
# vxid wrap at 1<<30
varnish v1 -cliok "debug.xid 1073741823"
logexpect l1 -v v1 -g request -T 2 {
expect 0 1 Begin "req 0"
expect * = ReqStart
expect 0 = ReqMethod GET
expect 0 = ReqURL /
expect 0 = ReqProtocol HTTP/1.1
expect * = ReqHeader "Foo: bar"
expect * = Link "bereq 2 fetch"
expect * = VSL "timeout"
expect * = End "synth"
expect 0 2 Begin "bereq 1"
expect * 2 Link "bereq 3 retry"
expect * = End
expect 0 3 Begin "bereq 2 retry"
expect * = End
} -start
client c1 {
txreq -url "/retry" -hdr "Foo: bar"
rxresp
expect resp.status == 200
} -run
logexpect l1 -wait
################################################################################
# case xid spill into client marker emitting a bad SLT_Link for restart
#
# VSLb(req->vsl, SLT_Link, "req %u restart", wid);
server s1 {
rxreq
txresp
} -start
varnish v1 -cliok "param.set debug +syncvsl"
# vxid wrap at 1<<30
varnish v1 -cliok "debug.xid 1073741823"
logexpect l1 -v v1 -g request {
expect 0 1 Begin "req 0"
expect * = ReqStart
expect 0 = ReqMethod GET
expect 0 = ReqURL /
expect 0 = ReqProtocol HTTP/1.1
expect * = ReqHeader "Foo: bar"
expect * = Link "bereq 2 fetch"
expect * = Link "req 3 restart"
expect * = End
expect 0 2 Begin "bereq 1"
expect * = End
expect 0 3 Begin "req 1 restart"
expect * = End
} -start
client c1 {
txreq -url "/restart" -hdr "Foo: bar"
rxresp
expect resp.status == 200
} -run
logexpect l1 -wait
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment