Nils Goroll authored
Further investigation into root cause scenarios resulted in the following insights:

* the bad vxid must have got into `vtx->key.vxid` by way of `vtx_parse_link`
* which is only called for `SLT_Begin` (`vtx_scan_begin()`) and `SLT_Link` (`vtx_scan_link()`) (this was known before, but I am now confident that these are the only cases)

There is no case in the code as of the 4.0.3 release where `SLT_Begin` is emitted with an unmasked vxid, so the issue must be rooted in an `SLT_Link` record.

In both cases where unmasked vxids are emitted for `SLT_Link`, the id comes directly from `VXID_Get()`:

* `cache_fetch.c`

      wid = VXID_Get(&wrk->vxid_pool);
      VSLb(bo->vsl, SLT_Link, "bereq %u retry", wid);

* `cache_req_fsm.c`

      wid = VXID_Get(&wrk->vxid_pool);
      // XXX: ReqEnd + ReqAcct ?
      VSLb_ts_req(req, "Restart", W_TIM_real(wrk));
      VSLb(req->vsl, SLT_Link, "req %u restart", wid);

So unless I have overlooked anything significant, the root cause must have been a vxid spill, which was fixed with 0dd8c0b8 (master) / 171f3ac5 (4.0).

`VXID()` masking would have prevented the issue from surfacing.

This insight is consistent with two observations:

* the issue only surfaced after `varnishd` had been running for longer periods of time
* the issue didn't go away after a restart of the vsl client; a `varnishd` restart was required

This gives confidence that the issue has really been understood completely and that the root cause has been fixed.
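To illustrate what a vxid spill looks like, here is a minimal standalone sketch. It is not the varnishd code: it assumes the id occupies the low 30 bits of a 32-bit word with the top two bits reserved as client/backend markers, and all names (`IDENT_MASK`, `next_id_unmasked()`, `next_id_masked()`, `ident_of()`, `has_marker_bits()`) are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 30 bits carry the id, top two bits are markers. */
#define IDENT_BITS	30
#define IDENT_MASK	((UINT32_C(1) << IDENT_BITS) - 1)

/*
 * Hypothetical allocator that never wraps at 30 bits: after enough
 * allocations the counter spills into the marker bits.
 */
static uint32_t
next_id_unmasked(uint32_t *counter)
{
	return (++*counter);
}

/*
 * Same allocator with the result masked to the id bits.
 * Note: this yields 0 at the wrap point; a real allocator would
 * have to decide how to treat that value.
 */
static uint32_t
next_id_masked(uint32_t *counter)
{
	return (++*counter & IDENT_MASK);
}

/* The id portion a log reader would extract from a raw value. */
static uint32_t
ident_of(uint32_t raw)
{
	return (raw & IDENT_MASK);
}

/* True if the raw value carries anything in the reserved top bits. */
static int
has_marker_bits(uint32_t raw)
{
	return ((raw & ~IDENT_MASK) != 0);
}

/* Walk the counter across the 30-bit boundary and check both variants. */
static void
spill_selfcheck(void)
{
	uint32_t c;
	uint32_t raw;

	c = IDENT_MASK;			/* last value that still fits */
	raw = next_id_unmasked(&c);	/* one more allocation spills */
	assert(has_marker_bits(raw));
	assert(ident_of(raw) != raw);	/* reader and writer now disagree */

	c = IDENT_MASK;
	raw = next_id_masked(&c);
	assert(!has_marker_bits(raw));
	assert(ident_of(raw) == raw);
}
```

Under these assumptions, the spill can only occur once the counter has passed 2^30 allocations, which matches the observation that the issue appeared only after long uptimes, and because the counter lives in the allocating process, restarting only the log client would not reset it.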
20362bf8