- 07 Feb, 2024 40 commits
-
-
Nils Goroll authored
It was too complicated and limited by waiting for flushes to finish. Now that we can issue multiple flushes, we can simplify it substantially. As a result of intermediate efforts, there is now also a facility to base nuking on the amount of data currently in the process of being freed. Leaving it in, #ifdef'ed out, in case we'll need it again.
-
Nils Goroll authored
With more than one flush finish, writing a header from an old flush could race the logbuffer_ref() from a more recent one, leading to an inconsistent log in which a logblock with next_off == 0 became reachable.
-
Nils Goroll authored
To avoid having to wait for a previous flush to finish (in most cases), we now allocate the flush finish state dynamically (and asynchronously). For ordinary flushes, we can now start the next flush while a previous one is still in flight, ordering the flush finishes in a list to preserve log consistency.
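As a rough model of the ordering idea (all names and structures here are hypothetical, not the actual fellow code): flush finish states are appended in issue order and applied only from the head of the list, so a flush that completes early never publishes ahead of an earlier, still-pending one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flush-finish state, allocated dynamically per flush. */
struct flush_fini {
	struct flush_fini	*next;
	int			done;	/* set when the async I/O completes */
	int			seq;	/* issue order, for illustration */
};

struct flush_list {
	struct flush_fini	*head, **tailp;
};

static void
flush_list_init(struct flush_list *fl)
{
	fl->head = NULL;
	fl->tailp = &fl->head;
}

/* Start a new flush without waiting for previous ones: just append. */
static struct flush_fini *
flush_start(struct flush_list *fl, int seq)
{
	struct flush_fini *ff = calloc(1, sizeof *ff);

	assert(ff != NULL);
	ff->seq = seq;
	*fl->tailp = ff;
	fl->tailp = &ff->next;
	return (ff);
}

/* Apply completed finishes strictly from the head; stop at the first
 * unfinished one, so the log stays consistent even when a later flush
 * completes before an earlier one. Returns the number applied. */
static int
flush_finish_ordered(struct flush_list *fl)
{
	int applied = 0;

	while (fl->head != NULL && fl->head->done) {
		struct flush_fini *ff = fl->head;

		fl->head = ff->next;
		if (fl->head == NULL)
			fl->tailp = &fl->head;
		/* ... write header / publish log blocks here ... */
		applied++;
		free(ff);
	}
	return (applied);
}
```

For example, if flush 2 completes before flush 1, flush_finish_ordered() applies nothing until flush 1 is also done, then applies both in order.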
-
Nils Goroll authored
-
Nils Goroll authored
as there is only one thread waiting
-
Nils Goroll authored
the logwatcher has long been the only thread waiting on it
-
Nils Goroll authored
-
Nils Goroll authored
buddy_reqs are not relocatable, so we need to finish them when moving logbuffers.
-
Nils Goroll authored
regionlists are updated during DLE submit under the logmtx. Thus, we should avoid synchronous memory allocations. We change the strategy as follows:

* Memory for the top regionlist (which has one regl embedded) _is_ allocated synchronously, but with maximum cram to reduce latencies at the expense of memory efficiency. The case where the allocation does block will not hit us for the most critical path in fellow_log_dle_submit(), because we pre-allocate there outside the logmtx.

* When we create the top regionlist, we make two asynchronous memory allocation requests for our hard-coded size (16KB for prod), one crammed and one not. The crammed request is made such that we get _any_ memory rather than waiting.

* When we need to extend the regionlist, we should already have an allocation available (if not, we need to wait, bad luck). The next allocation available is either [1] (uncrammed), left over after the previous extension, or [0], which is potentially crammed. If it is, and we have an uncrammed [1], then we use that and return the crammed allocation. If there are no allocations left, we issue the next asynchronous request.
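The slot-selection rule in the last bullet can be modeled roughly as follows; this is a sketch with invented names (the real allocator state is more involved), keeping only the decision logic: slot [0] may be crammed, slot [1] never is, and when both run out the next asynchronous request is issued:

```c
#include <assert.h>

/* Hypothetical model of the two outstanding allocation requests:
 * slot [0] is potentially crammed, slot [1] is never crammed. */
struct regl_allocs {
	int have[2];		/* request completed, memory available? */
	int crammed[2];
	int async_issued;	/* next async request fired? */
};

/* Pick the allocation to extend the regionlist with. Prefer the
 * uncrammed [1] over a crammed [0], returning the crammed allocation
 * to the allocator (modeled by clearing have[0]). Returns the slot
 * used, or -1 if no allocation is available and we must wait. */
static int
regl_pick(struct regl_allocs *ra, int *returned_crammed)
{
	int use;

	*returned_crammed = 0;
	if (!ra->have[0] && !ra->have[1])
		return (-1);		/* bad luck: wait */
	if (ra->have[0] && ra->crammed[0] && ra->have[1]) {
		use = 1;		/* use uncrammed [1] ... */
		*returned_crammed = 1;	/* ... return crammed [0] */
		ra->have[0] = 0;
	} else if (ra->have[1] && !ra->have[0])
		use = 1;
	else
		use = 0;
	ra->have[use] = 0;
	if (!ra->have[0] && !ra->have[1])
		ra->async_issued = 1;	/* issue next async request */
	return (use);
}
```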
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
When adding log blocks, also trigger a flush based on available disk blocks; that is, do not add blocks to the logbuffer which we cannot also flush. Also flush with reference: I think this capability was originally limited in order to do full flushes with reference only from the logwatcher thread, so as not to hold the logmtx for too long. But now that we have the extra flush finish thread, I do not think this is necessary any more, and we need to handle tight storage better.
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
... such that LRU, which is operating on the temporary log, can make room. Ref #28
-
Nils Goroll authored
Ref #28
-
Nils Goroll authored
Hopefully, this also contributes to a solution for #28
-
Nils Goroll authored
Otherwise it looks like a rewrite would leak log blocks.
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
It is more important than objects. Should also contribute to a fix for #28
-
Nils Goroll authored
This, hopefully, is part of a possible solution to the nasty issue #28: When we do not have a sufficiently large pre-allocated log (log region), as determined by objsize_hint in relation to the storage size, we need to dynamically allocate disk blocks while we flush the log. When the log flush includes object deletions (in particular when triggered from the disk LRU), we run into a typical deadlock: to complete the transaction to free space, we need the space...

This commit is part of an attempt to make this work by allocating space early on: When we only have 20% of the log region left, we start to reserve more blocks for the log.

The problem can, for example, be reproduced with an objsize_hint of 1MB and an actual object size in the order of 32KB.

Ref #28
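The 20% trigger amounts to a simple threshold check; a minimal sketch with hypothetical names (the real code tracks the log region differently):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: start reserving extra log blocks once less than
 * 20% of the pre-allocated log region remains, so a later log flush
 * which frees space never needs to allocate space first (avoiding the
 * "to free space, we need space" deadlock). */
static int
logreg_needs_reserve(uint64_t region_total, uint64_t region_free)
{
	assert(region_free <= region_total);
	/* region_free / region_total < 1/5, without division */
	return (region_free * 5 < region_total);
}
```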
-
Nils Goroll authored
Manually tested with this modification:

diff --git a/src/fellow_log.c b/src/fellow_log.c
index 6075d81..45da269 100644
--- a/src/fellow_log.c
+++ b/src/fellow_log.c
@@ -1696,6 +1696,9 @@ fellow_io_regions_discard(struct fellow_fd *ffd, void *ioctx,
 	r = fallocate(ffd->fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
 	    (off_t)todo->offset, (off_t)todo->len);
+	// XXX TEST
+	r = 1;
+	errno = EOPNOTSUPP;
 	if (r == 0) {
 		if ((ffd->cap & FFD_CAN_FALLOCATE_PUNCH_URING) == 0) {
 			ffd->diag("fellow: fallocate punch"

Fixes #38
-
Nils Goroll authored
to make clear that we understand exactly what is happening.
-
Nils Goroll authored
For streaming busy objects, we basically rely on the varnish-cache ObjExtend() / ObjWaitExtend() API to never read past the object: In fellow_stream_f(), we always wait for more data (or the end of the object) before returning, such that fellow_cache_obj_iter(), which iterates over segments, should never touch a segment past the final FCS_BUSY segment. Yet it did, by means of the read-ahead and the peek-ahead used to determine whether or not OBJ_ITER_END should be signaled. We fix this issue by reading/peeking ahead only for segments with a state beyond FCS_BUSY.

There is now also extensive test infrastructure to specifically test concurrent access to busy objects. To keep layers separate, fellow_cache_test uses a lightweight signal/wait implementation analogous to the ObjExtend() / ObjWaitExtend() Varnish-Cache interface.

An earlier version of t_busyobj() had run on my dev laptop for 3.5 hours without crashing, while without the fixes it had run into assertion failures within seconds.

Fixes #35 and #36 (I hope)
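The fix boils down to a state guard; a sketch assuming a hypothetical state progression (the real fellow_cache segment states differ): a segment at or before the busy state may still grow, so read-ahead or peek-ahead past it could touch data the writer has not produced yet:

```c
#include <assert.h>

/* Hypothetical segment states in progression order; only the name
 * FCS_BUSY is taken from the commit message, the rest are invented. */
enum fcs_state {
	FCS_INIT = 0,
	FCS_BUSY,	/* writer may still extend this segment */
	FCS_WRITING,
	FCS_COMPLETE
};

/* Only read/peek ahead for segments with a state beyond FCS_BUSY;
 * anything at or before FCS_BUSY must not be touched speculatively. */
static int
seg_may_readahead(enum fcs_state state)
{
	return (state > FCS_BUSY);
}
```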
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
... to make it easier to follow the code in fellow_cache_test. Motivated by #35
-
Nils Goroll authored
-
Nils Goroll authored
... such that the total reserve is no less than 2MB. This is required for stable operation of LRU when the log is full. Ref #28
-
Nils Goroll authored
-
Nils Goroll authored
Should be irrelevant in practice, because we would not flush a single block during startup.
-
Nils Goroll authored
When some blocks were already allocated, we would fail to use all of the log region; that is, the newly added assertion

 if (n > 0) AZ(logreg->free_n);

would fail. This left some blocks of the log region unused, but was insignificant otherwise.
-
Nils Goroll authored
Unfortunately, this was present even in the initial public release 58ec40f9. The issue should have had no production impact, but it made hunting down bugs unnecessarily hard.
-
Nils Goroll authored
When we work on the last segment, the remaining length is zero, but we still have a current pointer and length. This was a particularly annoying glitch, because I wrote almost the same code for varnish-cache with the equivalent assertion in the right place :( Sorry.

Ref https://github.com/varnishcache/varnish-cache/pull/4013/commits/8ec77190d91603c8f0dead0cee013e3c9ca8fa78#diff-f79cfeda8456789ae873270aefa58e8f1e94213ee16d32ea96b8db8a7013ebf8R790

Closes #34
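A minimal illustration of the invariant, with invented names: while working on the last segment, the count of bytes remaining past the current segment is zero even though the current pointer and length are still valid, so an assertion on the remaining length must not be placed where it would fire on that last segment:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over an object's segments: "remaining" counts
 * bytes past the current segment, so it is 0 while we still work on
 * the last segment with a valid current pointer and length. */
struct seg_iter {
	const char	*cur_ptr;
	size_t		cur_len;
	size_t		remaining;	/* bytes past the current segment */
};

static void
seg_iter_check(const struct seg_iter *si)
{
	/* The valid invariant: a non-empty current segment has a
	 * pointer. Asserting si->remaining > 0 here would be the bug
	 * described above: it fires on the last segment. */
	if (si->cur_len > 0)
		assert(si->cur_ptr != NULL);
}
```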
-