- 07 Feb, 2024 40 commits
-
-
Nils Goroll authored
It was too complicated and limited by waiting for flushes to finish. Now that we can issue multiple flushes, we can simplify it substantially. As a result of intermediate efforts, there is now also a facility to base nuking on the amount of data currently in the process of being freed. Leaving it in, #ifdef'ed out, in case we'll need it again.
-
Nils Goroll authored
With more than one flush finish, writing a header from an old flush could race the logbuffer_ref() from a more recent one, leading to an inconsistent log in which a logblock with next_off == 0 became reachable.
-
Nils Goroll authored
To avoid having to wait for a previous flush to finish (in most cases), we now allocate the flush finish state dynamically (and asynchronously). For ordinary flushes, we can now start the next flush while a previous one is still in flight, ordering the flush finishes in a list to preserve log consistency.
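As a rough model of the ordering idea (all names and structures here are hypothetical, not the actual fellow code): flush finish states are appended in issue order and applied only from the head of the list, so a flush that completes early never publishes ahead of an earlier, still-pending one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flush-finish state, allocated dynamically per flush. */
struct flush_fini {
	struct flush_fini	*next;
	int			done;	/* set when the async I/O completes */
	int			seq;	/* issue order, for illustration */
};

struct flush_list {
	struct flush_fini	*head, **tailp;
};

static void
flush_list_init(struct flush_list *fl)
{
	fl->head = NULL;
	fl->tailp = &fl->head;
}

/* Start a new flush without waiting for previous ones: just append. */
static struct flush_fini *
flush_start(struct flush_list *fl, int seq)
{
	struct flush_fini *ff = calloc(1, sizeof *ff);

	assert(ff != NULL);
	ff->seq = seq;
	*fl->tailp = ff;
	fl->tailp = &ff->next;
	return (ff);
}

/* Apply completed finishes strictly from the head; stop at the first
 * unfinished one, so the log stays consistent even when a later flush
 * completes before an earlier one. Returns the number applied. */
static int
flush_finish_ordered(struct flush_list *fl)
{
	int applied = 0;

	while (fl->head != NULL && fl->head->done) {
		struct flush_fini *ff = fl->head;

		fl->head = ff->next;
		if (fl->head == NULL)
			fl->tailp = &fl->head;
		/* ... write header / publish log blocks here ... */
		applied++;
		free(ff);
	}
	return (applied);
}
```

For example, if flush 2 completes before flush 1, flush_finish_ordered() applies nothing until flush 1 is also done, then applies both in order.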
-
Nils Goroll authored
-
Nils Goroll authored
as there is only one thread waiting
-
Nils Goroll authored
the logwatcher has long been the only thread waiting on it
-
Nils Goroll authored
-
Nils Goroll authored
buddy_reqs are not relocatable, so we need to finish them when moving logbuffers.
-
Nils Goroll authored
regionlists are updated during DLE submit under the logmtx. Thus, we should avoid synchronous memory allocations. We change the strategy as follows:

* Memory for the top regionlist (which has one regl embedded) _is_ allocated synchronously, but with maximum cram to reduce latencies at the expense of memory efficiency. The case where the allocation does block will not hit us for the most critical path in fellow_log_dle_submit(), because we pre-allocate there outside the logmtx.

* When we create the top regionlist, we make two asynchronous memory allocation requests for our hard-coded size (16KB for prod), one crammed and one not. The crammed request is made such that we get _any_ memory rather than waiting.

* When we need to extend the regionlist, we should already have an allocation available (if not, we need to wait, bad luck). The next allocation available is either [1] (uncrammed), left over after the previous extension, or [0], which is potentially crammed. If it is, and we have an uncrammed [1], then we use that and return the crammed allocation. If there are no allocations left, we issue the next asynchronous request.
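The slot-selection rule in the last bullet can be modeled roughly as follows; this is a sketch with invented names (the real allocator state is more involved), keeping only the decision logic: slot [0] may be crammed, slot [1] never is, and when both run out the next asynchronous request is issued:

```c
#include <assert.h>

/* Hypothetical model of the two outstanding allocation requests:
 * slot [0] is potentially crammed, slot [1] is never crammed. */
struct regl_allocs {
	int have[2];		/* request completed, memory available? */
	int crammed[2];
	int async_issued;	/* next async request fired? */
};

/* Pick the allocation to extend the regionlist with. Prefer the
 * uncrammed [1] over a crammed [0], returning the crammed allocation
 * to the allocator (modeled by clearing have[0]). Returns the slot
 * used, or -1 if no allocation is available and we must wait. */
static int
regl_pick(struct regl_allocs *ra, int *returned_crammed)
{
	int use;

	*returned_crammed = 0;
	if (!ra->have[0] && !ra->have[1])
		return (-1);		/* bad luck: wait */
	if (ra->have[0] && ra->crammed[0] && ra->have[1]) {
		use = 1;		/* use uncrammed [1] ... */
		*returned_crammed = 1;	/* ... return crammed [0] */
		ra->have[0] = 0;
	} else if (ra->have[1] && !ra->have[0])
		use = 1;
	else
		use = 0;
	ra->have[use] = 0;
	if (!ra->have[0] && !ra->have[1])
		ra->async_issued = 1;	/* issue next async request */
	return (use);
}
```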
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
When adding log blocks, also trigger a flush based on available disk blocks; that is, do not add blocks to the logbuffer which we cannot also flush. Also flush with reference: I think this capability was originally limited in order to do full flushes with reference only from the logwatcher thread, so as not to hold the logmtx for too long. But now that we have the extra flush finish thread, I do not think this is necessary any more, and we need to handle tight storage better.
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
... such that LRU, which is operating on the temporary log, can make room. Ref #28
-
Nils Goroll authored
Ref #28
-
Nils Goroll authored
Hopefully, this also contributes to a solution for #28
-
Nils Goroll authored
Otherwise it looks like a rewrite would leak log blocks.
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
It is more important than objects. Should also contribute to a fix for #28
-
Nils Goroll authored
This, hopefully, is part of a possible solution to the nasty issue #28: When we do not have a sufficiently large pre-allocated log (log region), as determined by objsize_hint in relation to the storage size, we need to dynamically allocate disk blocks while we flush the log. When the log flush includes object deletions (in particular when triggered from the disk LRU), we run into a typical deadlock: to complete the transaction to free space, we need the space...

This commit is part of an attempt to make this work by allocating space early on: When we only have 20% of the log region left, we start to reserve more blocks for the log.

The problem can, for example, be reproduced with an objsize_hint of 1MB and an actual object size in the order of 32KB.

Ref #28
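The 20% trigger amounts to a simple threshold check; a minimal sketch with hypothetical names (the real code tracks the log region differently):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: start reserving extra log blocks once less than
 * 20% of the pre-allocated log region remains, so a later log flush
 * which frees space never needs to allocate space first (avoiding the
 * "to free space, we need space" deadlock). */
static int
logreg_needs_reserve(uint64_t region_total, uint64_t region_free)
{
	assert(region_free <= region_total);
	/* region_free / region_total < 1/5, without division */
	return (region_free * 5 < region_total);
}
```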
-
Nils Goroll authored
Manually tested with this modification:

diff --git a/src/fellow_log.c b/src/fellow_log.c
index 6075d81..45da269 100644
--- a/src/fellow_log.c
+++ b/src/fellow_log.c
@@ -1696,6 +1696,9 @@ fellow_io_regions_discard(struct fellow_fd *ffd, void *ioctx,
 	r = fallocate(ffd->fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
 	    (off_t)todo->offset, (off_t)todo->len);
+	// XXX TEST
+	r = 1;
+	errno = EOPNOTSUPP;
 	if (r == 0) {
 		if ((ffd->cap & FFD_CAN_FALLOCATE_PUNCH_URING) == 0) {
 			ffd->diag("fellow: fallocate punch"

Fixes #38
-
Nils Goroll authored
to make clear that we understand exactly what is happening.
-
Nils Goroll authored
For streaming busy objects, we basically rely on the varnish-cache ObjExtend() / ObjWaitExtend() API to never read past the object: In fellow_stream_f(), we always wait for more data (or the end of the object) before returning, such that fellow_cache_obj_iter(), which iterates over segments, should never touch a segment past the final FCS_BUSY segment. Yet it did, by means of the read-ahead and the peek-ahead used to determine whether or not OBJ_ITER_END should be signaled. We fix this issue by reading/peeking ahead only for segments with a state beyond FCS_BUSY.

There is now also extensive test infrastructure to specifically test concurrent access to busy objects. To keep layers separate, fellow_cache_test uses a lightweight signal/wait implementation analogous to the ObjExtend() / ObjWaitExtend() Varnish-Cache interface.

An earlier version of t_busyobj() had run on my dev laptop for 3.5 hours without crashing, while without the fixes it had run into assertion failures within seconds.

Fixes #35 and #36 (I hope)
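The fix boils down to a state guard; a sketch assuming a hypothetical state progression (the real fellow_cache segment states differ): a segment at or before the busy state may still grow, so read-ahead or peek-ahead past it could touch data the writer has not produced yet:

```c
#include <assert.h>

/* Hypothetical segment states in progression order; only the name
 * FCS_BUSY is taken from the commit message, the rest are invented. */
enum fcs_state {
	FCS_INIT = 0,
	FCS_BUSY,	/* writer may still extend this segment */
	FCS_WRITING,
	FCS_COMPLETE
};

/* Only read/peek ahead for segments with a state beyond FCS_BUSY;
 * anything at or before FCS_BUSY must not be touched speculatively. */
static int
seg_may_readahead(enum fcs_state state)
{
	return (state > FCS_BUSY);
}
```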
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
... to make it easier to follow the code in fellow_cache_test. Motivated by #35
-
Nils Goroll authored
-
Nils Goroll authored
... such that the total reserve is no less than 2MB. This is required for stable operation of LRU when the log is full. Ref #28
-
Nils Goroll authored
-
Nils Goroll authored
Should be irrelevant in practice, because we would not flush a single block during startup.
-
Nils Goroll authored
When some blocks were already allocated, we would fail to use all of the log region; that is, the newly added assertion

 if (n > 0) AZ(logreg->free_n);

would fail. This left some blocks of the log region unused, but was insignificant otherwise.
-
Nils Goroll authored
Unfortunately, this was present even in the initial public release 58ec40f9. The issue should have had no production impact, but it made hunting down bugs unnecessarily hard.
-
Nils Goroll authored
When we work on the last segment, the remaining length is zero, but we still have a current pointer and length. This was a particularly annoying glitch, because I wrote almost the same code for varnish-cache with the equivalent assertion in the right place :( Sorry.

Ref https://github.com/varnishcache/varnish-cache/pull/4013/commits/8ec77190d91603c8f0dead0cee013e3c9ca8fa78#diff-f79cfeda8456789ae873270aefa58e8f1e94213ee16d32ea96b8db8a7013ebf8R790

Closes #34
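A minimal illustration of the invariant, with invented names: while working on the last segment, the count of bytes remaining past the current segment is zero even though the current pointer and length are still valid, so an assertion on the remaining length must not be placed where it would fire on that last segment:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over an object's segments: "remaining" counts
 * bytes past the current segment, so it is 0 while we still work on
 * the last segment with a valid current pointer and length. */
struct seg_iter {
	const char	*cur_ptr;
	size_t		cur_len;
	size_t		remaining;	/* bytes past the current segment */
};

static void
seg_iter_check(const struct seg_iter *si)
{
	/* The valid invariant: a non-empty current segment has a
	 * pointer. Asserting si->remaining > 0 here would be the bug
	 * described above: it fires on the last segment. */
	if (si->cur_len > 0)
		assert(si->cur_ptr != NULL);
}
```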
-