1. 19 Sep, 2023 1 commit
  2. 13 Sep, 2023 2 commits
  3. 10 Sep, 2023 2 commits
  4. 28 Aug, 2023 1 commit
  5. 03 Aug, 2023 3 commits
    • gc obsolete code · 4ed5e022
      Nils Goroll authored
      fellow_cache_lru_chg() already calls fellow_cache_lru_chgbatch_apply()
      when the remove array is full.
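      To illustrate with a self-contained sketch (the types and the
      batch size here are hypothetical stand-ins, not the actual
      fellow structures): the change function flushes a full batch
      itself, so a caller-side apply is redundant.

        #include <stddef.h>

        #define LCB_MAX_REM 16

        struct lru_chgbatch {
            void   *remove[LCB_MAX_REM];
            size_t  n_remove;
        };

        static void
        chgbatch_apply(struct lru_chgbatch *lcb)
        {
            /* under the lru mtx: apply queued removals, then reset */
            lcb->n_remove = 0;
        }

        static void
        lru_chg_remove(struct lru_chgbatch *lcb, void *entry)
        {
            /* flush when the remove array is full, so callers need
             * not call the apply function themselves */
            if (lcb->n_remove == LCB_MAX_REM)
                chgbatch_apply(lcb);
            lcb->remove[lcb->n_remove++] = entry;
        }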
    • Keep a counter of the number of LRU entries · 1e7757be
      Nils Goroll authored
    • Fix: Optimize FCO LRU eviction · 03ff6049
      Nils Goroll authored
      Fix a regression from 44d788bf:
      
      While we do want to reduce the critical region holding the lru mtx,
      we cannot release the fco mtx before we have completed the
      transaction on it with respect to LRU state.

      Because we might need to undo the LRU removal of the FCO, we
      need to keep the mtx held until we know whether the removal
      stands.
      
      Otherwise another thread can race us and change the state under
      our feet.
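
      A minimal sketch of the required ordering (pthread-based and
      illustrative; only the roles of the two mutexes are taken from
      this message):

        #include <pthread.h>

        static pthread_mutex_t fco_mtx = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t lru_mtx = PTHREAD_MUTEX_INITIALIZER;
        static int fcs_onlru = 1;

        static void
        lru_evict_fco(void)
        {
            pthread_mutex_lock(&fco_mtx);
            pthread_mutex_lock(&lru_mtx);
            fcs_onlru = 0;              /* tentative LRU removal */
            pthread_mutex_unlock(&lru_mtx);
            /* fco_mtx stays held: if the eviction must be abandoned,
             * the removal is undone before any other thread can see
             * the intermediate state */
            pthread_mutex_unlock(&fco_mtx);
        }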
      
      In this case, we raced fellow_cache_obj_delete():
      
       #9  0x00007f2711972fd6 in __GI___assert_fail (
           assertion=assertion@entry=0x7f27116f3ec1 "(fcs->fcs_onlru) != 0",
           file=file@entry=0x7f27116f31f8 "fellow_cache.c", line=line@entry=3145,
           function=function@entry=0x7f27116f6b50 <__PRETTY_FUNCTION__.13829> "fellow_cache_lru_work") at assert.c:101
       #10 0x00007f27116bd1db in fellow_cache_lru_work (wrk=wrk@entry=0x7edb0a8135d0, lru=lru@entry=0x7edb4421eb10)
           at fellow_cache.c:3145
       #11 0x00007f27116bd7c7 in fellow_cache_lru_thread (wrk=0x7edb0a8135d0, priv=0x7edb4421eb10)
           at fellow_cache.c:3322
       #12 0x000056544bcc06cb in wrk_bgthread (arg=0x7edb3a6e0900) at cache/cache_wrk.c:104
       #13 0x00007f2711b39609 in start_thread (arg=<optimized out>) at pthread_create.c:477
       #14 0x00007f2711a5e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      (gdb) p *fcs
      $1 = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0,
        fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48},
        fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}
      (gdb) p *fcs->fco
      $2 = {magic = 2206029151, logstate = FCOL_DELETED, lru = 0x7edb4421eb10, fco_mem = {ptr = 0x7ef07cf3c000,
          bits = 13 '\r', magic = 4294193151}, mtx = pthread_mutex_t = {Type = Normal,
          Status = Acquired, possibly with waiters, Owner ID = 543234, Robust = No, Shared = No, Protocol = None},
        cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME,
          Shared = No}, oc = 0x7ed3c82b0b00, fdb = {fdb = 2493649440769}, fdb_entry = {rbe_link = {0x7eebec72c000,
            0x0, 0x0}}, fdo_fcs = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0,
          lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0,
            vtqe_prev = 0x7edb4421eb48}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {
            ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}, aa_esidata_seg = {magic = 25208, state = FCS_USABLE,
          fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {
            vtqe_next = 0x0, vtqe_prev = 0x0}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e80f0, alloc = {ptr = 0x0,
            size = 0}, len = 0}, seglist = {magic = 3403082203, lsegs = 122, fdsl = 0x7ef1cd8e8178, fdsl_sz = 0,
          fcsl_sz = 0, next = 0x0, segs = 0x7ef07cf3c148}}
      
      racing thread:
      
       Thread 3478 (Thread 0x7f2705d84700 (LWP 543234)):
       #0  __lll_lock_wait (futex=futex@entry=0x7edb4421eb20, private=0) at lowlevellock.c:52
       #1  0x00007f2711b3c0a3 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7edb4421eb20) at ../nptl/pthread_mutex_lock.c:80
       #2  0x00007f27116ab718 in fellow_cache_lru_chgbatch_apply (lcb=lcb@entry=0x7f2705d813f0) at fellow_cache.c:1104
       #3  0x00007f27116bf7b0 in fellow_cache_obj_delete (fc=0x7f27112ed000, fco=<optimized out>, fco@entry=0x7ef07cf3c000, hash=hash@entry=0x7ed6fb0c69b0 "b5*\371\064\062j\362\212Ze礤(X0լ\266\216JL&\231\223\302\031\315\365\277\n") at fellow_cache.c:4808
       #4  0x00007f271167eec2 in sfemem_free (wrk=wrk@entry=0x7f2705d825d0, memoc=memoc@entry=0x7ed3c82b0b00) at fellow_storage.c:543
       #5  0x00007f271167f365 in sfemem_objfree (wrk=0x7f2705d825d0, memoc=0x7ed3c82b0b00) at fellow_storage.c:577
       #6  0x000056544bc964aa in ObjFreeObj (wrk=wrk@entry=0x7f2705d825d0, oc=0x7ed3c82b0b00) at cache/cache_obj.c:412
       #7  0x000056544bc8ce8f in HSH_DerefObjCore (wrk=0x7f2705d825d0, ocp=ocp@entry=0x7f2705d82360, rushmax=rushmax@entry=0) at cache/cache_hash.c:1059
       #8  0x000056544bc81530 in exp_expire (now=1691019717.3146894, ep=0x7f2711246280) at cache/cache_expire.c:360
  6. 02 Aug, 2023 12 commits
  7. 31 Jul, 2023 10 commits
    • Add a hall of fame · 094ee6e3
      Nils Goroll authored
    • Unfortunate Flexelinting · 908d7c6c
      Nils Goroll authored
      I have tried hard to make value tracking understand the code, but to
      no avail. It seems that, for example,

      	assert(n <= 56)

      and later

      	assert(n > 0)

      will just lead to Flexelint knowing the lower bound of 1, but not 56
      as the limit.
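
      A self-contained example of the pattern (illustrative only):

        #include <assert.h>

        static int
        sum_first(const int v[56], unsigned n)
        {
            int s = 0;
            unsigned i;

            assert(n <= 56);        /* upper bound ... */
            assert(n > 0);          /* ... forgotten after this one */
            for (i = 0; i < n; i++) /* may still warn on v[i] */
                s += v[i];
            return (s);
        }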
    • b4c3cc11
    • Flexelinting: Avoid temporary out of bounds pointer · bc498361
      Nils Goroll authored
      It was never accessed, but triggered Flexelint.
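
      For illustration only (the actual code differs), the general
      shape of such a fix is to compare indices before forming the
      pointer, since even an unused pointer past the array bounds is
      undefined behavior and upsets Flexelint:

        static int
        next_elem(const int *v, unsigned len, unsigned i)
        {
            /* before: const int *p = &v[i + 1]; formed first and
             * bounds-checked later */
            if (i + 1 >= len)
                return (-1);
            return (v[i + 1]);
        }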
    • Correct stack regionlist size used during fellow_log_entries_prep() · 20d8a157
      Nils Goroll authored
      First and foremost, fellow_log_prep_max_regions was defined wrong:

      Except in fellow_cache_test, we call log submission with a maximum of
      FELLOW_DISK_LOG_BLOCK_ENTRIES = 56 DLEs. The intention of
      fellow_log_prep_max_regions was to allocate space to track the return
      of the maximum number of regions possibly contained. The exact maximum
      would be (FELLOW_DISK_LOG_BLOCK_ENTRIES - 1) * DLE_REG_NREGION + 1 =
      (55 * 4) + 1 = 221, which is higher than FELLOW_DISK_LOG_BLOCK_ENTRIES
      * DLE_BAN_REG_NREGION = 56 * 3 = 168.
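
      Spelled out in constants (the per-entry region counts 4 and 3
      are inferred from the arithmetic above, not quoted from the
      code):

        #define FELLOW_DISK_LOG_BLOCK_ENTRIES 56
        #define DLE_REG_NREGION                4 /* inferred */
        #define DLE_BAN_REG_NREGION            3 /* inferred */

        /* exact maximum number of regions per log submission */
        #define PREP_MAX_REGIONS \
            ((FELLOW_DISK_LOG_BLOCK_ENTRIES - 1) * DLE_REG_NREGION + 1)
        /* = 55 * 4 + 1 = 221  >  56 * 3 = 168 */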
      
      Yet it seems prudent not to rely on any fixed maximum, and our test
      cases also call for a higher value, so we now define the maximum as
      three times the actually used value, and also ensure that the code
      batches to this size.
      
      In addition, one assertion in fellow_log_entries_prep() was wrong (it
      compared a number of DLEs with a number of regions).
      
      We also tighten some assertions to help future analysis of possible
      issues in this area:
      
      - Ensure that the data path via fellow_log_entries_prep() only ever
        uses a region list on the stack.
      
      - By using the regionlist_onlystk_add() macro, ensure that we hit an
        assertion on the array on stack, rather than one on the regionlist
        pointer.
      
      Diff best viewed with -b
      
      Fixes #18
    • Rename for clarity · 292ecc59
      Nils Goroll authored
      Related to #18
    • Tighten DLE array sizing · db933215
      Nils Goroll authored
      We should do this right and not over-allocate; the over-allocation is
      just confusing.
    • Add miniobj check · e112b51f
      Nils Goroll authored
      Motivated by #18, but does not fix the root cause yet.

      For the call path in the bug ticket, the stack regionlist is supposed
      to be big enough, and the root cause is that it is not. But at any
      rate, for that call path, the regionlist may legitimately be NULL,
      and regionlist_add() should never be called.

      If, however, it _is_ called, the regionlist can't be NULL.
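
      A minimal sketch of the added check using the usual miniobj
      magic idiom (the magic value and struct layout here are
      hypothetical):

        #include <assert.h>
        #include <stddef.h>

        struct regionlist {
            unsigned    magic;
        #define REGIONLIST_MAGIC 0x5fe110aa /* hypothetical value */
            /* ... regions ... */
        };

        static void
        regionlist_add(struct regionlist *rl)
        {
            /* if we are called at all, rl must be a valid object */
            assert(rl != NULL);
            assert(rl->magic == REGIONLIST_MAGIC);
            /* ... append the region ... */
        }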
    • b7b26499
    • Call try_flags() even when there are no flags to try · bd33e5d5
      Nils Goroll authored
      Avoids:
      
      fellow_io_uring.c:234:1: error: ‘try_flag’ defined but not used [-Werror=unused-function]
        234 | try_flag(unsigned flag)
            | ^~~~~~~~
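
      Sketched in outline (the bodies are hypothetical; only the
      function names come from the title and the error message):
      calling try_flags() unconditionally keeps try_flag() referenced
      even in builds where the flag list is empty.

        static unsigned ring_flags;

        static int
        try_flag(unsigned flag)
        {
            (void)flag;     /* probe whether the flag works; stubbed */
            return (0);
        }

        static void
        try_flags(const unsigned *flags, unsigned n)
        {
            unsigned i;

            for (i = 0; i < n; i++)
                if (try_flag(flags[i]))
                    ring_flags |= flags[i];
        }

        static void
        ring_setup(void)
        {
            /* called even when there are no flags to try, keeping
             * both functions used */
            try_flags(NULL, 0);
        }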
  8. 28 Jul, 2023 2 commits
    • Batch LRU changes · cb92c844
      Nils Goroll authored
      The lru_mtx is our most contended mtx.
      
      As a first improvement, batch changes to LRU for multiple segments
      and maintain the effective change locally outside the lru mtx (but
      while holding the obj mtx).
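
      A sketch of the idea under assumed names (the real batch lives
      in fellow_cache.c): changes are collected while holding the obj
      mtx and hit the contended lru mtx only once per batch.

        #include <pthread.h>

        #define BATCH_MAX 32

        struct chgbatch {
            void       *rem[BATCH_MAX];
            unsigned    n_rem;
        };

        static pthread_mutex_t obj_mtx = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t lru_mtx = PTHREAD_MUTEX_INITIALIZER;

        static void
        segs_remove(void **segs, unsigned n)
        {
            struct chgbatch cb = { .n_rem = 0 };
            unsigned i;

            pthread_mutex_lock(&obj_mtx);
            /* note the effective change locally: no lru mtx yet */
            for (i = 0; i < n && i < BATCH_MAX; i++)
                cb.rem[cb.n_rem++] = segs[i];

            /* one short critical section on the contended mtx */
            pthread_mutex_lock(&lru_mtx);
            /* ... unlink cb.rem[0 .. cb.n_rem - 1] from the LRU ... */
            pthread_mutex_unlock(&lru_mtx);
            pthread_mutex_unlock(&obj_mtx);
        }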
    • Minor refactor · da5d8b9e
      Nils Goroll authored
  9. 24 Jul, 2023 7 commits
    • 66a07c7a
    • make fellow_io_fini() idempotent · ecf6f24c
      Nils Goroll authored
      During error paths, we might call it multiple times.
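
      For example, a guard of this shape (the flag member is
      hypothetical) makes repeated calls harmless:

        #include <stdbool.h>

        struct io_state {
            bool initialized;
            /* ... ring, probe, fds ... */
        };

        static void
        io_fini(struct io_state *st)
        {
            if (!st->initialized)
                return;             /* later calls are no-ops */
            /* ... release resources ... */
            st->initialized = false;
        }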
    • Use io_uring_free_probe() · a378525e
      Nils Goroll authored
    • Changelog TLC · 085ba7b4
      Nils Goroll authored
    • LRU-touch objects for OA_VARY · 94d37731
      Nils Goroll authored
      varnish-cache does not touch objects for OA_VARY accesses, but we
      need to keep in memory those FCOs that are frequently used during
      lookup (a sketch follows the list below).
      
      Thoughts on why this should not race LRU:
      
      - lru_list is owned by lru_mtx
      - object can't go away, because
        - for call from hash, we hold the oh->mtx
        - otherwise, we hold a ref
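
      As a sketch (stand-in list types; the locking rule is the one
      stated above):

        #include <pthread.h>
        #include <sys/queue.h>

        struct fcs {
            TAILQ_ENTRY(fcs) lru_list;
        };
        TAILQ_HEAD(lru_head, fcs);

        static pthread_mutex_t lru_mtx = PTHREAD_MUTEX_INITIALIZER;
        static struct lru_head lru = TAILQ_HEAD_INITIALIZER(lru);

        static void
        lru_touch(struct fcs *fcs)
        {
            /* lru_list is owned by lru_mtx; liveness of fcs is the
             * caller's business (oh->mtx held, or a reference) */
            pthread_mutex_lock(&lru_mtx);
            TAILQ_REMOVE(&lru, fcs, lru_list);
            TAILQ_INSERT_TAIL(&lru, fcs, lru_list);
            pthread_mutex_unlock(&lru_mtx);
        }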
    • Prioritize object memory allocation for OA_VARY · 491339c2
      Nils Goroll authored
      ... which happens potentially under the cache lock
    • New region alloc · a0e8e8f7
      Nils Goroll authored
      Upfront: this is not the segment allocation, which uses parts of the
      busy obj region allocation and is mostly motivated by how much data
      we need to have in RAM at minimum.
      
      For the region allocation, we have conflicting goals:
      
      - To keep the log short, we want to use the fewest regions
      - To reduce fragmentation, we want to use the largest possible
        allocations
      - To use space efficiently, we want to split regions into
        power-of-two allocations.
      
      Also, for chunked encoding, we do not have an upper limit on
      how much space we are going to need, so we have to use the
      estimate provided by fellow_busy_obj_getspace(). It cannot
      guess more than objsize_max.
      
      The new region alloc algorithm strikes this compromise:

      - For the base case that we have run out of available regions (220),
        we allocate all we need without cramming.
      - Otherwise, if we need less than a chunk, we request it
      - Otherwise, if we know the size, we round down to a power of two
      - Otherwise, we round up
      
      We then allow any cramming down to the chunk size, because that
      is what our LRU reservation uses.
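
      A compact sketch of this decision tree (constants and helper
      names are illustrative assumptions, not the fellow API):

        #include <stdbool.h>
        #include <stdint.h>

        #define MAX_REGIONS 220
        #define CHUNK_SIZE  (64 * 1024)     /* assumed chunk size */

        static uint64_t
        pow2_down(uint64_t sz)              /* largest pow2 <= sz */
        {
            uint64_t p = 1;

            while (p <= sz / 2)
                p <<= 1;
            return (p);
        }

        static uint64_t
        pow2_up(uint64_t sz)                /* smallest pow2 >= sz */
        {
            uint64_t p = 1;

            while (p < sz)
                p <<= 1;
            return (p);
        }

        /* request size for the next region; the caller permits
         * cramming down to CHUNK_SIZE except in the base case */
        static uint64_t
        region_request(uint64_t need, bool size_known, unsigned nregions)
        {
            if (nregions >= MAX_REGIONS)
                return (need);              /* base case: no cramming */
            if (need < CHUNK_SIZE)
                return (need);              /* small: request exactly */
            if (size_known)
                return (pow2_down(need));   /* known size: round down */
            return (pow2_up(need));         /* estimate: round up */
        }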