- 30 Sep, 2023 15 commits
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  Conflicts: src/fellow_cache.c
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  Motivated by #25, which looks like a self-induced deadlock in fellow_cache_lru_work()
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  I suspected we could run into an infinite loop here: suppose there is only one fco on the LRU and we fail to mutate it; it would then be re-inserted again and again. Related to #25, but I am not sure yet if it is the root cause.
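  To make the failure mode concrete, here is a minimal, self-contained sketch (generic names, not the actual fellow_cache code) of the hazard and one way out: if entries that fail to mutate are put straight back where the walker will see them again, a single such entry spins the loop forever; parking failures on a side list bounds each pass.

    #include <sys/queue.h>
    #include <stdbool.h>

    struct entry {
        TAILQ_ENTRY(entry) list;
        bool can_evict;
    };
    TAILQ_HEAD(lru_head, entry);

    /* One LRU pass: entries we fail to mutate are parked on a side
     * list and only returned to the LRU after the pass, so a single
     * un-mutable entry cannot make the loop spin forever. */
    static void
    lru_work_pass(struct lru_head *lru)
    {
        struct lru_head failed = TAILQ_HEAD_INITIALIZER(failed);
        struct entry *e;

        while ((e = TAILQ_FIRST(lru)) != NULL) {
            TAILQ_REMOVE(lru, e, list);
            if (e->can_evict)
                continue;               /* evicted, done with it */
            TAILQ_INSERT_TAIL(&failed, e, list);
        }
        /* re-insert failures only once the pass is complete */
        while ((e = TAILQ_FIRST(&failed)) != NULL) {
            TAILQ_REMOVE(&failed, e, list);
            TAILQ_INSERT_TAIL(lru, e, list);
        }
    }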
- Nils Goroll authored
- Nils Goroll authored

- 25 Sep, 2023 2 commits
- Nils Goroll authored
  Avoid the memory LRU racing the disk LRU. Disk LRU uses the varnish-cache LRU facility, which works by setting OC_F_DYING and gaining one reference, resulting in two references. One is dropped again by the thread initiating the LRU nuke, the other by the EXP thread. Between the two events, the refcnt is one again, so stvfe_mutate could race. I believe this fixes #23 and #24; if not, please reopen.
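  A hedged sketch of the invariant being relied on (OC_F_DYING is the varnish-cache flag named in the message; everything else here is invented for illustration): refcnt == 1 alone does not mean the object is safe to mutate.

    #include <stdbool.h>

    #define OC_F_DYING (1U << 0)    /* illustrative value, not varnish-cache's */

    struct oc_sketch {              /* hypothetical stand-in for an objcore */
        unsigned flags;
        unsigned refcnt;            /* protected by a lock, not shown */
    };

    /* refcnt == 1 alone is not enough: between the nuking thread's
     * deref and the EXP thread's deref, a dying object also has
     * refcnt 1. The DYING flag must be checked as well. */
    static bool
    mutate_is_safe(const struct oc_sketch *oc)
    {
        return (oc->refcnt == 1 && (oc->flags & OC_F_DYING) == 0);
    }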
- Nils Goroll authored

- 19 Sep, 2023 19 commits
- Nils Goroll authored
  Tests #22
- Nils Goroll authored
  The test case from the next commit exposed a deadlock: the only object in the test would consume all memory and could not get LRU'd, because fellow_cache_async_write_complete() would hold the fco mtx during log submission.
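  The general shape of the fix can be sketched as follows (all names hypothetical, not the actual fellow code): copy what the log submission needs while holding the mutex, then submit with the mutex released.

    #include <pthread.h>

    struct fco_sketch {             /* hypothetical, for illustration */
        pthread_mutex_t mtx;
        int state;
    };

    static void
    log_submit(int state)           /* stand-in for a call that may block */
    {
        (void)state;
    }

    /* Snapshot what the log needs under the mutex, then release it
     * before submitting, so the LRU thread can take the mutex and
     * free memory while we block. */
    static void
    write_complete(struct fco_sketch *fco)
    {
        int snapshot;

        pthread_mutex_lock(&fco->mtx);
        snapshot = fco->state;
        pthread_mutex_unlock(&fco->mtx);

        log_submit(snapshot);       /* no fco mtx held here */
    }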
- Nils Goroll authored
  UINT16_MAX would be just over a disk block, and allocating another 3.93k for a single additional segment does not make sense:

    (gdb) p /x sizeof(struct fellow_disk_seg) * 65534 + sizeof(struct fellow_disk_seglist)
    $7 = 0x37ffd0

  Also fix a potential overflow, where a uint16_t was incremented after clamping to UINT16_MAX.
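  The overflow is easy to reproduce in isolation; a small self-contained example (the values are made up, only the clamp/increment ordering matters):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint32_t want = 70000;      /* requested segment count, made up */
        uint16_t n;

        /* buggy order: clamp first, increment after -> 65535 + 1 wraps to 0 */
        n = (uint16_t)(want > UINT16_MAX ? UINT16_MAX : want);
        n++;
        printf("buggy: %u\n", n);   /* prints 0 */

        /* fixed order: do the arithmetic in a wide type, clamp last */
        uint32_t wide = want + 1;
        n = (uint16_t)(wide > UINT16_MAX ? UINT16_MAX : wide);
        printf("fixed: %u\n", n);   /* prints 65535 */
        return (0);
    }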
- Nils Goroll authored
  Tests #22
- Nils Goroll authored
  Fixes another case of #22
- Nils Goroll authored
  Fixes one case of #22
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  fellow_cache_lru_chg() already calls fellow_cache_lru_chgbatch_apply() when the remove array is full.
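  As a sketch of that pattern (hypothetical names and batch size, not the real structures): the queueing function flushes the batch itself when the array is full, so callers need no separate capacity check.

    #include <stddef.h>

    #define LCB_MAX 32              /* illustrative batch size */

    struct lcb_sketch {             /* hypothetical change batch */
        void *remove[LCB_MAX];
        size_t n_remove;
    };

    static void
    lcb_apply(struct lcb_sketch *lcb)
    {
        /* take the lru mtx once, apply all queued removals (omitted) */
        lcb->n_remove = 0;
    }

    /* Queueing applies the batch itself when the array is full,
     * so callers never need their own "is it full?" check. */
    static void
    lcb_remove(struct lcb_sketch *lcb, void *item)
    {
        if (lcb->n_remove == LCB_MAX)
            lcb_apply(lcb);
        lcb->remove[lcb->n_remove++] = item;
    }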
- Nils Goroll authored
- Nils Goroll authored
  Fix a regression from 44d788bf: While we do want to reduce the critical region holding the lru mtx, we cannot release the fco mtx before we have completed the transaction on it with respect to LRU state. Because we might need to undo the LRU removal of the FCO, we need to keep the mtx held until we know. Otherwise another thread can race us and change the state under our feet. In this case, we raced fellow_cache_obj_delete():

    #9  0x00007f2711972fd6 in __GI___assert_fail (assertion=assertion@entry=0x7f27116f3ec1 "(fcs->fcs_onlru) != 0", file=file@entry=0x7f27116f31f8 "fellow_cache.c", line=line@entry=3145, function=function@entry=0x7f27116f6b50 <__PRETTY_FUNCTION__.13829> "fellow_cache_lru_work") at assert.c:101
    #10 0x00007f27116bd1db in fellow_cache_lru_work (wrk=wrk@entry=0x7edb0a8135d0, lru=lru@entry=0x7edb4421eb10) at fellow_cache.c:3145
    #11 0x00007f27116bd7c7 in fellow_cache_lru_thread (wrk=0x7edb0a8135d0, priv=0x7edb4421eb10) at fellow_cache.c:3322
    #12 0x000056544bcc06cb in wrk_bgthread (arg=0x7edb3a6e0900) at cache/cache_wrk.c:104
    #13 0x00007f2711b39609 in start_thread (arg=<optimized out>) at pthread_create.c:477
    #14 0x00007f2711a5e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

    (gdb) p *fcs
    $1 = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}
    (gdb) p *fcs->fco
    $2 = {magic = 2206029151, logstate = FCOL_DELETED, lru = 0x7edb4421eb10, fco_mem = {ptr = 0x7ef07cf3c000, bits = 13 '\r', magic = 4294193151}, mtx = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 543234, Robust = No, Shared = No, Protocol = None}, cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}, oc = 0x7ed3c82b0b00, fdb = {fdb = 2493649440769}, fdb_entry = {rbe_link = {0x7eebec72c000, 0x0, 0x0}}, fdo_fcs = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}, aa_esidata_seg = {magic = 25208, state = FCS_USABLE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x0, vtqe_prev = 0x0}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e80f0, alloc = {ptr = 0x0, size = 0}, len = 0}, seglist = {magic = 3403082203, lsegs = 122, fdsl = 0x7ef1cd8e8178, fdsl_sz = 0, fcsl_sz = 0, next = 0x0, segs = 0x7ef07cf3c148}}

  Racing thread:

    Thread 3478 (Thread 0x7f2705d84700 (LWP 543234)):
    #0  __lll_lock_wait (futex=futex@entry=0x7edb4421eb20, private=0) at lowlevellock.c:52
    #1  0x00007f2711b3c0a3 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7edb4421eb20) at ../nptl/pthread_mutex_lock.c:80
    #2  0x00007f27116ab718 in fellow_cache_lru_chgbatch_apply (lcb=lcb@entry=0x7f2705d813f0) at fellow_cache.c:1104
    #3  0x00007f27116bf7b0 in fellow_cache_obj_delete (fc=0x7f27112ed000, fco=<optimized out>, fco@entry=0x7ef07cf3c000, hash=hash@entry=0x7ed6fb0c69b0 "b5*\371\064\062j\362\212Ze礤(X0լ\266\216JL&\231\223\302\031\315\365\277\n") at fellow_cache.c:4808
    #4  0x00007f271167eec2 in sfemem_free (wrk=wrk@entry=0x7f2705d825d0, memoc=memoc@entry=0x7ed3c82b0b00) at fellow_storage.c:543
    #5  0x00007f271167f365 in sfemem_objfree (wrk=0x7f2705d825d0, memoc=0x7ed3c82b0b00) at fellow_storage.c:577
    #6  0x000056544bc964aa in ObjFreeObj (wrk=wrk@entry=0x7f2705d825d0, oc=0x7ed3c82b0b00) at cache/cache_obj.c:412
    #7  0x000056544bc8ce8f in HSH_DerefObjCore (wrk=0x7f2705d825d0, ocp=ocp@entry=0x7f2705d82360, rushmax=rushmax@entry=0) at cache/cache_hash.c:1059
    #8  0x000056544bc81530 in exp_expire (now=1691019717.3146894, ep=0x7f2711246280) at cache/cache_expire.c:360
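  A minimal sketch of the corrected locking discipline (generic names, not the actual fellow_cache code):

    #include <pthread.h>
    #include <stdbool.h>

    struct fco_sketch {             /* hypothetical, for illustration */
        pthread_mutex_t mtx;
        bool onlru;
    };

    static bool
    try_evict(struct fco_sketch *fco)
    {
        (void)fco;
        return (false);             /* stand-in for the mutate attempt */
    }

    /* The whole remove / attempt / undo transaction on the LRU state
     * runs under the fco mtx; releasing the mtx between "remove" and
     * "undo" opens the window in which fellow_cache_obj_delete() saw
     * fcs_onlru == 0 and tripped the assertion above. */
    static void
    lru_mutate_one(struct fco_sketch *fco)
    {
        pthread_mutex_lock(&fco->mtx);
        fco->onlru = false;         /* tentatively off the LRU */
        if (!try_evict(fco))
            fco->onlru = true;      /* undo while still holding the mtx */
        pthread_mutex_unlock(&fco->mtx);
    }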
- Nils Goroll authored
- Nils Goroll authored
  We allow up to 1 << MAX_NLRU_EXPONENT (64) LRUs. When objects are created, they get hashed onto LRUs. LRUs never die except during shutdown. Consequently, the number of LRUs can be tuned at run time.
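  Schematically (hypothetical names; only MAX_NLRU_EXPONENT is from the message above), hashing an object onto one of the currently configured LRUs could look like this:

    #include <stdint.h>

    #define MAX_NLRU_EXPONENT 6     /* at most 1 << 6 == 64 LRUs */

    struct lru_sketch { int dummy; };   /* hypothetical per-LRU state */

    static struct lru_sketch lrus[1 << MAX_NLRU_EXPONENT];
    static unsigned n_lru = 8;      /* run-time tunable, <= 64 */

    /* An object keeps the pointer returned here for its lifetime.
     * Because LRUs are never destroyed before shutdown, changing
     * n_lru later only changes where *new* objects land; existing
     * assignments stay valid. */
    static struct lru_sketch *
    lru_for_hash(uint64_t hash)
    {
        return (&lrus[hash % n_lru]);
    }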
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored

- 03 Aug, 2023 3 commits
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored

- 02 Aug, 2023 1 commit
- Nils Goroll authored
  This fixes a use-after-destroy of the logwatcher condition variable reported in #19.
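  The safe teardown ordering for such a watcher can be sketched as follows (hypothetical names, plain pthreads, not the actual fellow logwatcher code):

    #include <pthread.h>

    struct watcher_sketch {         /* hypothetical, for illustration */
        pthread_t thr;
        pthread_mutex_t mtx;
        pthread_cond_t cond;
        int stop;
    };

    /* Shutdown order matters: request the stop, wake the thread, join
     * it, and only then destroy the condition variable it waits on.
     * Destroying the cond while the thread might still reference it
     * is the classic use-after-destroy. */
    static void
    watcher_fini(struct watcher_sketch *w)
    {
        pthread_mutex_lock(&w->mtx);
        w->stop = 1;
        pthread_cond_signal(&w->cond);
        pthread_mutex_unlock(&w->mtx);

        pthread_join(w->thr, NULL); /* thread can no longer touch cond */
        pthread_cond_destroy(&w->cond);
        pthread_mutex_destroy(&w->mtx);
    }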