- 30 Sep, 2023 15 commits
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  Conflicts: src/fellow_cache.c
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  Motivated by #25, which looks like a self-induced deadlock in fellow_cache_lru_work()
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  I suspected we could run into an infinite loop here: suppose there is only one fco on the LRU and we fail to mutate it; it would then be re-inserted again and again. Related to #25, but I am not sure yet if it is the root cause.
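  To make the failure mode concrete, here is a minimal, self-contained sketch (generic names, not the actual fellow_cache code) of the hazard and one way out: if entries that fail to mutate are put straight back where the walker will see them again, a single such entry spins the loop forever; parking failures on a side list bounds each pass.

    #include <sys/queue.h>
    #include <stdbool.h>

    struct entry {
        TAILQ_ENTRY(entry) list;
        bool can_evict;
    };
    TAILQ_HEAD(lru_head, entry);

    /* One LRU pass: entries we fail to mutate are parked on a side
     * list and only returned to the LRU after the pass, so a single
     * un-mutable entry cannot make the loop spin forever. */
    static void
    lru_work_pass(struct lru_head *lru)
    {
        struct lru_head failed = TAILQ_HEAD_INITIALIZER(failed);
        struct entry *e;

        while ((e = TAILQ_FIRST(lru)) != NULL) {
            TAILQ_REMOVE(lru, e, list);
            if (e->can_evict)
                continue;               /* evicted, done with it */
            TAILQ_INSERT_TAIL(&failed, e, list);
        }
        /* re-insert failures only once the pass is complete */
        while ((e = TAILQ_FIRST(&failed)) != NULL) {
            TAILQ_REMOVE(&failed, e, list);
            TAILQ_INSERT_TAIL(lru, e, list);
        }
    }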
- Nils Goroll authored
- Nils Goroll authored

- 25 Sep, 2023 2 commits
- Nils Goroll authored
  Avoid the memory LRU racing the disk LRU. Disk LRU uses the varnish-cache LRU facility, which works by setting OC_F_DYING and gaining one reference, resulting in two references. One is dropped again by the thread initiating the LRU nuke, the other by the EXP thread. Between the two events, the refcnt is one again, so stvfe_mutate could race. I believe this fixes #23 and #24; if not, please reopen.
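  A hedged sketch of the invariant being relied on (OC_F_DYING is the varnish-cache flag named in the message; everything else here is invented for illustration): refcnt == 1 alone does not mean the object is safe to mutate.

    #include <stdbool.h>

    #define OC_F_DYING (1U << 0)    /* illustrative value, not varnish-cache's */

    struct oc_sketch {              /* hypothetical stand-in for an objcore */
        unsigned flags;
        unsigned refcnt;            /* protected by a lock, not shown */
    };

    /* refcnt == 1 alone is not enough: between the nuking thread's
     * deref and the EXP thread's deref, a dying object also has
     * refcnt 1. The DYING flag must be checked as well. */
    static bool
    mutate_is_safe(const struct oc_sketch *oc)
    {
        return (oc->refcnt == 1 && (oc->flags & OC_F_DYING) == 0);
    }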
- Nils Goroll authored

- 19 Sep, 2023 19 commits
- Nils Goroll authored
  Tests #22
- Nils Goroll authored
  The test case from the next commit exposed a deadlock: the only object in the test would consume all memory and could not get LRU'd, because fellow_cache_async_write_complete() would hold the fco mtx during log submission.
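  The general shape of the fix can be sketched as follows (all names hypothetical, not the actual fellow code): copy what the log submission needs while holding the mutex, then submit with the mutex released.

    #include <pthread.h>

    struct fco_sketch {             /* hypothetical, for illustration */
        pthread_mutex_t mtx;
        int state;
    };

    static void
    log_submit(int state)           /* stand-in for a call that may block */
    {
        (void)state;
    }

    /* Snapshot what the log needs under the mutex, then release it
     * before submitting, so the LRU thread can take the mutex and
     * free memory while we block. */
    static void
    write_complete(struct fco_sketch *fco)
    {
        int snapshot;

        pthread_mutex_lock(&fco->mtx);
        snapshot = fco->state;
        pthread_mutex_unlock(&fco->mtx);

        log_submit(snapshot);       /* no fco mtx held here */
    }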
- Nils Goroll authored
  UINT16_MAX would be just over a disk block, and allocating another 3.93k for a single additional segment does not make sense:

    (gdb) p /x sizeof(struct fellow_disk_seg) * 65534 + sizeof(struct fellow_disk_seglist)
    $7 = 0x37ffd0

  Also fix a potential overflow, where a uint16_t was incremented after clamping to UINT16_MAX.
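  The overflow is easy to reproduce in isolation; a small self-contained example (the values are made up, only the clamp/increment ordering matters):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint32_t want = 70000;      /* requested segment count, made up */
        uint16_t n;

        /* buggy order: clamp first, increment after -> 65535 + 1 wraps to 0 */
        n = (uint16_t)(want > UINT16_MAX ? UINT16_MAX : want);
        n++;
        printf("buggy: %u\n", n);   /* prints 0 */

        /* fixed order: do the arithmetic in a wide type, clamp last */
        uint32_t wide = want + 1;
        n = (uint16_t)(wide > UINT16_MAX ? UINT16_MAX : wide);
        printf("fixed: %u\n", n);   /* prints 65535 */
        return (0);
    }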
- Nils Goroll authored
  Tests #22
- Nils Goroll authored
  Fixes another case of #22
- Nils Goroll authored
  Fixes one case of #22
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
  fellow_cache_lru_chg() already calls fellow_cache_lru_chgbatch_apply() when the remove array is full.
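  As a sketch of that pattern (hypothetical names and batch size, not the real structures): the queueing function flushes the batch itself when the array is full, so callers need no separate capacity check.

    #include <stddef.h>

    #define LCB_MAX 32              /* illustrative batch size */

    struct lcb_sketch {             /* hypothetical change batch */
        void *remove[LCB_MAX];
        size_t n_remove;
    };

    static void
    lcb_apply(struct lcb_sketch *lcb)
    {
        /* take the lru mtx once, apply all queued removals (omitted) */
        lcb->n_remove = 0;
    }

    /* Queueing applies the batch itself when the array is full,
     * so callers never need their own "is it full?" check. */
    static void
    lcb_remove(struct lcb_sketch *lcb, void *item)
    {
        if (lcb->n_remove == LCB_MAX)
            lcb_apply(lcb);
        lcb->remove[lcb->n_remove++] = item;
    }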
- Nils Goroll authored
- Nils Goroll authored
  Fix a regression from 44d788bf: While we do want to reduce the critical region holding the lru mtx, we cannot release the fco mtx before we have completed the transaction on it with respect to LRU state. Because we might need to undo the LRU removal of the FCO, we need to keep the mtx held until we know. Otherwise another thread can race us and change the state under our feet. In this case, we raced fellow_cache_obj_delete():

    #9  0x00007f2711972fd6 in __GI___assert_fail (assertion=assertion@entry=0x7f27116f3ec1 "(fcs->fcs_onlru) != 0", file=file@entry=0x7f27116f31f8 "fellow_cache.c", line=line@entry=3145, function=function@entry=0x7f27116f6b50 <__PRETTY_FUNCTION__.13829> "fellow_cache_lru_work") at assert.c:101
    #10 0x00007f27116bd1db in fellow_cache_lru_work (wrk=wrk@entry=0x7edb0a8135d0, lru=lru@entry=0x7edb4421eb10) at fellow_cache.c:3145
    #11 0x00007f27116bd7c7 in fellow_cache_lru_thread (wrk=0x7edb0a8135d0, priv=0x7edb4421eb10) at fellow_cache.c:3322
    #12 0x000056544bcc06cb in wrk_bgthread (arg=0x7edb3a6e0900) at cache/cache_wrk.c:104
    #13 0x00007f2711b39609 in start_thread (arg=<optimized out>) at pthread_create.c:477
    #14 0x00007f2711a5e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

    (gdb) p *fcs
    $1 = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}
    (gdb) p *fcs->fco
    $2 = {magic = 2206029151, logstate = FCOL_DELETED, lru = 0x7edb4421eb10, fco_mem = {ptr = 0x7ef07cf3c000, bits = 13 '\r', magic = 4294193151}, mtx = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 543234, Robust = No, Shared = No, Protocol = None}, cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}, oc = 0x7ed3c82b0b00, fdb = {fdb = 2493649440769}, fdb_entry = {rbe_link = {0x7eebec72c000, 0x0, 0x0}}, fdo_fcs = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}, aa_esidata_seg = {magic = 25208, state = FCS_USABLE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x0, vtqe_prev = 0x0}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e80f0, alloc = {ptr = 0x0, size = 0}, len = 0}, seglist = {magic = 3403082203, lsegs = 122, fdsl = 0x7ef1cd8e8178, fdsl_sz = 0, fcsl_sz = 0, next = 0x0, segs = 0x7ef07cf3c148}}

  Racing thread:

    Thread 3478 (Thread 0x7f2705d84700 (LWP 543234)):
    #0  __lll_lock_wait (futex=futex@entry=0x7edb4421eb20, private=0) at lowlevellock.c:52
    #1  0x00007f2711b3c0a3 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7edb4421eb20) at ../nptl/pthread_mutex_lock.c:80
    #2  0x00007f27116ab718 in fellow_cache_lru_chgbatch_apply (lcb=lcb@entry=0x7f2705d813f0) at fellow_cache.c:1104
    #3  0x00007f27116bf7b0 in fellow_cache_obj_delete (fc=0x7f27112ed000, fco=<optimized out>, fco@entry=0x7ef07cf3c000, hash=hash@entry=0x7ed6fb0c69b0 "b5*\371\064\062j\362\212Ze礤(X0լ\266\216JL&\231\223\302\031\315\365\277\n") at fellow_cache.c:4808
    #4  0x00007f271167eec2 in sfemem_free (wrk=wrk@entry=0x7f2705d825d0, memoc=memoc@entry=0x7ed3c82b0b00) at fellow_storage.c:543
    #5  0x00007f271167f365 in sfemem_objfree (wrk=0x7f2705d825d0, memoc=0x7ed3c82b0b00) at fellow_storage.c:577
    #6  0x000056544bc964aa in ObjFreeObj (wrk=wrk@entry=0x7f2705d825d0, oc=0x7ed3c82b0b00) at cache/cache_obj.c:412
    #7  0x000056544bc8ce8f in HSH_DerefObjCore (wrk=0x7f2705d825d0, ocp=ocp@entry=0x7f2705d82360, rushmax=rushmax@entry=0) at cache/cache_hash.c:1059
    #8  0x000056544bc81530 in exp_expire (now=1691019717.3146894, ep=0x7f2711246280) at cache/cache_expire.c:360
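  A minimal sketch of the corrected locking discipline (generic names, not the actual fellow_cache code):

    #include <pthread.h>
    #include <stdbool.h>

    struct fco_sketch {             /* hypothetical, for illustration */
        pthread_mutex_t mtx;
        bool onlru;
    };

    static bool
    try_evict(struct fco_sketch *fco)
    {
        (void)fco;
        return (false);             /* stand-in for the mutate attempt */
    }

    /* The whole remove / attempt / undo transaction on the LRU state
     * runs under the fco mtx; releasing the mtx between "remove" and
     * "undo" opens the window in which fellow_cache_obj_delete() saw
     * fcs_onlru == 0 and tripped the assertion above. */
    static void
    lru_mutate_one(struct fco_sketch *fco)
    {
        pthread_mutex_lock(&fco->mtx);
        fco->onlru = false;         /* tentatively off the LRU */
        if (!try_evict(fco))
            fco->onlru = true;      /* undo while still holding the mtx */
        pthread_mutex_unlock(&fco->mtx);
    }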
- Nils Goroll authored
- Nils Goroll authored
  We allow up to 1 << MAX_NLRU_EXPONENT (64) LRUs. When objects are created, they get hashed onto LRUs. LRUs never die except during shutdown. Consequently, the number of LRUs can be tuned at run time.
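  Schematically (hypothetical names; only MAX_NLRU_EXPONENT is from the message above), hashing an object onto one of the currently configured LRUs could look like this:

    #include <stdint.h>

    #define MAX_NLRU_EXPONENT 6     /* at most 1 << 6 == 64 LRUs */

    struct lru_sketch { int dummy; };   /* hypothetical per-LRU state */

    static struct lru_sketch lrus[1 << MAX_NLRU_EXPONENT];
    static unsigned n_lru = 8;      /* run-time tunable, <= 64 */

    /* An object keeps the pointer returned here for its lifetime.
     * Because LRUs are never destroyed before shutdown, changing
     * n_lru later only changes where *new* objects land; existing
     * assignments stay valid. */
    static struct lru_sketch *
    lru_for_hash(uint64_t hash)
    {
        return (&lrus[hash % n_lru]);
    }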
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored

- 03 Aug, 2023 3 commits
- Nils Goroll authored
- Nils Goroll authored
- Nils Goroll authored

- 02 Aug, 2023 1 commit
- Nils Goroll authored
  This fixes a use-after-destroy of the logwatcher condition variable reported in #19.
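  The safe teardown ordering for such a watcher can be sketched as follows (hypothetical names, plain pthreads, not the actual fellow logwatcher code):

    #include <pthread.h>

    struct watcher_sketch {         /* hypothetical, for illustration */
        pthread_t thr;
        pthread_mutex_t mtx;
        pthread_cond_t cond;
        int stop;
    };

    /* Shutdown order matters: request the stop, wake the thread, join
     * it, and only then destroy the condition variable it waits on.
     * Destroying the cond while the thread might still reference it
     * is the classic use-after-destroy. */
    static void
    watcher_fini(struct watcher_sketch *w)
    {
        pthread_mutex_lock(&w->mtx);
        w->stop = 1;
        pthread_cond_signal(&w->cond);
        pthread_mutex_unlock(&w->mtx);

        pthread_join(w->thr, NULL); /* thread can no longer touch cond */
        pthread_cond_destroy(&w->cond);
        pthread_mutex_destroy(&w->mtx);
    }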