- 19 Sep, 2023 9 commits
-
-
Nils Goroll authored
-
Nils Goroll authored
Fix a regression from 44d788bf:

While we do want to reduce the critical region holding the lru mtx, we can not release the fco mtx before we have completed the transaction on it with respect to LRU state. Because we might need to un-do the LRU removal of the FCO, we need to keep the mtx held until we know. Otherwise another thread can race us and change the state under our feet.

In this case, we raced fellow_cache_obj_delete():

#9  0x00007f2711972fd6 in __GI___assert_fail (
    assertion=assertion@entry=0x7f27116f3ec1 "(fcs->fcs_onlru) != 0",
    file=file@entry=0x7f27116f31f8 "fellow_cache.c", line=line@entry=3145,
    function=function@entry=0x7f27116f6b50 <__PRETTY_FUNCTION__.13829>
        "fellow_cache_lru_work") at assert.c:101
#10 0x00007f27116bd1db in fellow_cache_lru_work (wrk=wrk@entry=0x7edb0a8135d0,
    lru=lru@entry=0x7edb4421eb10) at fellow_cache.c:3145
#11 0x00007f27116bd7c7 in fellow_cache_lru_thread (wrk=0x7edb0a8135d0,
    priv=0x7edb4421eb10) at fellow_cache.c:3322
#12 0x000056544bcc06cb in wrk_bgthread (arg=0x7edb3a6e0900) at cache/cache_wrk.c:104
#13 0x00007f2711b39609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f2711a5e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) p *fcs
$1 = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0,
  lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0,
  lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48},
  fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008,
  alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}

(gdb) p *fcs->fco
$2 = {magic = 2206029151, logstate = FCOL_DELETED, lru = 0x7edb4421eb10,
  fco_mem = {ptr = 0x7ef07cf3c000, bits = 13 '\r', magic = 4294193151},
  mtx = pthread_mutex_t = {Type = Normal,
    Status = Acquired, possibly with waiters, Owner ID = 543234,
    Robust = No, Shared = No, Protocol = None},
  cond = pthread_cond_t = {Threads known to still execute a wait function = 0,
    Clock ID = CLOCK_REALTIME, Shared = No},
  oc = 0x7ed3c82b0b00, fdb = {fdb = 2493649440769},
  fdb_entry = {rbe_link = {0x7eebec72c000, 0x0, 0x0}},
  fdo_fcs = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0,
    lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0,
    lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48},
    fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008,
    alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0},
  aa_esidata_seg = {magic = 25208, state = FCS_USABLE, fcs_onlru = 0,
    fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0,
    lru_list = {vtqe_next = 0x0, vtqe_prev = 0x0}, fco = 0x7ef07cf3c000,
    disk_seg = 0x7ef1cd8e80f0, alloc = {ptr = 0x0, size = 0}, len = 0},
  seglist = {magic = 3403082203, lsegs = 122, fdsl = 0x7ef1cd8e8178,
    fdsl_sz = 0, fcsl_sz = 0, next = 0x0, segs = 0x7ef07cf3c148}}

racing thread:

Thread 3478 (Thread 0x7f2705d84700 (LWP 543234)):
#0  __lll_lock_wait (futex=futex@entry=0x7edb4421eb20, private=0) at lowlevellock.c:52
#1  0x00007f2711b3c0a3 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7edb4421eb20)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f27116ab718 in fellow_cache_lru_chgbatch_apply (lcb=lcb@entry=0x7f2705d813f0)
    at fellow_cache.c:1104
#3  0x00007f27116bf7b0 in fellow_cache_obj_delete (fc=0x7f27112ed000, fco=<optimized out>,
    fco@entry=0x7ef07cf3c000,
    hash=hash@entry=0x7ed6fb0c69b0 "b5*\371\064\062j\362\212Ze礤(X0լ\266\216JL&\231\223\302\031\315\365\277\n")
    at fellow_cache.c:4808
#4  0x00007f271167eec2 in sfemem_free (wrk=wrk@entry=0x7f2705d825d0,
    memoc=memoc@entry=0x7ed3c82b0b00) at fellow_storage.c:543
#5  0x00007f271167f365 in sfemem_objfree (wrk=0x7f2705d825d0, memoc=0x7ed3c82b0b00)
    at fellow_storage.c:577
#6  0x000056544bc964aa in ObjFreeObj (wrk=wrk@entry=0x7f2705d825d0, oc=0x7ed3c82b0b00)
    at cache/cache_obj.c:412
#7  0x000056544bc8ce8f in HSH_DerefObjCore (wrk=0x7f2705d825d0, ocp=ocp@entry=0x7f2705d82360,
    rushmax=rushmax@entry=0) at cache/cache_hash.c:1059
#8  0x000056544bc81530 in exp_expire (now=1691019717.3146894, ep=0x7f2711246280)
    at cache/cache_expire.c:360
-
Nils Goroll authored
-
Nils Goroll authored
We allow up to 1 << MAX_NLRU_EXPONENT (64) LRUs. When objects are created, they get hashed onto LRUs. LRUs never die except during shutdown. Consequently, the number of LRUs can be tuned at run time.
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
- 03 Aug, 2023 3 commits
-
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
- 02 Aug, 2023 2 commits
-
-
Nils Goroll authored
This fixes a use-after-destroy of the logwatcher condition variable reported in #19
-
Nils Goroll authored
Fixes #20
-
- 31 Jul, 2023 12 commits
-
-
Nils Goroll authored
-
Nils Goroll authored
I have tried hard to make value tracking understand the code, but to no avail. It seems that, for example, assert(n <= 56) followed by a later assert(n > 0) will just lead to flexelint knowing 1, but not 56, as the limit.
-
Nils Goroll authored
-
Nils Goroll authored
it was never accessed, but triggered flexelint
-
Nils Goroll authored
First and foremost, fellow_log_prep_max_regions was defined wrong: except in fellow_cache_test, we call log submission with a maximum of FELLOW_DISK_LOG_BLOCK_ENTRIES = 56 DLEs. The intention of fellow_log_prep_max_regions was to allocate space to track the return of the maximum number of regions possibly contained. The exact maximum would be (FELLOW_DISK_LOG_BLOCK_ENTRIES - 1) * DLE_REG_NREGION + 1 = (55 * 4) + 1 = 221, which is higher than FELLOW_DISK_LOG_BLOCK_ENTRIES * DLE_BAN_REG_NREGION = 56 * 3 = 168. Yet it seems prudent not to rely on any fixed maximum, and our test cases also call for a higher value, so we now define the maximum as three times the actually used value, and also ensure that we batch the code to this size.

In addition, one assertion in fellow_log_entries_prep() was wrong (it compared a number of DLEs with a number of regions).

We also tighten some assertions to help future analysis of possible issues in this area:

- Ensure that the data path via fellow_log_entries_prep() only ever uses a region list on the stack.
- By using the regionlist_onlystk_add() macro, ensure that we hit an assertion on the array on the stack, rather than one on the regionlist pointer.

Diff best viewed with -b

Fixes #18
-
Nils Goroll authored
Related to #18
-
Nils Goroll authored
We should do this right and not over-allocate; the over-allocation is just confusing.
-
Nils Goroll authored
Motivated by #18, but does not fix the root cause yet. For the call path in the bug ticket, the stack regionlist is supposed to be big enough, and the root cause is that it is not. But at any rate, for that call path, the regionlist is OK to be NULL and regionlist_add() should never be called. If, however, it _is_ called, the regionlist can't be NULL.
-
Nils Goroll authored
-
Nils Goroll authored
Avoids:

fellow_io_uring.c:234:1: error: ‘try_flag’ defined but not used [-Werror=unused-function]
  234 | try_flag(unsigned flag)
      | ^~~~~~~~
-
Nils Goroll authored
the lru_mtx is our most contended mtx. As a first improvement, batch changes to LRU for multiple segments and maintain the effective change locally outside the lru mtx (but while holding the obj mtx).
-
Nils Goroll authored
-
- 24 Jul, 2023 14 commits
-
-
Nils Goroll authored
is there a better way? https://github.com/axboe/liburing/issues/906
-
Nils Goroll authored
during error paths, we might call it multiple times
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
varnish-cache does not touch objects for OA_VARY, but we need to keep FCOs in memory which are frequently used during lookup.

Thoughts on why this should not race LRU:

- lru_list is owned by lru_mtx
- the object can't go away, because
  - for a call from hash, we hold the oh->mtx
  - otherwise, we hold a ref
-
Nils Goroll authored
... which happens potentially under the cache lock
-
Nils Goroll authored
Upfront: this is not the segment allocation, which uses parts of the busy obj region allocation, and is mostly motivated by how much data we need to have in RAM at minimum.

For the region allocation, we have conflicting goals:

- To keep the log short, we want to use the least number of regions
- To reduce fragmentation, we want to use the largest possible allocations
- To use space efficiently, we want to split regions into power of two allocations.

Also, for chunked encoding, we do not have an upper limit of how much space we are going to need, so we have to use the estimate provided by fellow_busy_obj_getspace(). It can not guess more than objsize_max.

The new region alloc algorithm takes this compromise:

- For the base case that we ran out of available regions (220), we allocate all we need without cramming.
- Otherwise, if we need less than a chunk, we request it
- Otherwise, if we know the size, we round down to a power of two
- Otherwise, we round up

We then allow any cramming down to the chunk size, because that is what our LRU reservation uses.
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
Ref #10
-
Nils Goroll authored
-
Nils Goroll authored
-
Nils Goroll authored
-