1. 19 Sep, 2023 13 commits
    • Add an allocation limit injection facility · 87944d21
      Nils Goroll authored
    • Run LRU during fellow_cache_test · 41f87ada
      Nils Goroll authored
    • d52dff20
    • gc obsolete code · b86fd0fb
      Nils Goroll authored
      fellow_cache_lru_chg() already calls fellow_cache_lru_chgbatch_apply()
      when the remove array is full.
    • Keep a counter of the number of LRU entries · b36a96be
      Nils Goroll authored
    • Fix: Optimize FCO LRU eviction · b696a8f8
      Nils Goroll authored
      Fix a regression from 44d788bf:
      
      While we do want to reduce the critical region holding the lru mtx,
      we can not release the fco mtx before we have completed the
      transaction on it with respect to LRU state.

      Because we might need to undo the LRU removal of the FCO, we
      need to keep the mtx held until we know.

      Otherwise another thread can race us and change the state under
      our feet.
      
      In this case, we raced fellow_cache_obj_delete():
      
       #9  0x00007f2711972fd6 in __GI___assert_fail (
           assertion=assertion@entry=0x7f27116f3ec1 "(fcs->fcs_onlru) != 0",
           file=file@entry=0x7f27116f31f8 "fellow_cache.c", line=line@entry=3145,
           function=function@entry=0x7f27116f6b50 <__PRETTY_FUNCTION__.13829> "fellow_cache_lru_work") at assert.c:101
       #10 0x00007f27116bd1db in fellow_cache_lru_work (wrk=wrk@entry=0x7edb0a8135d0, lru=lru@entry=0x7edb4421eb10)
           at fellow_cache.c:3145
       #11 0x00007f27116bd7c7 in fellow_cache_lru_thread (wrk=0x7edb0a8135d0, priv=0x7edb4421eb10)
           at fellow_cache.c:3322
       #12 0x000056544bcc06cb in wrk_bgthread (arg=0x7edb3a6e0900) at cache/cache_wrk.c:104
       #13 0x00007f2711b39609 in start_thread (arg=<optimized out>) at pthread_create.c:477
       #14 0x00007f2711a5e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      (gdb) p *fcs
      $1 = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0,
        fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0, vtqe_prev = 0x7edb4421eb48},
        fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}
      (gdb) p *fcs->fco
      $2 = {magic = 2206029151, logstate = FCOL_DELETED, lru = 0x7edb4421eb10, fco_mem = {ptr = 0x7ef07cf3c000,
          bits = 13 '\r', magic = 4294193151}, mtx = pthread_mutex_t = {Type = Normal,
          Status = Acquired, possibly with waiters, Owner ID = 543234, Robust = No, Shared = No, Protocol = None},
        cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME,
          Shared = No}, oc = 0x7ed3c82b0b00, fdb = {fdb = 2493649440769}, fdb_entry = {rbe_link = {0x7eebec72c000,
            0x0, 0x0}}, fdo_fcs = {magic = 25208, state = FCO_INCORE, fcs_onlru = 0, fco_infdb = 0, lcb_add = 0,
          lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {vtqe_next = 0x7eebee7a00a0,
            vtqe_prev = 0x7edb4421eb48}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e8008, alloc = {
            ptr = 0x7ef1cd8e8000, size = 4096}, len = 0}, aa_esidata_seg = {magic = 25208, state = FCS_USABLE,
          fcs_onlru = 0, fco_infdb = 0, lcb_add = 0, lcb_remove = 0, fco_lru_mutate = 0, refcnt = 0, lru_list = {
            vtqe_next = 0x0, vtqe_prev = 0x0}, fco = 0x7ef07cf3c000, disk_seg = 0x7ef1cd8e80f0, alloc = {ptr = 0x0,
            size = 0}, len = 0}, seglist = {magic = 3403082203, lsegs = 122, fdsl = 0x7ef1cd8e8178, fdsl_sz = 0,
          fcsl_sz = 0, next = 0x0, segs = 0x7ef07cf3c148}}
      
      racing thread:
      
       Thread 3478 (Thread 0x7f2705d84700 (LWP 543234)):
       #0  __lll_lock_wait (futex=futex@entry=0x7edb4421eb20, private=0) at lowlevellock.c:52
       #1  0x00007f2711b3c0a3 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7edb4421eb20) at ../nptl/pthread_mutex_lock.c:80
       #2  0x00007f27116ab718 in fellow_cache_lru_chgbatch_apply (lcb=lcb@entry=0x7f2705d813f0) at fellow_cache.c:1104
       #3  0x00007f27116bf7b0 in fellow_cache_obj_delete (fc=0x7f27112ed000, fco=<optimized out>, fco@entry=0x7ef07cf3c000, hash=hash@entry=0x7ed6fb0c69b0 "b5*\371\064\062j\362\212Ze礤(X0լ\266\216JL&\231\223\302\031\315\365\277\n") at fellow_cache.c:4808
       #4  0x00007f271167eec2 in sfemem_free (wrk=wrk@entry=0x7f2705d825d0, memoc=memoc@entry=0x7ed3c82b0b00) at fellow_storage.c:543
       #5  0x00007f271167f365 in sfemem_objfree (wrk=0x7f2705d825d0, memoc=0x7ed3c82b0b00) at fellow_storage.c:577
       #6  0x000056544bc964aa in ObjFreeObj (wrk=wrk@entry=0x7f2705d825d0, oc=0x7ed3c82b0b00) at cache/cache_obj.c:412
       #7  0x000056544bc8ce8f in HSH_DerefObjCore (wrk=0x7f2705d825d0, ocp=ocp@entry=0x7f2705d82360, rushmax=rushmax@entry=0) at cache/cache_hash.c:1059
       #8  0x000056544bc81530 in exp_expire (now=1691019717.3146894, ep=0x7f2711246280) at cache/cache_expire.c:360
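The locking rule behind this fix can be sketched in a few lines. All names below are hypothetical stand-ins, not the actual fellow_cache API: the point is only that the object's LRU-state transaction completes while the object mutex is held, and the contended lru mtx is taken just for the list manipulation itself.

```c
/* Minimal sketch of the locking rule from the fix (hypothetical names):
 * keep the per-object mutex held across the whole LRU-state transaction,
 * so a racing delete cannot observe a half-updated on-LRU flag. */
#include <assert.h>
#include <pthread.h>

struct obj {
	pthread_mutex_t	mtx;	/* per-object mutex ("fco mtx") */
	int		on_lru;	/* analogous to fcs_onlru */
};

static pthread_mutex_t lru_mtx = PTHREAD_MUTEX_INITIALIZER;

static void
evict(struct obj *o)
{
	pthread_mutex_lock(&o->mtx);
	assert(o->on_lru);	/* would fire if another thread raced us */
	o->on_lru = 0;		/* LRU-state transaction on the object ... */
	pthread_mutex_lock(&lru_mtx);
	/* ... the actual list removal would happen here ... */
	pthread_mutex_unlock(&lru_mtx);
	/* only now is it safe to let other threads see the object */
	pthread_mutex_unlock(&o->mtx);
}
```

Releasing the object mutex before the `on_lru` update, as the regression did, is exactly what lets `fellow_cache_obj_delete()` change the state under our feet.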
    • Add lru_exponent parameter · fe785e54
      Nils Goroll authored
    • Multi-LRU infrastructure · 4ed3cf53
      Nils Goroll authored
      We allow up to 1 << MAX_NLRU_EXPONENT (64) LRUs. When objects
      are created, they get hashed onto LRUs. LRUs never die except
      during shutdown.
      
      Consequently, the number of LRUs can be tuned at run time.
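Hashing an object onto one of 1 << exponent LRUs could look roughly like this sketch. The function name is hypothetical; the multiplier is the standard fibonacci-hash constant (2^64 divided by the golden ratio), in the spirit of the fibonacci hash the log mentions elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NLRU_EXPONENT 6	/* 1 << 6 == 64 LRUs, as in the message */

/* Fibonacci hash: multiply by 2^64 / golden ratio and keep the top
 * 'exponent' bits, yielding an index in [0, 1 << exponent). */
static unsigned
lru_idx(uint64_t key, unsigned exponent)
{
	assert(exponent <= MAX_NLRU_EXPONENT);
	if (exponent == 0)
		return (0);
	return ((unsigned)((key * 11400714819323198485ULL) >>
	    (64 - exponent)));
}
```

Because only the index function depends on the exponent, the number of LRUs consulted can be changed at run time without touching the objects already hashed.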
    • Move fibonacci hash · a3b549d8
      Nils Goroll authored
    • 7.3/master compat · 5b11a83a
      Nils Goroll authored
    • 6a1aae7e
    • Optimize FCO LRU eviction · f18a34eb
      Nils Goroll authored
    • d75053a1
  2. 03 Aug, 2023 3 commits
  3. 02 Aug, 2023 2 commits
  4. 31 Jul, 2023 12 commits
    • Add a hall of fame · f3f60c0e
      Nils Goroll authored
    • Unfortunate Flexelinting · c113a423
      Nils Goroll authored
      I have tried hard to make value tracking understand the code, but to
      no avail. It seems, for example,

      	assert(n <= 56)

      and later

      	assert(n > 0)

      will just lead to flexelint knowing 1? but not 56 as the limit.
    • 9572fc70
    • Flexelinting: Avoid temporary out-of-bounds pointer · 0c6399b7
      Nils Goroll authored
      It was never accessed, but triggered flexelint.
    • Correct stack regionlist size used during fellow_log_entries_prep() · f62133cc
      Nils Goroll authored
      First and foremost, fellow_log_prep_max_regions was defined wrong:

      Except in fellow_cache_test, we call log submission with a maximum of
      FELLOW_DISK_LOG_BLOCK_ENTRIES = 56 DLEs. The intention of
      fellow_log_prep_max_regions was to allocate space to track return
      of the maximum number of regions possibly contained. The exact maximum
      would be (FELLOW_DISK_LOG_BLOCK_ENTRIES - 1) * DLE_REG_NREGION + 1 =
      (55 * 4) + 1 = 221, which is higher than FELLOW_DISK_LOG_BLOCK_ENTRIES
      * DLE_BAN_REG_NREGION = 56 * 3 = 168.

      Yet it seems prudent not to rely on any fixed maximum, and our test
      cases also call for a higher value, so we now define the maximum as
      three times the actually used value, and also ensure that we batch
      the code to this size.
      
      In addition, one assertion in fellow_log_entries_prep() was wrong (it
      compared a number of DLEs with a number of regions).
      
      We also tighten some assertions to help future analysis of possible
      issues in this area:
      
      - Ensure that the data path via fellow_log_entries_prep() only ever
        uses a region list on the stack.
      
      - By using the regionlist_onlystk_add() macro, ensure that we hit an
        assertion on the array on stack, rather than one on the regionlist
        pointer.
      
      Diff best viewed with -b
      
      Fixes #18
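The sizing arithmetic above can be double-checked with a tiny sketch. The macro values are taken directly from the commit message; the function name is hypothetical.

```c
#include <assert.h>

/* Constants as stated in the commit message. */
#define FELLOW_DISK_LOG_BLOCK_ENTRIES	56
#define DLE_REG_NREGION			4
#define DLE_BAN_REG_NREGION		3

/* Worst case over the two DLE flavors discussed above. */
static unsigned
prep_max_regions(void)
{
	unsigned a, b;

	/* 55 * 4 + 1 = 221 */
	a = (FELLOW_DISK_LOG_BLOCK_ENTRIES - 1) * DLE_REG_NREGION + 1;
	/* 56 * 3 = 168 */
	b = FELLOW_DISK_LOG_BLOCK_ENTRIES * DLE_BAN_REG_NREGION;
	return (a > b ? a : b);
}
```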
    • Rename for clarity · d5e53eac
      Nils Goroll authored
      Related to #18
    • Tighten DLE array sizing · 3bdbeede
      Nils Goroll authored
      We should do this right and not over-allocate; anything else is just
      confusing.
    • Add miniobj check · 99ebbda3
      Nils Goroll authored
      Motivated by #18, but does not fix the root cause yet.
      
      For the call path in the bug ticket, the stack regionlist is supposed
      to be big enough and the root cause is that it is not. But at any
      rate, for that call path, the regionlist is OK to be NULL and
      regionlist_add() should never be called.
      
      If, however, it _is_ called, the regionlist can't be NULL.
    • 403819a3
    • Call try_flags() even when there are no flags to try · 44929e67
      Nils Goroll authored
      Avoids:
      
      fellow_io_uring.c:234:1: error: ‘try_flag’ defined but not used [-Werror=unused-function]
        234 | try_flag(unsigned flag)
            | ^~~~~~~~
    • Batch LRU changes · ea397ae7
      Nils Goroll authored
      The lru_mtx is our most contended mtx.
      
      As a first improvement, batch changes to LRU for multiple segments
      and maintain the effective change locally outside the lru mtx (but
      while holding the obj mtx).
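The batching idea can be sketched as follows, with deliberately simplified, hypothetical types: changes are collected while holding only the object mutex, and the contended lru_mtx is taken once per batch rather than once per segment.

```c
#include <pthread.h>
#include <stddef.h>

#define LCB_MAX 8	/* hypothetical batch capacity */

/* Batch of pending LRU changes, filled under the object mutex only. */
struct lru_chgbatch {
	void	*add[LCB_MAX];
	void	*rem[LCB_MAX];
	size_t	 n_add, n_rem;
};

static pthread_mutex_t lru_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Apply the whole batch under one acquisition of the contended mutex. */
static void
lcb_apply(struct lru_chgbatch *lcb)
{
	pthread_mutex_lock(&lru_mtx);
	/* ... splice lcb->add onto, and unlink lcb->rem from, the list ... */
	pthread_mutex_unlock(&lru_mtx);
	lcb->n_add = lcb->n_rem = 0;
}

/* Record a removal; flush early when the batch is full. */
static void
lcb_remove(struct lru_chgbatch *lcb, void *seg)
{
	if (lcb->n_rem == LCB_MAX)
		lcb_apply(lcb);
	lcb->rem[lcb->n_rem++] = seg;
}
```

The "gc obsolete code" commit above relies on exactly this property: the batch applies itself when the remove array fills up, so callers need not flush defensively.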
    • Minor refactor · e73e97da
      Nils Goroll authored
  5. 24 Jul, 2023 10 commits
    • 436762fc
    • Make fellow_io_fini() idempotent · 8eea0d01
      Nils Goroll authored
      During error paths, we might call it multiple times.
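A minimal sketch of such an idempotency guard, with hypothetical names rather than the actual fellow_io code: record whether setup succeeded and make teardown a no-op on any call after the first.

```c
#include <assert.h>
#include <stdbool.h>

static bool	io_initialized = false;
static int	fini_calls = 0;	/* stands in for the real teardown work */

static void
my_io_init(void)
{
	io_initialized = true;
}

/* Safe to call any number of times, including from error paths
 * before init ever ran. */
static void
my_io_fini(void)
{
	if (!io_initialized)
		return;		/* already torn down, or never set up */
	io_initialized = false;
	fini_calls++;
}
```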
    • Use io_uring_free_probe() · dd3ae6c0
      Nils Goroll authored
    • Changelog TLC · 550084c5
      Nils Goroll authored
    • LRU-touch objects for OA_VARY · e0912eaf
      Nils Goroll authored
      varnish-cache does not touch objects for OA_VARY, but we need
      to keep FCOs in memory which are frequently used during lookup.
      
      Thoughts on why this should not race LRU:
      
      - lru_list is owned by lru_mtx
      - object can't go away, because
        - for call from hash, we hold the oh->mtx
        - otherwise, we hold a ref
    • Prioritize object memory allocation for OA_VARY · b43164ff
      Nils Goroll authored
      ... which happens potentially under the cache lock.
    • New region alloc · a40c34c0
      Nils Goroll authored
      Upfront: this is not the segment allocation, which uses parts of the
      busy obj region allocation and is mostly motivated by how much data
      we need to have in RAM at minimum.
      
      For the region allocation, we have conflicting goals:
      
      - To keep the log short, we want to use the least number of regions
      - To reduce fragmentation, we want to use the largest possible
        allocations
      - To use space efficiently, we want to split regions into power of
        two allocations.
      
      Also, for chunked encoding, we do not have an upper limit on
      how much space we are going to need, so we have to use the
      estimate provided by fellow_busy_obj_getspace(). It cannot
      guess more than objsize_max.
      
      The new region alloc algorithm strikes this compromise:

      - For the base case that we ran out of available regions (220), we
        allocate all we need without cramming.
      - Otherwise, if we need less than a chunk, we request it.
      - Otherwise, if we know the size, we round down to a power of two.
      - Otherwise, we round up.

      We then allow any cramming down to the chunk size, because that
      is what our LRU reservation uses.
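The decision ladder above can be sketched as follows. Names and the chunk size are hypothetical, and the real allocator works on regions rather than plain sizes; the sketch only mirrors the four cases.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE	(4u * 1024)	/* hypothetical chunk size */
#define MAX_REGIONS	220		/* "ran out of regions" threshold */

/* Largest power of two not exceeding sz (sz >= 1). */
static size_t
pow2_down(size_t sz)
{
	size_t p = 1;

	while (p * 2 <= sz)
		p *= 2;
	return (p);
}

/* Request size decision:
 * - out of regions: take exactly what is needed, no cramming
 * - below a chunk: request the need as-is
 * - size known: round down to a power of two
 * - size unknown (estimate): round up to a power of two */
static size_t
region_request(size_t need, int size_known, unsigned nregions)
{
	if (nregions >= MAX_REGIONS)
		return (need);
	if (need < CHUNK_SIZE)
		return (need);
	if (size_known)
		return (pow2_down(need));
	return (pow2_down(need) == need ? need : pow2_down(need) * 2);
}
```

Cramming the result back down, but never below the chunk size, keeps the request compatible with what the LRU reservation uses.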
    • Refactor size estimate · c08bab8e
      Nils Goroll authored
    • Add objsize_max · e9c77473
      Nils Goroll authored
    • Refactor region reserve · 93578934
      Nils Goroll authored