fellow_cache: Rewrite body layout planning

I believe the actual issue reported in #22 is that, with the old disk region
allocation scheme, we could need one more disk segment list, such that
FCO_REGIONS_RESERVE was too small.

But pondering this issue, I went back to square one and re-thought the
allocation plan. I now think that there were some fundamental flaws in the
previous allocation code:

- we did not plan for how many segment lists we would actually need
- we would cram the segment allocation, which could lead to the number of
  segment lists growing
- for growing allocations, we would switch from "assume we have enough regions"
  to "assume we have no regions at all any more" when FCO_REGIONS_RESERVE was
  reached.

Hopefully, this new allocation plan code now takes a more sensible, holistic
approach:

Whenever we need more disk space (= another disk region), we calculate how many
regions we are sensibly going to need in total. For no cram, this is just one,
and for abs(cram) >= 1 it is the number of one-bits (popcount) of the size.
Then we calculate the chunk size we need to go to in order to fit all segments
into the available segment lists. Based on this outcome, we calculate a maximum
cram which we can allow for region allocations.
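As a rough sketch of this planning step (all names and constants below are
invented for illustration; the real code is in the collapsed diff further down
and differs in detail):

#include <stddef.h>
#include <stdio.h>

/* illustration only: assumed limits, not the actual fellow constants */
#define SKETCH_MAX_SEGLISTS     4       /* segment lists we can afford */
#define SKETCH_SEGS_PER_LIST    64      /* segments fitting one segment list */

/* plan regions and chunk exponent for an object body of `size` bytes */
static void
sketch_plan(size_t size, int cram, unsigned min_chunk_exp,
    unsigned *nregions, unsigned *chunk_exp)
{
        unsigned exp = min_chunk_exp;

        /* no cram: one region; with cram: one region per one-bit of size */
        *nregions = (cram == 0) ? 1 : (unsigned)__builtin_popcountll(size);

        /* grow the chunk size until all chunks fit the segment lists */
        while ((size >> exp) > (size_t)SKETCH_MAX_SEGLISTS * SKETCH_SEGS_PER_LIST)
                exp++;
        *chunk_exp = exp;
        /* the maximum cram for region allocations would be derived from this
         * outcome; that step is omitted in this sketch */
}

int
main(void)
{
        unsigned nregions, chunk_exp;

        /* 3MB body, cram enabled, minimum chunk exponent 12 (4KB) */
        sketch_plan((size_t)3 << 20, -1, 12, &nregions, &chunk_exp);
        printf("regions=%u chunk_exp=%u\n", nregions, chunk_exp);      /* 2, 14 */
        return (0);
}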

This approach is fundamentally different from the previous one in that we no
longer cram segment sizes - which was wrong, because we do not have an infinite
number of segment lists.

Fixes #22 for real now, I hope
parent 9208117e
......@@ -143,6 +143,7 @@ AM_VTC_LOG_FLAGS = \
-p vmod_path="$(abs_builddir)/.libs:$(vmoddir):$(VARNISHAPI_VMODDIR)"
TESTS = \
fellow_cache_test_ndebug \
fellow_cache_test \
buddy_test \
buddy_test_witness \
......@@ -340,11 +341,17 @@ fellow_cache_test_CFLAGS = $(fellow_log_test_ndebug_CFLAGS) \
-DDEBUG -DBUDDY_WITNESS
fellow_cache_test_SOURCES = $(standalone_aux) fellow_cache.c
fellow_cache_test_ndebug_LDFLAGS = $(fellow_log_test_ndebug_LDFLAGS)
fellow_cache_test_ndebug_LDADD = libfellow.la
fellow_cache_test_ndebug_CFLAGS = $(fellow_log_test_ndebug_CFLAGS)
fellow_cache_test_ndebug_SOURCES = $(standalone_aux) fellow_cache.c
noinst_PROGRAMS += \
fellow_log_dbg \
fellow_log_test_ndebug \
fellow_log_test \
fellow_cache_test
fellow_cache_test \
fellow_cache_test_ndebug
check-local:
@mkdir -p vtc
......
......@@ -27,10 +27,12 @@
#define clzszt(v) __builtin_clzll(v)
#define ffsszt(v) __builtin_ffsll(v)
typedef unsigned long long bitf_word_t;
#define popcount(v) (unsigned)__builtin_popcountll(v)
#elif SIZE_MAX == 0xffffffff
#define clzszt(v) __builtin_clzl(v)
#define ffsszt(v) __builtin_ffsl(v)
typedef unsigned long bitf_word_t;
#define popcount(v) (unsigned)__builtin_popcount(v)
#else
#error unsupported size_t
#endif
......
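For illustration, a small self-contained check of the width-based popcount
selection added above (a sketch only, not part of the diff); with cram enabled,
each one-bit of a size corresponds to one disk region:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* mirror the header's selection of the builtin matching size_t's width */
#if SIZE_MAX == 0xffffffffffffffff
#define popcount(v) (unsigned)__builtin_popcountll(v)
#elif SIZE_MAX == 0xffffffff
#define popcount(v) (unsigned)__builtin_popcount(v)
#else
#error unsupported size_t
#endif

int
main(void)
{
        /* a power-of-two size has exactly one one-bit ... */
        assert(popcount((size_t)1 << 20) == 1);
        /* ... while a 5MB (4MB + 1MB) size has two */
        assert(popcount((size_t)5 << 20) == 2);
        printf("popcount selection ok\n");
        return (0);
}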
This diff is collapsed.
......@@ -35,7 +35,8 @@ struct fellow_io_status {
* because the uring_cqe result member is 32bit (max pos value is 2GB-1), we
* limit our max IO (and, thus, disk allocation) size to 1GB
*/
#define FIO_MAX (1<<30)
#define FIO_MAX_BITS 30
#define FIO_MAX (1<<FIO_MAX_BITS)
struct fellow_io_discard {
uint64_t offset, len;
......
......@@ -122,7 +122,7 @@
// symbol not ref (constructor)
-esym(528, init_*, assert_fcos*)
-esym(755, BWIT_*)
-esym(755, fc_inj_*, FC_INJ_SZLIM_SET) // not referenced
-esym(755, fc_inj_*, FC_INJ_SZLIM*) // not referenced
-emacro(827, FC_WRONG) // Loop not reachable
// fellow_io_uring.c
......
......@@ -549,12 +549,14 @@ in larger units called segment lists, which are sized between 4KB for
asynchronously and LRU'd together with the respective
``fellow_cache_obj``.
Consequently, the *chunk_bytes* / *chunk_exponent* parameter is chosen
such that a typical object needs only a small number of chunks, which
requires an appropriately sized memory cache: To ensure that the cache
can always move data, the parameter is hard capped at 1/1024 of the
memory cache size, so, for example, for 1MB chunks, a memory cache of
at least 1GB is needed.
Consequently, the *chunk_bytes* / *chunk_exponent* parameter is chosen such that
a typical object needs only a small number of chunks, which requires an
appropriately sized memory cache: To ensure that the cache can always move data,
the parameter is capped at 1/1024 of the memory cache size, so, for example, for
1MB chunks, a memory cache of at least 1GB is needed.
Internally, the *chunk_bytes* / *chunk_exponent* parameter can be higher if an
object would otherwise consist of too many segments.
Extended attributes (currently only used for ESI data) use a separate
segment, which is only read on demand and also LRU'd with the
......@@ -745,7 +747,7 @@ fellow storage can be fine tuned:
the next power of two and used as if *chunk_exponent* was used with
the 2-logarithm of that value.
*chunk_bytes* / *chunk_exponent* are hard capped to less than 1/1024
*chunk_bytes* / *chunk_exponent* are capped to less than 1/1024
of the memory cache size.
Using both arguments at the same time triggers a VCL error.
......
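To make the 1/1024 cap in the documentation above concrete, here is a rough
sketch of the arithmetic (cap_chunk_bytes is an invented name; the exact
comparison in fellow may be strict, as the text says "less than 1/1024"):

#include <stddef.h>
#include <stdio.h>

/* illustrative only: cap chunk_bytes at 1/1024 of the memory cache size */
static size_t
cap_chunk_bytes(size_t chunk_bytes, size_t memsize)
{
        size_t cap = memsize >> 10;     /* memsize / 1024 */

        return (chunk_bytes > cap ? cap : chunk_bytes);
}

int
main(void)
{
        size_t memsize = (size_t)1 << 30;       /* 1GB memory cache */

        /* 1MB chunks are right at the cap for a 1GB memory cache */
        printf("chunk_bytes after cap: %zu\n",
            cap_chunk_bytes((size_t)1 << 20, memsize));
        return (0);
}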
......@@ -493,12 +493,14 @@ in larger units called segment lists, which are sized between 4KB for
asynchronously and LRU'd together with the respective
``fellow_cache_obj``.
Consequently, the *chunk_bytes* / *chunk_exponent* parameter is chosen
such that a typical object needs only a small number of chunks, which
requires an appropriately sized memory cache: To ensure that the cache
can always move data, the parameter is hard capped at 1/1024 of the
memory cache size, so, for example, for 1MB chunks, a memory cache of
at least 1GB is needed.
Consequently, the *chunk_bytes* / *chunk_exponent* parameter is chosen such that
a typical object needs only a small number of chunks, which requires an
appropriately sized memory cache: To ensure that the cache can always move data,
the parameter is capped at 1/1024 of the memory cache size, so, for example, for
1MB chunks, a memory cache of at least 1GB is needed.
Internally, the *chunk_bytes* / *chunk_exponent* parameter can be higher if an
object would otherwise consist of too many segments.
Extended attributes (currently only used for ESI data) use a separate
segment, which is only read on demand and also LRU'd with the
......@@ -682,7 +684,7 @@ fellow storage can be fine tuned:
the next power of two and used as if *chunk_exponent* was used with
the 2-logarithm of that value.
*chunk_bytes* / *chunk_exponent* are hard capped to less than 1/1024
*chunk_bytes* / *chunk_exponent* are capped to less than 1/1024
of the memory cache size.
Using both arguments at the same time triggers a VCL error.
......