doc: Add details on sizing

Closes #46
parent eb0d61cf
......@@ -461,6 +461,11 @@ a global fellow storage. *Note* that this kind of dynamic storage
removal is a new feature first introduced with `fellow` and might not
work perfectly yet.

When it comes to cache sizes, there is generally no such thing as
"too big" - more cache is always better. Note, however, that `fellow`
only supports a memory cache up to the size of the disk cache. For
more information, see `slash_fellow_size`_.

On Linux, the memory cache will be allocated from huge pages, if
available and if *memsize* is larger than a huge page. *memsize* will
then be rounded up to a multiple of the respective huge page size.
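
As an illustration of the rounding, the following sketch computes the
effective *memsize* for a given huge page size (the helper is
hypothetical, not `fellow`'s actual code; 2MB is a common huge page
size on Linux, but the actual size depends on the system)::

    #include <stdint.h>

    /* Sketch: round memsize up to a multiple of the huge page size,
     * as described above. Illustrative only. */
    static uint64_t
    round_to_huge_pages(uint64_t memsize, uint64_t huge_page_size)
    {
        /* e.g. memsize of 3GB + 1 byte with 2MB huge pages is
         * rounded up to 3GB + 2MB */
        return ((memsize + huge_page_size - 1) / huge_page_size *
            huge_page_size);
    }
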
......@@ -486,6 +491,87 @@ error with sizing requirements.
*delete* specifies if the storage is to be emptied.

.. _slash_fellow_size:

Sizing fellow storage
~~~~~~~~~~~~~~~~~~~~~

This section provides guidance on cache sizing by explaining the
overall cache organization and giving ballpark figures for object
sizes.

A simple, yet fundamental insight is that, with `fellow`, there is no
such thing as "delivering objects directly from disk". While hardware
architectures exist which allow DMA directly from flash storage,
`fellow` implements a "disk" and "memory" tier, with all reads and
writes going through RAM first. This architecture has been shown to be
most efficient both in terms of performance and price/performance, but
it establishes a fundamental principle for sizing: The memory cache
should be big enough to hold all actively/frequently accessed
data. Writes happen to memory, and need to be written to disk before
the memory can be re-used. Reads go into memory, from where data can
be accessed.

Besides the always consistent, eventually persistent log, the central
disk structure is the ``fellow_disk_obj``. It contains the fixed and
variable object attributes defined by Varnish-Cache (most importantly
headers) and pointers to the first body segments. For efficiency (log,
memory), this structure is addressed by a single 64-bit value. Because
`fellow` uses a minimum disk block size of 4KB, the structure can have
a size between 4KB and just under 16MB. Under optimal circumstances, a
``fellow_disk_obj`` takes only 4KB, but needs to grow bigger if longer
headers or vary specifications need to be stored.

When read into memory, a companion data structure named
``fellow_cache_obj`` is created. Under ideal circumstances (small
headers), both data structures fit into a single 4KB allocation or
even less, but as a rule of thumb, the amount of memory needed per
actively accessed object should be assumed to be 4KB plus the size of
the headers and vary specification. Both ``fellow_disk_obj`` and
``fellow_cache_obj`` remain in memory for as long as any part of the
object is accessed.

The object body is organized in chunks of 2^\ *chunk_exponent* bytes,
called segments. Segments are the smallest I/O units of object bodies
and are LRU-cached individually, allowing `fellow` to handle objects
bigger than *memsize*: When an object body is iterated over, up to
*readahead* segments are referenced and, if necessary, asynchronously
read into cache in advance. Segments outside the readahead window
which are not concurrently accessed by other threads either reside in
memory on the LRU or only on disk. In addition to the actual data,
segment bookkeeping amounts to roughly 64 bytes per segment on disk
and another 64 bytes per segment in memory, organized in larger units
called segment lists, which are sized between 4KB for 63 segments and
4MB for 65534 segments. Segment lists are read asynchronously and
LRU'd together with the respective ``fellow_cache_obj``.

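To put the per-segment overhead into numbers, the following sketch
estimates it from the figures quoted above (roughly 64 bytes per
segment on disk and in memory each); the function is purely
illustrative and not part of `fellow`::

    #include <stdint.h>

    /* Sketch: rough segment bookkeeping overhead for one object
     * body, using the ~64 bytes per segment on disk plus ~64 bytes
     * in memory mentioned above. */
    static void
    segment_overhead(uint64_t body_bytes, unsigned chunk_exponent,
        uint64_t *disk_bytes, uint64_t *mem_bytes)
    {
        uint64_t chunk = (uint64_t)1 << chunk_exponent;
        uint64_t nseg = (body_bytes + chunk - 1) / chunk;

        /* e.g. a 100MB body with 1MB chunks needs 100 segments,
         * for roughly 6400 bytes of bookkeeping on disk and the
         * same amount in memory */
        *disk_bytes = nseg * 64;
        *mem_bytes = nseg * 64;
    }
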
Consequently, the *chunk_bytes* / *chunk_exponent* parameter should be
chosen such that a typical object needs only a small number of chunks,
which in turn requires an appropriately sized memory cache: To ensure
that the cache can always move data, the chunk size is hard capped at
1/1024 of the memory cache size, so, for example, 1MB chunks require a
memory cache of at least 1GB.

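Expressed as a formula, the cap means *chunk_bytes* may not exceed
*memsize* / 1024; a sketch (drawn from the description above, not an
actual `fellow` interface)::

    #include <stdint.h>

    /* Sketch: minimum memory cache size required for a desired
     * chunk size under the 1/1024 hard cap described above. */
    static uint64_t
    min_memsize_for_chunk(uint64_t chunk_bytes)
    {
        /* e.g. 1MB chunks require at least a 1GB memory cache */
        return (chunk_bytes * 1024);
    }
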
Extended attributes (currently only used for ESI data) use a separate
segment, which is only read on demand and also LRU'd with the
respective object.

"Busy" objects going into cache while being fetched from a backend
have the same memory requirements as "finished" objects, but need
another 8KB of memory on top while being created.

To achieve high efficiency and to support Direct I/O, the buddy
allocator used to organize both the disk and memory cache only ever
makes allocations aligned to multiples of their size, with the
requested size rounded up to the next power of two. For this reason,
it is normal for :ref:`slashmap(1)` to show substantial amounts of
free memory (like 30-40%) in smaller page sizes below 4KB even if LRU
is active.

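The power-of-two rounding can be sketched as follows (illustrative
only, not `fellow`'s allocator code)::

    #include <stdint.h>

    /* Sketch: round a requested allocation size up to the next
     * power of two, as a buddy allocator does. */
    static uint64_t
    next_pow2(uint64_t n)
    {
        uint64_t p = 1;

        /* e.g. a 5KB request is served from an 8KB allocation; the
         * difference shows up as "free" memory in slashmap output */
        while (p < n)
            p <<= 1;
        return (p);
    }
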
To summarize, for memory sizing one should assume at least the amount
of actively accessed data, plus 4KB per object, plus 8KB per "busy"
object.

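As a worked example of this rule of thumb, consider a workload with
50GB of actively accessed data, one million active objects and 1000
concurrent backend fetches; all numbers are made up for illustration::

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: ballpark memory sizing following the rule of thumb
     * above: active data + 4KB per object + 8KB per busy object. */
    int
    main(void)
    {
        uint64_t active_bytes = 50ULL << 30;  /* 50GB of hot data */
        uint64_t n_objects = 1000000;         /* active objects */
        uint64_t n_busy = 1000;               /* concurrent fetches */
        uint64_t memsize;

        memsize = active_bytes + n_objects * 4096 + n_busy * 8192;
        printf("suggested memsize >= %llu bytes\n",
            (unsigned long long)memsize);
        return (0);
    }
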
.. _slash_fellow_resize:

Resizing fellow storage
......