doc: Add details on sizing

Closes #46
parent eb0d61cf
......@@ -461,6 +461,11 @@ a global fellow storage. *Note* that this kind of dynamic storage
removal is a new feature first introduced with `fellow` and might not
work perfectly yet.

When it comes to cache sizes, there is generally no such thing as
"too big" - more cache is always better. Note, however, that `fellow`
only supports a memory cache up to the size of the disk cache. For
more information, see `slash_fellow_size`_.

On Linux, the memory cache will be allocated from huge pages, if
available and if *memsize* is larger than a huge page. *memsize* will
then be rounded up to a multiple of the respective huge page size.
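
As an illustration of the rounding, the following sketch computes the
effective *memsize* for a given huge page size (the helper is
hypothetical, not `fellow`'s actual code; 2MB is a common huge page
size on Linux, but the actual size depends on the system)::

    #include <stdint.h>

    /* Sketch: round memsize up to a multiple of the huge page size,
     * as described above. Illustrative only. */
    static uint64_t
    round_to_huge_pages(uint64_t memsize, uint64_t huge_page_size)
    {
        /* e.g. memsize of 3GB + 1 byte with 2MB huge pages is
         * rounded up to 3GB + 2MB */
        return ((memsize + huge_page_size - 1) / huge_page_size *
            huge_page_size);
    }
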
......@@ -486,6 +491,87 @@ error with sizing requirements.
*delete* specifies if the storage is to be emptied.

.. _slash_fellow_size:

Sizing fellow storage
~~~~~~~~~~~~~~~~~~~~~

This section provides guidance on cache sizing by explaining the
overall cache organization and giving ballpark figures for object
sizes.

A simple, yet fundamental insight is that, with `fellow`, there is no
such thing as "delivering objects directly from disk". While hardware
architectures exist which allow DMA directly from flash storage,
`fellow` implements a "disk" and "memory" tier, with all reads and
writes going through RAM first. This architecture has been shown to be
most efficient both in terms of performance and price/performance, but
it establishes a fundamental principle for sizing: The memory cache
should be big enough to hold all actively/frequently accessed
data. Writes happen to memory, and need to be written to disk before
the memory can be re-used. Reads go into memory, from where data can
be accessed.

Besides the always consistent, eventually persistent log, the central
disk structure is the ``fellow_disk_obj``. It contains the fixed and
variable object attributes defined by Varnish-Cache (most importantly
headers) and pointers to the first body segments. For efficiency (log,
memory), this structure is addressed by a single 64-bit value. Because
`fellow` uses a minimum disk block size of 4KB, the structure can have
a size between 4KB and just under 16MB. Under optimal circumstances, a
``fellow_disk_obj`` takes only 4KB, but needs to grow bigger if longer
headers or vary specifications need to be stored.

When read into memory, a companion data structure named
``fellow_cache_obj`` is created. Under ideal circumstances (small
headers), both data structures fit into a single 4KB allocation or
even less, but as a rule of thumb, the amount of memory needed per
actively accessed object should be assumed to be 4KB plus the size of
the headers and vary specification. Both ``fellow_disk_obj`` and
``fellow_cache_obj`` remain in memory for as long as any part of the
object is accessed.

The object body is organized in chunks of 2^\ *chunk_exponent* bytes,
called segments. Segments are the smallest I/O units of object bodies
and are LRU-cached individually, allowing `fellow` to handle objects
bigger than *memsize*: When an object body is iterated over, up to
*readahead* segments are referenced and, if necessary, asynchronously
read into cache in advance. Segments outside the readahead window
which are not concurrently accessed by other threads either reside in
memory on the LRU or only on disk. In addition to the actual data,
segment bookkeeping amounts to roughly 64 bytes per segment on disk
and another 64 bytes per segment in memory, organized in larger units
called segment lists, which are sized between 4KB for 63 segments and
4MB for 65534 segments. Segment lists are read asynchronously and
LRU'd together with the respective ``fellow_cache_obj``.

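To put the per-segment overhead into numbers, the following sketch
estimates it from the figures quoted above (roughly 64 bytes per
segment on disk and in memory each); the function is purely
illustrative and not part of `fellow`::

    #include <stdint.h>

    /* Sketch: rough segment bookkeeping overhead for one object
     * body, using the ~64 bytes per segment on disk plus ~64 bytes
     * in memory mentioned above. */
    static void
    segment_overhead(uint64_t body_bytes, unsigned chunk_exponent,
        uint64_t *disk_bytes, uint64_t *mem_bytes)
    {
        uint64_t chunk = (uint64_t)1 << chunk_exponent;
        uint64_t nseg = (body_bytes + chunk - 1) / chunk;

        /* e.g. a 100MB body with 1MB chunks needs 100 segments,
         * for roughly 6400 bytes of bookkeeping on disk and the
         * same amount in memory */
        *disk_bytes = nseg * 64;
        *mem_bytes = nseg * 64;
    }
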
Consequently, the *chunk_bytes* / *chunk_exponent* parameter should be
chosen such that a typical object needs only a small number of chunks,
which in turn requires an appropriately sized memory cache: To ensure
that the cache can always move data, the chunk size is hard capped at
1/1024 of the memory cache size, so, for example, 1MB chunks require a
memory cache of at least 1GB.

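Expressed as a formula, the cap means *chunk_bytes* may not exceed
*memsize* / 1024; a sketch (drawn from the description above, not an
actual `fellow` interface)::

    #include <stdint.h>

    /* Sketch: minimum memory cache size required for a desired
     * chunk size under the 1/1024 hard cap described above. */
    static uint64_t
    min_memsize_for_chunk(uint64_t chunk_bytes)
    {
        /* e.g. 1MB chunks require at least a 1GB memory cache */
        return (chunk_bytes * 1024);
    }
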
Extended attributes (currently only used for ESI data) use a separate
segment, which is only read on demand and also LRU'd with the
respective object.

"Busy" objects going into cache while being fetched from a backend
have the same memory requirements as "finished" objects, but need
another 8KB of memory on top while being created.

To achieve high efficiency and to support Direct I/O, the buddy
allocator used to organize both the disk and memory cache only ever
makes allocations aligned to multiples of their size, with the
requested size rounded up to the next power of two. For this reason,
it is normal for :ref:`slashmap(1)` to show substantial amounts of
free memory (like 30-40%) in smaller page sizes below 4KB even if LRU
is active.

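The power-of-two rounding can be sketched as follows (illustrative
only, not `fellow`'s allocator code)::

    #include <stdint.h>

    /* Sketch: round a requested allocation size up to the next
     * power of two, as a buddy allocator does. */
    static uint64_t
    next_pow2(uint64_t n)
    {
        uint64_t p = 1;

        /* e.g. a 5KB request is served from an 8KB allocation; the
         * difference shows up as "free" memory in slashmap output */
        while (p < n)
            p <<= 1;
        return (p);
    }
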
To summarize, for memory sizing one should assume at least the amount
of actively accessed data, plus 4KB per object, plus 8KB per "busy"
object.

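As a worked example of this rule of thumb, consider a workload with
50GB of actively accessed data, one million active objects and 1000
concurrent backend fetches; all numbers are made up for illustration::

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: ballpark memory sizing following the rule of thumb
     * above: active data + 4KB per object + 8KB per busy object. */
    int
    main(void)
    {
        uint64_t active_bytes = 50ULL << 30;  /* 50GB of hot data */
        uint64_t n_objects = 1000000;         /* active objects */
        uint64_t n_busy = 1000;               /* concurrent fetches */
        uint64_t memsize;

        memsize = active_bytes + n_objects * 4096 + n_busy * 8192;
        printf("suggested memsize >= %llu bytes\n",
            (unsigned long long)memsize);
        return (0);
    }
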
.. _slash_fellow_resize:

Resizing fellow storage
......