..
.. NB: This file is machine generated, DO NOT EDIT!
..
.. Edit ./vmod_slash.vcc and run make instead
..
.. role:: ref(emphasis)
==========
vmod_slash
==========
---------------------------------------------------------------------------------
Varnish-Cache SLASH/ stevedores (buddy, fellow) and loadmasters (storage routers)
---------------------------------------------------------------------------------
:Manual section: 3
SYNOPSIS
========
global storages
---------------
* Make ``vmod_slash`` available::

    varnishd -E /path/to/vmod_slash.so

* Configure a buddy (memory) storage::

    varnishd -s<name>=buddy,<size>[,<minpage>]

* Configure a fellow (persistent disk with memory cache) storage::

    varnishd -s<name>=fellow,<path>,<dsksize>,<memsize>[=<storage>],<objsize_hint>
vcl storage objects and methods
-------------------------------
.. parsed-literal::

  import slash [as name] [from "path"]

  new xbuddy = slash.buddy(BYTES size, BYTES minpage)
  STRING xbuddy.tune([INT chunk_exponent], [BYTES chunk_bytes], [INT reserve_chunks], [INT cram])
  STEVEDORE xbuddy.storage()
  VOID xbuddy.as_transient()

  new xfellow = slash.fellow(STRING filename, BYTES dsksize, BYTES memsize, BYTES objsize_hint, BOOL delete)
  STRING xfellow.tune([INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_max], [INT objsize_lw_exponent], [INT objsize_hw_exponent], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log])
  STEVEDORE xfellow.storage()
  VOID xfellow.as_transient()
vcl functions
-------------
.. parsed-literal::

  import slash [as name] [from "path"]

  VOID as_transient(STEVEDORE)
  STRING tune_buddy(STEVEDORE storage, [INT chunk_exponent], [BYTES chunk_bytes], [INT reserve_chunks], [INT cram])
  STRING tune_fellow(STEVEDORE storage, [INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_max], [INT objsize_lw_exponent], [INT objsize_hw_exponent], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log])
vcl loadmasters (storage routers)
---------------------------------
.. parsed-literal::

  import slash [as name] [from "path"]

  new xloadmaster_rr = slash.loadmaster_rr()
  VOID xloadmaster_rr.add_storage(STEVEDORE)
  STEVEDORE xloadmaster_rr.storage()
EXAMPLES
========
* Configure a global buddy (memory only) storage of 1 GB named ``mem``::

    varnishd -E /path/to/libvmod_slash.so \
        -s mem=buddy,1g

  Use this storage with VCL code like this::

    sub vcl_backend_response {
        set beresp.storage = storage.mem;
    }

    sub vcl_backend_error {
        set beresp.storage = storage.mem;
    }

    # ... more of your own VCL code
* Configure two global fellow (persistent, disk-backed) storages,

  * one named ``fast`` of 1TB on a raw device
    ``/dev/mapper/ssd-volume`` using 100GB memory cache with an
    expected object size of 10MB, and

  * one named ``slow`` of 10TB on a file ``/hugefs/varnish-storage``,
    which shares the memory cache with the ``fast`` storage and also
    has the same expected object size::

      varnishd -E /path/to/libvmod_slash.so \
          -s fast=fellow,/dev/mapper/ssd-volume,1TB,100GB,10MB \
          -s slow=fellow,/hugefs/varnish-storage,10TB,100GB=fast,10MB
  Use these storages with VCL code, where responses to requests on
  paths beginning with ``/archive/`` go to the ``slow`` storage::

    sub vcl_backend_response {
        if (bereq.url ~ "^/archive/") {
            set beresp.storage = storage.slow;
        }
        else {
            set beresp.storage = storage.fast;
        }
    }
* Configure a round-robin storage router in VCL::

    # assumes that storages A .. C have been defined globally
    sub vcl_init {
        new storageX = slash.loadmaster_rr();
        storageX.add_storage(storage.A);
        storageX.add_storage(storage.B);
        storageX.add_storage(storage.C);
    }

  and use it::

    sub vcl_backend_response {
        set beresp.storage = storageX.storage();
    }
DESCRIPTION
===========
.. _buddy_memory_allocator: https://en.wikipedia.org/wiki/Buddy_memory_allocation
.. _README.rst: https://code.uplex.de/uplex-varnish/slash/blob/master/README.rst
.. _INSTALL.rst: https://code.uplex.de/uplex-varnish/slash/blob/master/INSTALL.rst
This module can be used both as a varnish extension (VEXT) and a
VCL module (VMOD).
It provides the two storage engines `buddy` and `fellow`, which can be
configured at ``varnishd`` startup and, with limitations, from VCL.
The `buddy` storage engine is an advanced, high performance stevedore
with a fixed memory size based on a new `buddy_memory_allocator`_
implementation from first principles.
The `fellow` storage engine is an advanced, high performance, eventually
persistent, always consistent implementation based on the same
allocator as the buddy storage engine.
See `README.rst`_ for more details.
Installation instructions can be found in `INSTALL.rst`_.
STORAGE VEXT INTERFACES
=======================
The two storage engines `buddy` and `fellow` should preferably be
configured globally by loading ``vmod_slash.so`` through the
``varnishd -E`` option and adding global storages with ``-s`` as shown
in `SYNOPSIS`_.
buddy
-----
For `buddy`, the ``-s`` parameter syntax is::

  -s<name>=buddy,<size>[,<minpage>]
with
* *<name>* being a given name for the storage instance, which will
become available from vcl as ``storage.``\ *<name>*,
* *<size>* being a size expression like ``100m`` or ``5g`` for the
storage size to be configured,
* the optional *<minpage>* argument being a size expression for the
minimal allocation unit of the storage instance. See
`slash.buddy()`_ for details.
A global `buddy` storage can be tuned from VCL using
`slash.tune_buddy()`_ with ``storage.``\ *<name>* as the first
argument.
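For example, a minimal sketch of tuning such a global buddy storage
from VCL, assuming it was configured as ``-s mem=buddy,1g`` (the
parameter value is illustrative only)::

  import slash;

  sub vcl_init {
      slash.tune_buddy(storage.mem, reserve_chunks = 2);
  }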
fellow
------
For `fellow`, the ``-s`` parameter syntax is::

  -s<name>=fellow,<path>,<dsksize>,<memsize>[=<storage>],<objsize_hint>
with
* *<name>* being a given name for the storage instance, which will
become available from vcl as ``storage.``\ *<name>*,
* *<path>* being the path to the storage file or device. Permissions
  and ownership of *path* are changed during startup using the
  Varnish-Cache `jail`_ facility,
* *<dsksize>* being a size expression like ``100m`` or ``5g`` for
the storage size to be configured,
* *<memsize>* being a size expression for the memory cache size to
be configured,
* optionally, *<storage>* being the name of a previously defined
fellow storage to share the memory cache with, and
* *<objsize_hint>* being a size expression for the expected average
object size with which the storage instance is being used.
See `slash.fellow()`_ for additional details.
A global `fellow` storage can be tuned from VCL using
`slash.tune_fellow()`_ with ``storage.``\ *<name>* as the first
argument.
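For example, a minimal sketch of tuning the global ``fast`` storage
from `EXAMPLES`_ (the chosen parameter value is illustrative only)::

  import slash;

  sub vcl_init {
      slash.tune_fellow(storage.fast,
          logbuffer_flush_interval = 0.5s);
  }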
Memory Cache Sharing
~~~~~~~~~~~~~~~~~~~~
When memory cache sharing with the ``<memsize>[=<storage>]`` syntax is
configured, *<memsize>* is ignored. The actual memory size is always
that of the referenced storage.
LRU with memory cache sharing is cooperative. Whenever memory is
needed by any storage, all storages using the shared cache are asked
to make room. Consequently, more frequently used storages are likely
to keep more of the shared memory cache.
STORAGE VMOD INTERFACES
=======================
.. _slash.buddy():
new xbuddy = slash.buddy(BYTES size, BYTES minpage=64)
------------------------------------------------------
Create or reference a buddy storage of size *size* with the given vmod
object name. The storage will remain in existence as long as
- any loaded VCL has an object by that name
- there are objects using it
The *minpage* argument can be used to define the smallest possible
allocation unit. The default and lowest possible *minpage* argument is
64B. The *minpage* argument will be rounded up to the next power of
two. Larger *minpage* arguments improve efficiency at the cost of
memory overhead.
The *size* argument will be rounded down to a multiple of the
(possibly rounded) *minpage* argument.
Besides the configured memory size, approximately 1 / ( *minpage* *
4) of it is additionally required for metadata (bitmaps) in the
varnish home directory and in memory. For the default *minpage* of 64
Bytes, this amounts to approximately 0.4%. The actual figure is output
at startup as ``buddy: metadata (bitmap) size``.
This storage can *not* be used via ``storage.``\ *<name>*.
If the last vcl using this vmod is discarded before the storage is
empty, all its memory will remain allocated until a varnish restart.
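A minimal sketch of defining and using a buddy storage from VCL (the
object name and size are illustrative only)::

  import slash;

  sub vcl_init {
      new mybuddy = slash.buddy(256MB);
  }

  sub vcl_backend_response {
      set beresp.storage = mybuddy.storage();
  }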
.. _xbuddy.tune():
STRING xbuddy.tune([INT chunk_exponent], [BYTES chunk_bytes], [INT reserve_chunks], [INT cram])
-----------------------------------------------------------------------------------------------
::

  STRING xbuddy.tune(
      [INT chunk_exponent],
      [BYTES chunk_bytes],
      [INT reserve_chunks],
      [INT cram]
  )
Using the `xbuddy.tune()`_ method, the following parameters of the
buddy storage can be fine-tuned:
* *chunk_exponent* / *chunk_bytes*
- unit: bytes as a power of two / bytes
- default: 20 / 1 MB
- minimum: 6 / 64 B
- maximum: 28 / 256 MB
*chunk_bytes* and *chunk_exponent* are alternative ways to configure
the chunk size. If *chunk_bytes* is used, the value is rounded up to
the next power of two and used as if *chunk_exponent* was used with
the 2-logarithm of that value.
Using both arguments at the same time triggers a VCL error.
*chunk_exponent* / *chunk_bytes* are very similar to the
``fetch_maxchunksize`` varnishd parameter, but can be configured per
storage instance: They specify the maximum contiguous memory region
which the storage will return for a single allocation request. The
default is the smaller of 1/16 of the *size* of the storage and
256MB. The value is capped at 1/4 of the *size* of the storage,
rounded down to the previous power of two.
* *reserve_chunks*
- unit: scalar
- default: 1
- minimum: 0
specifies a number of chunks to reserve in memory. The reserve is
used to immediately fulfill requests while LRU cache eviction is
running: When the cache is full, allocation requests need to wait
until LRU eviction has made room, and the reserve can help reduce
latencies in these situations at the expense of some memory
unavailable for caching.
* *cram*
- unit: powers of two
- default: 1
- minimum: -64
- maximum: 64
specifies to which extent the allocator should return regions
smaller than requested when it would need to wait for LRU to make
room.
Its unit is powers of two, valid values are -64 to 64, but sensible
values are much smaller.
* cram = 0: Always allocate the requested size
* cram != 0: Also return abs(*cram*) powers of two less than the
roundup of the requested size.
For example, with a *cram* value of 1 (the default) or -1, a request
for 129 to 255 bytes could also be fulfilled with 128 bytes. With a
*cram* value of 2 or -2, 64 bytes could also be returned for such a
request.
* For positive *cram* value, page splits are avoided - that is, if a
larger memory region would need to be split to fulfill all of the
request, but a memory region that is up to *cram* powers of two
smaller is available, the smaller memory region is returned.
* A negative *cram* value means that smaller memory regions are only
returned if the request could not be fulfilled otherwise.
Higher absolute *cram* values generally lead to higher fragmentation
in return for less unused space. Higher fragmentation is generally
bad for performance.
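To illustrate, a hedged sketch continuing the hypothetical ``mybuddy``
object from above (values are for illustration only)::

  sub vcl_init {
      # prefer returning a region up to one power of two smaller
      # over splitting a larger page, and keep two chunks reserved
      mybuddy.tune(cram = 1, reserve_chunks = 2);
  }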
.. _xbuddy.storage():
STEVEDORE xbuddy.storage()
--------------------------
Return the buddy storage. Can be used to set it for storing a
backend response::

  set beresp.storage = mybuddy.storage();
.. _xbuddy.as_transient():
VOID xbuddy.as_transient()
--------------------------
Set this buddy storage as the transient storage.
Restricted to: ``vcl_init``.
.. _slash.tune_buddy():
STRING tune_buddy(STEVEDORE storage, [INT chunk_exponent], [BYTES chunk_bytes], [INT reserve_chunks], [INT cram])
-----------------------------------------------------------------------------------------------------------------
::

  STRING tune_buddy(
      STEVEDORE storage,
      [INT chunk_exponent],
      [BYTES chunk_bytes],
      [INT reserve_chunks],
      [INT cram]
  )
Tune the given globally defined buddy storage; for all other
parameters, see `xbuddy.tune()`_.
.. _slash.fellow():
new xfellow = slash.fellow(STRING path, BYTES dsksize, BYTES memsize, BYTES objsize_hint, BOOL delete)
------------------------------------------------------------------------------------------------------
::

  new xfellow = slash.fellow(
      STRING path,
      BYTES dsksize,
      BYTES memsize,
      BYTES objsize_hint=262144,
      BOOL delete=0
  )
Create or reference a fellow storage on *path* of size *dsksize*
with a memory cache of size *memsize*. See `slash_fellow_resize`_
below for information on changing sizes.
A VCL-defined fellow storage can not load persisted objects, so to
avoid accidentally emptying a storage, either the storage referenced
by *path* must be empty, or the *delete* argument must be ``true``.
*path* has to be either a regular file, or a block device. If *path*
does not exist, it is created as a regular file. Checks on *path* are
conducted in order to not accidentally create or use a file where
block devices reside (e.g. on ``/dev/``). The environment variable
``slash_fellow_options`` can be set to contain ``skip-path-check``
where, for whatever exotic reason, this check needs to be skipped.
.. _jail: https://varnish-cache.org/docs/trunk/reference/varnishd.html#jail
Permissions and ownership on *path* need to be set such that the
``varnishd`` worker process has read/write access (see ``workuser`` in
the `jail`_ option documentation). On a system where ``varnishd``
starts as root with the default unix jail configuration (``vcache``
workuser), the permissions can be set using::

  my_fellow_path=... # REPLACE ... with path
  chown vcache $my_fellow_path
  chmod 600 $my_fellow_path
When a VCL-defined fellow storage goes out of scope because the last
VCL referencing it is discarded, all of its objects are removed from
the cache, but remain on disk. They can be loaded again by configuring
a global fellow storage. *Note* that this kind of dynamic storage
removal is a new feature first introduced with `fellow` and might not
work perfectly yet.
When it comes to cache sizes, "too big" generally does not exist -
more cache is always better, but `fellow` only supports a memory cache
up to the size of the disk cache. For more information, see
`slash_fellow_size`_.
On Linux, the memory cache will be allocated from huge pages, if
available and if *memsize* is larger than a huge page. *memsize* will
then be rounded up to a multiple of the respective huge page size.
Besides the configured memory cache size, approximately 1 / 256 (0.4%)
of *memsize* plus 1 / 16384 (0.006%) of *dsksize* will be required in
the varnish home directory and in memory. For example, for
``dsksize=1t`` and ``memsize=1g``, this amounts to roughly 70MB. The
actual figures are output at startup as ``fellow: metadata (bitmap)
memory``.
*objsize_hint* (default 256KB) is used to sanity check *memsize* in
relation to *dsksize* and to size the fixed log regions. It should be
set to a value **lower** than the expected average object size. If
*memsize* is configured too low with respect to *dsksize* and
*objsize_hint*, a higher *memsize* value will be used (which might
fail if insufficient memory is available).
For an already populated storage, the configured *memsize* is checked
against the minimum amount of memory required for the actual average
object size. If it is too low, fellow will not start, emitting a fatal
error with sizing requirements.
*delete* specifies if the storage is to be emptied.
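A minimal sketch of defining a fellow storage from VCL (path and sizes
are placeholders; remember that a VCL-defined fellow storage can not
load persisted objects)::

  import slash;

  sub vcl_init {
      new myfellow = slash.fellow("/var/lib/varnish/fellow.dsk",
          dsksize = 10GB, memsize = 1GB, delete = true);
  }

  sub vcl_backend_response {
      set beresp.storage = myfellow.storage();
  }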
.. _slash_fellow_size:
Sizing fellow storage
~~~~~~~~~~~~~~~~~~~~~
This section is intended to provide guidance on cache sizing by
explaining the overall cache organization and ballpark figures for
object sizes.
A simple, yet fundamental insight is that, with `fellow`, there is no
such thing as "delivering objects directly from disk". While hardware
architectures exist which allow DMA directly from flash storage,
`fellow` implements a "disk" and "memory" tier, with all reads and
writes going through RAM first. This architecture has been shown to be
most efficient both in terms of performance and price/performance, but
it establishes a fundamental principle for sizing: The memory cache
should be big enough to hold all actively/frequently accessed
data. Writes happen to memory, and need to be written to disk before
the memory can be re-used. Reads go into memory, from where data can
be accessed.
Besides the always consistent, eventually persistent log, the central
disk structure is the ``fellow_disk_obj``. It contains the fixed and
variable object attributes defined by Varnish-Cache (most importantly
headers) and pointers to the first body segments. For efficiency (log,
memory) this structure is addressed by a single 64bit value. Because
`fellow` uses a minimum disk block size of 4KB, the object can have
sizes between 4KB and just under 16MB. Under optimal circumstances, a
``fellow_disk_obj`` takes only 4KB, but needs to grow bigger if longer
headers or vary specifications need to be stored.
When read into memory, a companion data structure named
``fellow_cache_obj`` is created. Under ideal circumstances (small
headers), both data structures fit into a single 4KB
allocation or even less, but as a rule of thumb, the amount of memory
needed per actively accessed object should be assumed to be 4KB plus
the size of the headers and vary specification. Both
``fellow_disk_obj`` and ``fellow_cache_obj`` remain in memory for as
long as any part of the object is accessed.
The object body is organized in chunks of 2^\ *chunk_exponent* bytes,
called segments. Segments are the smallest I/O units of object bodies
and are lru-cached individually, allowing `fellow` to handle objects
bigger than *memsize*: When an object body is iterated over, up to
*readahead* segments are referenced and, if necessary, asynchronously
read into cache in advance. Segments outside the readahead window,
which are not concurrently accessed by other threads, either reside in
memory on the LRU or only on disk. The amount of disk and memory
storage in addition to the actual data amounts to roughly 64 bytes per
segment on disk and another 64 bytes per segment in memory, organized
in larger units called segment lists, which are sized between 4KB for
63 segments and 4MB for 65534 segments. Segment lists are read
asynchronously and LRU'd together with the respective
``fellow_cache_obj``.
Consequently, the *chunk_bytes* / *chunk_exponent* parameter is chosen
such that a typical object needs only a small number of chunks, which
requires an appropriately sized memory cache: To ensure that the cache
can always move data, the parameter is hard capped at 1/1024 of the
memory cache size, so, for example, for 1MB chunks, a memory cache of
at least 1GB is needed.
Extended attributes (currently only used for ESI data) use a separate
segment, which is only read on demand and also LRU'd with the
respective object.
"Busy" objects going into cache while being fetched from a backend
have the same memory requirements as "finished" objects, but need
another 8KB of memory on top while being created.
To achieve high efficiency and to support Direct I/O, the buddy
allocator used to organize both the disk and memory cache only ever
makes allocations at multiples of the requested size, rounded up to
the next power of two. For this reason, it is normal for
:ref:`slashmap(1)` to show substantial amounts of free memory (like
30-40%) in smaller page sizes below 4KB even if LRU is active.
To summarize, one should assume for memory sizing at least the amount
of data actively accessed, plus 4KB per object, plus 8KB per "busy"
object.
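As a hedged, back-of-envelope illustration of this rule (all figures
are hypothetical)::

  1,000,000 active objects x 100KB average size  ~ 100GB  data
  1,000,000 active objects x 4KB                 ~   4GB  per-object overhead
  10,000 concurrent "busy" objects x 8KB         ~  80MB  fetch overhead
  ------------------------------------------------------
  memsize                                        ~ 104GB  at least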
.. _slash_fellow_resize:
Resizing fellow storage
~~~~~~~~~~~~~~~~~~~~~~~
In general, resizing a fellow storage is supported by restarting
varnishd with different parameters (be it on the command line or in
VCL), but for size reductions, cache contents may be lost, possibly in
their entirety. Read this section for details.
Before applying any size change, it is strongly recommended to cleanly
shut down fellow using ``varnishadm stop``.
Increasing ``memsize``
Increasing ``memsize`` up to ``dsksize`` should never cause any
issues: Administrators should make sure that the amount of memory is
actually available (which might need additional consideration if huge
pages are used, see `INSTALL.rst`_), change the parameter and restart
:ref:`varnishd(1)`. Configuring ``memsize`` larger than ``dsksize`` is
not supported.
Decreasing ``memsize``
When decreasing ``memsize``, first and foremost consider that
performance might significantly degrade, depending on access
patterns. As a simple rule, it is recommended to only reduce
``memsize`` of an existing cache by halving at most and then letting
the cache contents rotate.
Consider that a dynamic minimum applies to ``memsize`` (see the
paragraph on *objsize_hint* in `slash.fellow()`_), so it can not be
made arbitrarily small. ``memsize`` also caps some tunables (see
`xfellow.tune()`_), of which *chunk_exponent* / *chunk_bytes*
deserve special consideration: At any time of fellow serving
requests for object bodies, some number of chunks needs to fit in
memory. Obviously, fellow can not work if a new ``memsize`` is
chosen too small to fit existing disk chunks. To be on the safe
side, *chunk_exponent* / *chunk_bytes* should thus be reduced to at
most 1 / 1024 of the planned new ``memsize`` *before* the
reduction is applied. Then, ideally, all of the cache contents
should be recreated. Keep in mind that smaller chunk sizes are
generally less efficient.
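For illustration, assume ``memsize`` is to be halved from 1GB to
512MB: 512MB / 1024 = 512KB, so the default 1MB chunks would no
longer fit and should be reduced beforehand, for example (a sketch,
assuming a global storage named ``fast``)::

  import slash;

  sub vcl_init {
      slash.tune_fellow(storage.fast, chunk_bytes = 256KB);
  }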
Increasing ``dsksize``
Increasing ``dsksize`` is generally not an issue. Keep in mind that
memory required for metadata and the minimum ``memsize`` will also
increase (see `slash.fellow()`_). It is recommended to increase
``dsksize`` in steps of at least 10% to ensure that free space can
be used to accommodate grown log regions (otherwise objects need to
be removed until enough contiguous space is available).
If the configured storage path points to a file, fellow will make an
attempt to change its size using :ref:`posix_fallocate(3)`. Success
and failure will be reported as ``fellow: ... grown to ...`` or
``fellow: ... warning, fallocate failed ...``.
If the configured storage path points to a block device, the
administrator needs to ensure that it is at least as large as
``dsksize``, or fellow will not start.
Once the storage is loaded, the log regions will be recreated to
accommodate the now higher number of objects possible to store.
Decreasing ``dsksize``
Decreasing ``dsksize`` is also supported and fellow will make an
effort to load as many objects from the shrunken storage as
possible, but it will not move data. That is to say, objects
residing entirely within the shrunken storage region will be loaded,
and others will simply be ignored.
This also applies to the log: If log blocks reside outside the
shrunken storage, the respective objects will not be loaded. Log
regions are reported when fellow starts up, so it is possible to
configure a reduced ``dsksize`` preserving the log, but this is not
a well supported operation. Consider getting professional support if
you require help with such advanced reconfigurations on a regular
basis.
Once a shrunken storage is loaded, the log regions will also be
shrunk according to the now projected number of objects possible to
store.
The actual size change will be applied to files using
:ref:`posix_fallocate(3)` as with increases. The size of block
devices can not be changed by fellow.
.. _xfellow.tune():
STRING xfellow.tune([INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT wait_table_exponent], [INT lru_exponent], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_max], [INT objsize_update_min_log2_ratio], [INT objsize_update_max_log2_ratio], [INT objsize_update_min_occupancy], [INT objsize_update_max_occupancy], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log], [INT panic_flags])
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::

  STRING xfellow.tune(
      [INT logbuffer_size],
      [DURATION logbuffer_flush_interval],
      [REAL log_rewrite_ratio],
      [INT chunk_exponent],
      [BYTES chunk_bytes],
      [INT wait_table_exponent],
      [INT lru_exponent],
      [INT dsk_reserve_chunks],
      [INT mem_reserve_chunks],
      [BYTES objsize_max],
      [INT objsize_update_min_log2_ratio],
      [INT objsize_update_max_log2_ratio],
      [INT objsize_update_min_occupancy],
      [INT objsize_update_max_occupancy],
      [INT cram],
      [INT readahead],
      [BYTES discard_immediate],
      [INT io_batch_min],
      [INT io_batch_max],
      [ENUM {sha256, xxh32, xxh3_64, xxh3_128} hash_obj],
      [ENUM {sha256, xxh32, xxh3_64, xxh3_128} hash_log],
      [ENUM {panic, purge} ioerr_obj],
      [ENUM {panic, fail} ioerr_log],
      [ENUM {panic, purge} allocerr_obj],
      [ENUM {panic, fail} allocerr_log],
      [INT panic_flags]
  )
Using the `xfellow.tune()`_ method, the following parameters of the
fellow storage can be fine-tuned:
* *logbuffer_size*
- unit: scalar
- default: 24336
- minimum: 28
specifies an approximate number of objects to hold in a
logbuffer. Once a logbuffer is full, it is flushed if possible, so
this parameter constitutes an approximate upper bound on the number
of objects to hold unpersisted.
* *logbuffer_flush_interval*
- unit: duration
- default: 2.0s
- minimum: 0s
specifies the interval between regular logbuffer flushes,
persisting objects to disk. Logbuffer flushes can happen more often
if required.
* *log_rewrite_ratio*
- unit: ratio
- default: 0.5
- minimum: 0.001
specifies the minimum ratio of deleted to added objects (n_del /
n_add) in the log which triggers a log rewrite.
* *chunk_exponent* / *chunk_bytes*
- unit: bytes as a power of two / bytes
- default: 20 / 1 MB
- minimum: 12 / 4 KB
- maximum: 28 / 256 MB or <1/1024 of memsize
*chunk_bytes* and *chunk_exponent* are alternative ways to configure
the chunk size. If *chunk_bytes* is used, the value is rounded up to
the next power of two and used as if *chunk_exponent* was used with
the 2-logarithm of that value.
*chunk_bytes* / *chunk_exponent* are hard capped to less than 1/1024
of the memory cache size.
Using both arguments at the same time triggers a VCL error.
See `xbuddy.tune()`_ for additional details.
* *wait_table_exponent*
TL;DR: 2-logarithm of concurrency for initial reads of objects from
disk.
- unit: wait table entries as a power of two
- default: 10
- minimum: 6
- maximum: 32
When objects are initially read from disk after a cold start or
eviction from memory, condition variables are used to serialize
parallel requests to the same object, similar in effect to the
waitinglist mechanism in Varnish-Cache.
These condition variables are organized in a hash table. This
parameter specifies the 2-logarithm of that table's size.
Two to the power of this value represents an upper limit to the
number of objects read from disk in parallel. The actual limit can
be lower when hash collisions occur. The amount of memory used is
roughly 128 bytes times two to the power of this value.
Note: The wait table only concerns objects initially read from
disk. Once an object is read, its body data is read in parallel
independent of this limit.
* *lru_exponent*
TL;DR: 2-logarithm of number of LRU lists
- unit: number of LRU lists as a power of two
- default: 0
- minimum: 0
- maximum: 6
On large systems with mostly memory-bound access patterns, the LRU
list becomes the main point of contention, as segments are frequently
removed from and re-added to LRU.
A single LRU (``lru_exponent=0``) is most fair: only the absolute
least recently used segment is ever evicted. But more LRUs reduce
contention on the LRU lists significantly and improve parallelism of
evictions.
* *dsk_reserve_chunks*
- unit: scalar
- default: 4
- minimum: 2 MB / chunk_bytes
- maximum: dsksize / 8 / chunk_bytes
specifies a number of chunks to reserve on disk. The reserve is used
to fulfill storage requests when storage is otherwise full. Because
LRU cache eviction of disk objects is an expensive process involving
disk IO, a reserve helps keep response times for cache misses
low. It is also needed for the LRU algorithm itself, which, when the
fixed log space is full, might momentarily require additional space
before making room.
The value is always raised to a dynamic minimum such that the disk
reserve is at least 2MB.
The value is capped such that the number of reserved chunks times
the chunk size does not exceed 1/8 of the disk size.
* *mem_reserve_chunks*
- unit: scalar
- default: 1
- minimum: 0
- maximum: memsize / 8 / chunk_bytes
specifies a number of chunks to reserve in memory per LRU. The
reserve is used to provide memory for new objects or objects staged
from disk to memory when memory is otherwise full. It can help
reduce latencies in these situations at the expense of some memory
unavailable for caching.
The value is capped such that the number of reserved chunks times
the chunk size does not exceed 1/8 of the memory size.
* *objsize_max*
- unit: bytes
- default: 0
specifies the maximum object size which fellow will accept.
The default of ``0`` represents 1/4th of *dsksize*. It is strongly
recommended to not use a value higher than that.
The effectively enforced value is rounded up to 4KB.
* *objsize_update_min_log2_ratio*
- unit: bytes ratio as a power of two
- default: 1
- minimum: 1
- maximum: 64
**This parameter should only be changed if advised by a developer.**
It specifies the minimum binary logarithmic ratio between the
expected object size and the actual average object size to trigger
an update of the expectation.
fellow uses an expected object size to determine the required
capacity of vital data structures, in particular the fixed log
regions on disk. This object size estimate is initially set by the
administrator as the *objsize_hint* parameter (see
`slash.fellow()`_) and then possibly updated based on the actual
size of objects stored.
Updates of the internal *objsize_hint* are important to ensure that
fixed log regions are large enough to hold meta data about all
stored objects. On the other hand, they incur relevant cost because
recreation of fixed log regions may require a high number of cache
objects to be removed in order to free contiguous regions of disk
space.
Thus, the expected object size is only lowered when the rounded-down
2-logarithm of the actual average object size is at least
*objsize_update_min_log2_ratio* less than the rounded-down 2-logarithm
of the expected object size.
To illustrate: Suppose *objsize_hint* is given as 65KB, but the
actual average object size is 63KB. Then, with the
*objsize_update_min_log2_ratio* default of 1, the internal
*objsize_hint* will be lowered to 32KB. If
*objsize_update_min_log2_ratio* was set to 2, it would remain
unchanged.
* *objsize_update_max_log2_ratio*
- unit: bytes ratio as a power of two
- default: 3
- minimum: 1
- maximum: 64
**This parameter should only be changed if advised by a developer.**
It specifies the maximum binary logarithmic ratio between the actual
average object size and the expected object size to trigger an
update of the expectation.
The parameter concerns the opposite end of
*objsize_update_min_log2_ratio*: When the internal *objsize_hint* is
smaller than the actual average object size, disk space is wasted
for unused fixed log regions. Yet disk space is relatively cheap,
the amount of log space needed per object is relatively low
(typically ~400 to ~900 bytes) and the cost of recreating fixed log
regions is high, so the expected object size should only be
increased for a considerable space saving.
This parameter's default of 3 triggers an increase of the internal
*objsize_hint* only if the actual average object size is at least
2^3 = 8 times the current *objsize_hint*, comparing rounded-down
2-logarithms.
* *objsize_update_min_occupancy*
* *objsize_update_max_occupancy*
- unit: percent of disk storage occupied
- default: 25 / 75
- minimum: 0
- maximum: 100
These parameters specify the minimum and maximum percentage of used
disk space within which internal *objsize_hint* updates, as described
above, may be triggered. The minimum exists to ensure statistical
significance of the actual average object size; the maximum avoids
costly updates when the storage is highly occupied.
* *cram* is documented in `xbuddy.tune()`_
* *readahead*
- unit: scalar
- default: 5
- minimum: 0
- maximum: 31 or 1/16th of *memsize*
specifies how many additional segments of an object's body should be
staged into memory asynchronously before being required. This
parameter helps keep response times low and throughput high for
objects which are not already present in the memory cache.
The maximum is the lower of 31 or the value corresponding to 1/16th
of *memsize* divided by *chunk_bytes*.
Read ahead triggers whenever the number of read ahead segments is at
readahead / 2 (rounded down) or less. Thus, for the default value of
5, read ahead will, after the initial read of 5 segments, read 2
segments whenever 2 segments have been sent.
Note that, on a system with a decently sized memory cache, no disk
IO will happen for most requests. When segments are still in memory
cache, read ahead only references them. Disk IO is only needed for
segments which are accessed for the first time after a cache load or
LRU eviction.
* *discard_immediate*
- unit: bytes
- default: 256KB
- minimum: 4KB
minimum size for which to attempt to issue immediate discards of
disk blocks to be freed.
To disable immediate discards, use a number higher than your storage
size. For most users, 42PB will work to disable.
The discard implementation attempts these methods in order:
- ``ioctl(x, BLKDISCARD, ...)``
- ``fallocate(x, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, ...)``
Methods are tried once and disabled upon failure, until a tune
operation is executed which re-enables discard.
If possible, discard commands are issued asynchronously, but they
need to be completed before disk space can be re-used, so discards
can impose additional latency.
Discard operations are skipped when a space deficit exists.
The potential advantage is improved performance and reduced wear on
flash storage.
See :ref:`fallocate(2)` and :ref:`blkdiscard(8)`, which contain
related information, because there exists no man page for the
``BLKDISCARD`` :ref:`ioctl(2)`.
* *io_batch_min*, *io_batch_max*
- unit: I/O operations
- default: 8, 512
- minimum: 1
Minimum and maximum number of IO operations to batch within a single
submission to the kernel, where applicable.
Larger values save on system calls, but can increase latency.
* *hash_obj*, *hash_log*
- value: one of ``sha256``, ``xxh32``, ``xxh3_64``, ``xxh3_128``
- default: ``xxh3_64`` if xxhash > 0.8.0 has been compiled in,
``xxh32`` if xxhash > 0.7.3 has been compiled in,
``sha256`` otherwise
*hash_obj* specifies the hash algorithm to ensure data integrity of
objects and their data.
*hash_log* specifies the hash algorithm to ensure data integrity of
the log.
* *ioerr_obj*
- value: ``panic`` or ``purge``
- default: ``panic``
*ioerr_obj* selects the action taken when an IO error is encountered
while reading or writing object data, or when a checksum mismatch is
found for object data:
- ``panic`` aborts varnish with a panic
- ``purge`` purges the object from the cache
With ``purge``, consider the following consequences:
* Read errors may lead to delivery of truncated object bodies and/or
other hard delivery errors such as early connection closure.
.. XXX implement .prefetch() from VCL to allow control over it
* Depending on whether or not the object's segment list is present
in RAM, storage may remain allocated until the next restart.
* *ioerr_log*
*NOTE:* As of this release, this feature is not fully
implemented. IO errors may trigger ``panic`` mode even if another
mode is selected.
- value: ``panic`` or ``fail``
- default: ``panic``
*ioerr_log* selects the action taken when an IO error is encountered
while reading or writing the log, or when a checksum mismatch is
found for log data:
- ``panic`` aborts varnish with a panic
- ``fail`` causes all allocation requests to the stevedore to fail
  (`xfellow.storage()`_ returns ``NULL``)
* *allocerr_obj*
- value: ``panic`` or ``purge``
- default: ``panic``
*allocerr_obj* selects the action taken when insufficient
memory or storage is available for reading or writing object data:
- ``panic`` aborts varnish with a panic
- ``purge`` purges the object from the cache
For ``purge``, depending on whether or not the object's segment list
is present in RAM, storage may remain allocated until a restart.
Because the fellow storage is designed to not fail allocations under
normal circumstances and instead wait for LRU to make room,
``panic`` is intended also for production use.
* *allocerr_log*
*NOTE:* As of this release, this feature is not fully
implemented. IO errors may trigger ``panic`` mode even if another
mode is selected.
- value: ``panic`` or ``fail``
- default: ``panic``
*allocerr_log* selects the action taken when insufficient
memory or storage is available for reading or writing the log:
- ``panic`` aborts varnish with a panic
- ``fail`` causes all allocation requests to the stevedore to fail
  (`xfellow.storage()`_ returns ``NULL``)
Because the fellow storage is designed to not fail allocations under
normal circumstances and instead wait for LRU to make room,
``panic`` is intended also for production use.
* *panic_flags*
Used to increase verbosity of panic messages, read as a bit field:

- 0x01: dump the full fellow_cache_seg
- 0x02: dump the full fellow_cache_seglist / fellow_disk_seglist
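As a hedged sketch, tuning a handful of these parameters on a fellow
object like the hypothetical ``myfellow`` from `slash.fellow()`_
above (values are illustrative only)::

  sub vcl_init {
      myfellow.tune(
          readahead = 8,
          mem_reserve_chunks = 2,
          ioerr_obj = purge);
  }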
.. _xfellow.storage():
STEVEDORE xfellow.storage()
---------------------------
Return the fellow storage. Can be used to set it for storing a
backend response::

  set beresp.storage = myfellow.storage();
.. _xfellow.as_transient():
VOID xfellow.as_transient()
---------------------------
Set this fellow storage as the transient storage.
Restricted to: ``vcl_init``.
.. _slash.as_transient():
VOID as_transient(STEVEDORE)
----------------------------
Set this storage as the transient storage.
Restricted to: ``vcl_init``.
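A minimal sketch, assuming a global buddy storage named ``mem``::

  import slash;

  sub vcl_init {
      slash.as_transient(storage.mem);
  }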
.. _slash.tune_fellow():
STRING tune_fellow(STEVEDORE storage, [INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT wait_table_exponent], [INT lru_exponent], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_max], [INT objsize_update_min_log2_ratio], [INT objsize_update_max_log2_ratio], [INT objsize_update_min_occupancy], [INT objsize_update_max_occupancy], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log], [INT panic_flags])
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::

  STRING tune_fellow(
      STEVEDORE storage,
      [INT logbuffer_size],
      [DURATION logbuffer_flush_interval],
      [REAL log_rewrite_ratio],
      [INT chunk_exponent],
      [BYTES chunk_bytes],
      [INT wait_table_exponent],
      [INT lru_exponent],
      [INT dsk_reserve_chunks],
      [INT mem_reserve_chunks],
      [BYTES objsize_max],
      [INT objsize_update_min_log2_ratio],
      [INT objsize_update_max_log2_ratio],
      [INT objsize_update_min_occupancy],
      [INT objsize_update_max_occupancy],
      [INT cram],
      [INT readahead],
      [BYTES discard_immediate],
      [INT io_batch_min],
      [INT io_batch_max],
      [ENUM {sha256, xxh32, xxh3_64, xxh3_128} hash_obj],
      [ENUM {sha256, xxh32, xxh3_64, xxh3_128} hash_log],
      [ENUM {panic, purge} ioerr_obj],
      [ENUM {panic, fail} ioerr_log],
      [ENUM {panic, purge} allocerr_obj],
      [ENUM {panic, fail} allocerr_log],
      [INT panic_flags]
  )
Tune the given globally defined fellow storage; for all other
parameters, see `xfellow.tune()`_.
STATISTICS / COUNTERS
=====================
`buddy` and `fellow` expose statistics and counters which can be
observed with VSC clients like :ref:`varnishstat(1)`.
The counter documentation is available through :ref:`varnishstat(1)`
and the :ref:`slash-counters(7)` man page.
The ``g_dsk_*`` and ``g_mem_*`` gauges are updated at regular
intervals of *logbuffer_flush_interval*.
Interpreting Gauges and Background on Cache Behavior
----------------------------------------------------
The gauges ``g_mem_space`` and ``g_dsk_space`` give the number of free
bytes in memory and on disk; the ``*_bytes`` statistics give the
number of used bytes.
On a typical system which uses all of the available cache and evicts
objects mostly through LRU, these gauges should more or less stabilize
over time, which should become obvious when logging and graphing the
above values over longer time spans. But depending on how the cache is
used and tuned, that point might well be in the region of 70% and
below.
The fact that `fellow` does not, by default, attempt to use each and
every byte of the available cache is a deliberate decision:
To achieve optimal disk and network I/O throughput, object data should
be stored in contiguous regions. However, such a region might not
always be available, and `fellow` needs to make a decision if
returning a smaller region or waiting for LRU to make room is the
better option. Also, it might be better to return a smaller region
than to split a larger region, which could instead be used for a
larger object coming in later.
The *cram* parameter controls this trade off: If *cram* allows a
smaller segment, it is returned, otherwise the allocator needs to wait
for LRU to make room.
While higher absolute *cram* values improve space usage, they lead to
higher fragmentation and might negatively impact performance. Positive
*cram* values avoid using larger free regions for smaller
requests. Negative *cram* values do not.
See `xbuddy.tune()`_ for additional explanations on *cram*, tuning for
`fellow` happens through `xfellow.tune()`_.
Another factor is that the LRU algorithm pre-evicts segments and
objects from cache until ``mem_reserve_chunks`` have been reserved.
The important aspect here is that the reserved chunks are contiguous
in order to counteract fragmentation: LRU runs until there happens to
be enough contiguous space for each of the reserved chunks.
The smaller objects are relative to the chunk size, the more objects
need to be evicted for a contiguous chunk to become available.
This behavior can be controlled by adjusting ``chunk_exponent`` /
``chunk_bytes``. We recommend setting the chunk size larger than the
expected object size such that typical new objects will fit into
reserved chunks. However, if the goal is to maximize RAM cache usage,
the chunk size can be reduced at the expense of somewhat higher I/O
overhead and fragmentation.
The higher ``mem_reserve_chunks`` is set, the more aggressively LRU
will pre-evict objects in order to have space available for new
requests.
FELLOW DIAGNOSTICS
==================
`fellow` writes diagnostic information about initialization, the
initial load and log rewrites to :ref:`vsl(7)`.
To extract the relevant information, query the log in raw mode for
lines with tag ``Storage`` and no vxid (``vxid == 0``), as for example
with :ref:`varnishlog(1)`::

  varnishlog -t off -g raw -i Storage -q 'vxid == 0'
During startup, additional diagnostic information is written to
standard error (stderr).
Explanation of some commonly seen startup errors:
* ``open(...) failed: Permission denied``
Permissions on the storage path are not set correctly. See
`slash.fellow()`_ for how to set them.
* ``... is not a fellow file``
The first 4KB of the storage path are neither zero nor written by
fellow. This is a safeguard in order to avoid overwriting
potentially precious data. Either recreate the file/device or
overwrite the first 4KB with zeroes (as always, entirely at your own
risk, replace ``...`` with the fellow path)::

  dd if=/dev/zero of=... bs=4096 count=1
FELLOW CACHE LOADING
====================
Upon :ref:`varnishd(1)` startup with a globally configured `fellow`,
the log is read to recreate all persisted objects sparsely as *vampire
objects* (that is, only minimal metadata is added to the cache).
Until `fellow` is fully initialized and the cache loaded, the varnish
instance remains unusable. This is because free space on the storage
is implicitly defined as not being used by any object. Further
improvements of the initial load time might be possible, though.
To wait for cache loading to complete, the following methods can be
used:
* Wait for the ``FELLOW.<name>.b_happy`` bitfield from
:ref:`slash-counters(7)` to become non-zero.
* Wait for the ``storage.<name>.happy`` VCL variable to become true.
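For example, VCL along these lines could be used to fail fast until
loading completes (a sketch; adapt the storage name and status code)::

  sub vcl_recv {
      if (!storage.fast.happy) {
          return (synth(503, "cache still loading"));
      }
  }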
Cache loading can be observed using the following methods:
* By observing the :ref:`varnish-counters(7)` ``MAIN.n_objectcore``
and ``MAIN.n_vampireobject``. Note that to see the latter with
:ref:`varnishstat(1)` in interactive mode, the ``v`` key needs to be
pressed to select at least ``DIAG`` verbosity.
* By running :ref:`slashmap(1)` to observe how the disk space shown as
allocated fills up as the log is processed.
* By running the :ref:`varnishlog(1)` command given under `FELLOW
  DIAGNOSTICS`_. It will continuously display updates on the number of
  loaded objects like in this example::

    ...
    0 Storage - fellow fellow: resurrected 8231700
    0 Storage - fellow fellow: resurrected 8416700
    ...
  When loading is complete, a summary will be shown like::

    0 Storage - ... done: 53.485482s
    0 Storage - fellow fellow: first load t1 = 0.154415
    0 Storage - fellow fellow: 10010027 resurrected in 53.485892s (187160.720410/s), 431 already expired
FELLOW PLANNED BUT MISSING FEATURES
===================================
The following features are planned for implementation:
* Support some successor of xkey (additional cache keys)
* Further improve cache loading speed
Please see `README.rst`_ for how to support the project in order to
get them implemented.
FELLOW KNOWN ISSUES
===================
* With `fellow` storage on XFS, spurious read errors - most likely
short reads - have been observed. While short reads are technically
legal, handling them would complicate the `fellow` code
substantially, so a fix is currently not planned. If you require a
fix, please support the project and let us know; see ``CONTRIBUTING``
in `README.rst`_ for details.
For best performance, it is recommended to use `fellow` storage on a
raw device.
* On Linux with ``io_uring``, by default, `fellow` registers all of
the memory cache as buffers using
:ref:`io_uring_register_buffers(3)` to achieve optimal performance
at runtime, if supported by the system. Where supported, this
enables *zero-copy* IO, where the hardware performs DMA directly
into the `fellow` memory cache.
Buffer registrations happen in multiple threads in parallel, one for
each io ring.
During initialization, however, this takes considerable amounts of
time for larger memory caches.
If this is an issue for you, please ask the kernel developers to
make buffer registration more efficient.
If you are willing to sacrifice runtime performance for a faster
startup, :ref:`varnishd(1)` can be started with the environment
variable ``slash_fellow_options`` set to contain
``skip-uring-register-buffers``.
If the variable contains ``sync-uring-register-buffers``, buffer
registration is forced to serial, synchronous registration
operations.
Note that even with registered buffers, ``io_uring`` has nothing to
do with how the `fellow` memory cache and LRU on it work.
* Bug 3940_ causes :ref:`varnishd(1)` to hang if storage
initialization takes longer than the ``cli_timeout``.
For varnish-cache versions with the fix 3941_, set
``startup_timeout`` to a duration sufficient for `fellow` startup,
e.g. add to the :ref:`varnishd(1)` arguments::

  -p startup_timeout=3600
For varnish-cache versions without this fix, set ``cli_timeout``
instead, e.g. add to the :ref:`varnishd(1)` arguments::

  -p cli_timeout=3600
.. _3940: https://github.com/varnishcache/varnish-cache/issues/3940
.. _3941: https://github.com/varnishcache/varnish-cache/pull/3941
* Because `fellow` might use varnish threads for some or all IOs and
those might be issued in huge bursts, the infamous *Worker Pool
Queue does not move* panic is more likely to occur when there is
otherwise no problem. It is thus recommended to set the
``thread_pool_watchdog`` parameter to a value significantly higher
than the default, e.g. by adding to the :ref:`varnishd(1)`
arguments::

  -p 'thread_pool_watchdog=600'
FELLOW ADDITIONAL TUNING KNOBS
==============================
These options are not expected to ever require tuning, but exist just
in case:
* The environment variable ``fellow_log_io_entries`` can be used to
set the log io ring size, which is configured when the storage
engine starts. The default is 1024, values below 128 are not
generally recommended, and for higher values, the stack size will
likely need to be adjusted or stack overflows might occur.
Three leased log IO rings are used for reading and writing log data.
* Likewise, the environment variable ``fellow_cache_io_entries`` can
be used to set the cache io ring size.
A single shared IO ring is used for reading and writing object data.
Both options affect all IO backends, but in different ways:
* For io_uring, they set the submission and completion ring sizes,
which, simply put, define the maximum number of IOs to be handled
through a single system call. With io_uring, this specifically does
not affect the maximum number of IOs "in flight".
* For the other IO backends, they define the maximum number of IOs "in
flight".
LOADMASTER VMOD INTERFACES
==========================
We call storage routers loadmasters because they coordinate
stevedores.
.. _slash.loadmaster_rr():
new xloadmaster_rr = slash.loadmaster_rr()
------------------------------------------
Defines a round-robin loadmaster which allocates objects from
associated storages in turn. If the preferred, round-robin selected
storage fails, other storages are tried in order until one succeeds,
if at all.
For performance reasons, the implementation does not serialize
requests, so concurrent requests might momentarily receive object
allocations from the same storage. This effect should average out.
.. _xloadmaster_rr.add_storage():
VOID xloadmaster_rr.add_storage(STEVEDORE)
------------------------------------------
Add a storage to the loadmaster.
Restricted to: ``vcl_init``.
.. _xloadmaster_rr.storage():
STEVEDORE xloadmaster_rr.storage()
----------------------------------
Return a reference to the loadmaster, mostly for use with ``set
beresp.storage = loadmaster.storage()``.
.. _slash.loadmaster_hash():
new xloadmaster_hash = slash.loadmaster_hash()
----------------------------------------------
Defines a hashing loadmaster which selects the preferred storage by
taking the first four bytes of the object's hash key (basically
``req.hash``) modulo the number of storages defined.
As with `slash.loadmaster_rr()`_, if the preferred storage fails,
other storages are tried in order until one succeeds, if at all.
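Usage mirrors the round-robin loadmaster; a minimal sketch, assuming
global storages ``A`` and ``B``::

  import slash;

  sub vcl_init {
      new lmh = slash.loadmaster_hash();
      lmh.add_storage(storage.A);
      lmh.add_storage(storage.B);
  }

  sub vcl_backend_response {
      set beresp.storage = lmh.storage();
  }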
.. _xloadmaster_hash.add_storage():
VOID xloadmaster_hash.add_storage(STEVEDORE)
--------------------------------------------
Same as `xloadmaster_rr.add_storage()`_.
Restricted to: ``vcl_init``.
.. _xloadmaster_hash.storage():
STEVEDORE xloadmaster_hash.storage()
------------------------------------
Same as `xloadmaster_rr.storage()`_.
SEE ALSO
========
:ref:`vcl(7)`, :ref:`varnishd(1)`