Documentation overhaul

parent 3370a4c4
......@@ -33,7 +33,8 @@ Makefile.in
/src/vcc_pesi_debug_if.[ch]
/src/vcc_pesi_if.[ch]
/src/vmod_*rst
/src/vmod_*debug*rst
/src/vmod_pesi.rst
/src/VSC_pesi.c
/src/VSC_pesi.h
......
......@@ -9,10 +9,6 @@ EXTRA_DIST = README.rst LICENSE CONTRIBUTING.rst INSTALL.rst
doc_DATA = README.rst LICENSE CONTRIBUTING.rst INSTALL.rst
README.rst: src/vdp_pesi.vcc
$(MAKE) $(AM_MAKEFLAGS) -C src vmod_pesi.man.rst
cp src/vmod_pesi.man.rst README.rst
coverage:
$(MAKE) $(AM_MAKEFLAGS) -C src coverage
......
..
.. NB: This file is machine generated, DO NOT EDIT!
..
.. Edit ./vdp_pesi.vcc and run make instead
..
==============================
Parallel ESI for Varnish-Cache
==============================

.. _Varnish-Cache: https://varnish-cache.org/

This project provides parallel ESI processing for `Varnish-Cache`_ as
a module (VMOD).

.. role:: ref(emphasis)

=========
vmod_pesi
=========

----------------------------------------------------
Varnish Delivery Processor for parallel ESI includes
----------------------------------------------------

:Manual section: 3
PROJECT RESOURCES
=================

* The primary repository is at https://code.uplex.de/uplex-varnish/libvdp-pesi

  This server does not accept user registrations, so please use ...

* the mirror at https://gitlab.com/uplex/varnish/libvdp-pesi for issues,
  merge requests and all other interactions.

SYNOPSIS
========

::

  import pesi;

  # Enable parallel ESI processing in vcl_deliver {}.
  VOID pesi.activate()

  # Set a boolean configuration parameter.
  VOID pesi.set(ENUM, BOOL)

  # Configure workspace pre-allocation for internal variable-sized
  # data structures.
  VOID pesi.workspace_prealloc(BYTES min_free, INT max_nodes)

  # Configure the memory pool used when pre-allocated structures
  # from the workspace are insufficient.
  VOID pesi.pool(INT min, INT max, DURATION max_age)

  # VDP version
  STRING pesi.version()

INTRODUCTION
============

.. _Standard ESI processing: https://varnish-cache.org/docs/trunk/users-guide/esi.html
.. _varnishd(1): https://varnish-cache.org/docs/trunk/reference/varnishd.html
.. _vcl(7): https://varnish-cache.org/docs/trunk/reference/vcl.html
.. _varnishadm(1): https://varnish-cache.org/docs/trunk/reference/varnishadm.html
.. _varnishstat(1): https://varnish-cache.org/docs/trunk/reference/varnishstat.html

`Standard ESI processing`_ in `Varnish-Cache`_ is sequential. In
short, it works like this:

1. Process the (sub)request.

2. For a cache miss or pass, fetch the requested object and parse it
   on the backend side, if ESI parsing is enabled. Store the object in
   a parsed, pre-segmented form.

3. Back on the client side, process the parsed, pre-segmented ESI
   object. For all includes, create a sub-request and start with it at
   step 1.

Simply put, this process is very efficient if step 2 does not need to
be done because the requested object is already in cache.

Conversely, the total time it takes to generate an ESI response is
roughly the sum of all fetch times.

This is where parallel ESI processing can help: in step 3, all the
sub-requests for any particular object are run in parallel, such that
the total time it takes to generate an ESI response at a particular
include level is reduced to the longest of the fetch times.

"At a particular include level" is important, because the optimization
only helps if there are many includes at a particular level: for
example, if object A includes object B, which includes object C, and
no object is cacheable, they still need to be fetched in order: the
request for B can only be started once A is available, and likewise
for B and C.

To summarize:

* Parallel ESI can *substantially* reduce response times for ESI if
  cacheable objects include many uncacheable objects. The maximum
  benefit, compared with standard, serial processing, is achieved in
  cases where all nodes of an ESI tree are cacheable and at least some
  leaves are not.

* If basically all objects are cacheable, parallel ESI only provides a
  relevant benefit on an empty cache or if cache TTLs are low, such
  that cache misses are likely.

Example
-------

Consider this ESI tree, where an object A includes B1 and B2, which,
in turn, include C1 to C3 and C4 to C6, respectively::

           A
        __/ \__
       /       \
      B1       B2
     / | \    / | \
    C1 C2 C3 C4 C5 C6

Let's assume that A, B1 and B2 are cacheable and already in cache, and
all C objects are uncacheable (passes). Let's also assume that C1 to
C6 take their number times 100ms to fetch from the backend - that is,
C1 takes 100ms, C2 200ms etc.

With `Standard ESI processing`_, the total response time will be
roughly 100ms + 200ms + ... + 600ms = 2100ms = 2.1s. If the response
is a web page, the top bit will load relatively fast, the next part
half as fast, the third part again 100ms slower, etc.

With parallel ESI, the total response time will be roughly 600ms =
0.6s. There will still be a delay for each fragment of the page, but
it will be 100ms for each part.
REQUIREMENTS
============

All versions of the VDP require strict ABI compatibility with Varnish:
the VDP must run against the same build of Varnish as the one it was
built against. This means that the "commit id" portion of the Varnish
version string (the SHA1 hash) must be the same at runtime as at build
time.
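
Since the check boils down to comparing the SHA1 commit id in the
version strings, a hedged way to inspect it is shown below (the exact
output format may vary between Varnish builds)::

  varnishd -V
  # e.g.: varnishd (varnish-7.1.0 revision <SHA1 commit id>)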
INSTALLATION
============
See `INSTALL.rst <INSTALL.rst>`_ in the source repository.
TL;DR: QUICK START
==================
The full documentation of this VMOD is in :ref:`vmod_pesi(3)`. If you
are reading this document online, it should be available as
`vmod_pesi.man.rst <src/vmod_pesi.man.rst>`_.

The full documentation is detailed on purpose. It aims to explain well
how this VMOD works and how optimizations can be tuned.
We welcome all users to read the documentation, but many users will
......@@ -89,744 +153,6 @@ differently from standard ESI. Understanding these differences, and how
to monitor and manage resource usage affected by pESI, is a main focus
of the detailed discussion that follows.
DESCRIPTION
===========
.. _standard ESI processing: https://varnish-cache.org/docs/trunk/users-guide/esi.html
VDP pesi is a Varnish Delivery Processor for parallel Edge Side
Includes (ESI). The VDP implements content composition in client
responses as specified by ``<esi>`` directives in the response body,
just as Varnish does with its `standard ESI processing`_. While
standard Varnish processes ESI subrequests serially, in the order in
which the ``<esi>`` directives appear in the response, the pesi VDP
executes the subrequests in parallel. This can lead to a significant
reduction in latency for the complete response, if Varnish has to wait
for backend fetches for more than one of the included requests.
Backend applications that use ESI includes for standard Varnish can be
expected to work without changes with the VDP, provided that they do
not depend on assumptions about the serialization of ESI subrequests.
Serial ESI requests are processed in a predictable order, one after
the other, but the pesi VDP executes them at roughly the same time. A
backend may conceivably receive a request forwarded for the second
include in a response before the first one. If the logic of ESI
composition in a standard Varnish deployment does not depend on the
serial order, then it will work the same way with VDP pesi.
Parallel ESI processing is enabled by invoking |pesi.activate()|_ in
``vcl_deliver {}``::

  import pesi;

  sub vcl_backend_response {
      set beresp.do_esi = true;
  }

  sub vcl_deliver {
      pesi.activate();
  }
Other functions provided by the VDP serve to set configuration
parameters (or return the VDP version string). If your deployment uses
the default configuration, then |pesi.activate()|_ in ``vcl_deliver``
may be the only modification to VCL that you need.
The invocation of |pesi.activate()|_ can of course be subject to
logic in VCL::

  sub vcl_deliver {
      # Use parallel ESI only if the request header X-PESI is present.
      if (req.http.X-PESI) {
          pesi.activate();
      }
  }
But see below for restrictions on the use of |pesi.activate()|_.
All of the computing resources used by the pesi VDP -- threads, storage,
workspace, locks, and so on -- can be configured, either with Varnish
runtime parameters or configuration settings made available by the
pesi VDP. And their usage can be monitored with Varnish statistics. So you
can limit resource usage, and use monitoring tools such as
`varnishstat(1)`_ to ensure efficient parallel ESI processing. For
details see `RESOURCE USAGE, CONFIGURATION AND MONITORING`_ below.
.. _pesi.activate():
VOID activate()
---------------
Enable parallel ESI processing for the client response.
``pesi.activate()`` MUST be called in ``vcl_deliver {}`` only. If it is
called in any other VCL subroutine, VCL failure is invoked (see
`ERRORS`_ below for details).
If ``pesi.activate()`` is called on *any* ESI level (any depth of include
nesting), then it MUST be called on *all* levels of the response. If
``pesi.activate()`` is invoked at some ESI levels but not others, then the
results are undefined, and will very likely lead to a Varnish panic.
It is also safe, for instance, to call ``pesi.activate()`` only if a
request header is present, as in the example shown above; since the
same request headers are set for every ESI subrequest, the result is
the same at every ESI level. But that should *not* be done if you have
logic that unsets the header at some ESI levels but not at
others. Under no circumstances should the invocation of ``pesi.activate()``
depend on the value of ``req.esi_level``, or on ``req.url`` (since
URLs are different at different ESI levels).
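
By contrast, a sketch of what must be avoided (hypothetical VCL,
making activation depend on ``req.esi_level``)::

  sub vcl_deliver {
      # WRONG: pesi runs at ESI level 0 but not below, which has
      # undefined results and will very likely panic Varnish.
      if (req.esi_level == 0) {
          pesi.activate();
      }
  }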
See |pesi.set()|_ below for a way to choose serial
ESI processing for all of the includes in the response at the current
ESI level. Even then, ``pesi.activate()`` must be called in ``vcl_deliver
{}`` in addition to ``pesi.set()``.
As with standard Varnish, ESI processing can be selectively disabled
for a client response, by setting ``resp.do_esi`` to ``false`` in VCL
since version 4.1, or setting ``req.esi`` to ``false`` in VCL 4.0 (see
`vcl(7)`_). The requirement remains: if ESI processing is enabled and
``pesi.activate()`` is called at any ESI level, then both must happen at
all levels.
``pesi.activate()`` has the effect of setting the VCL string variable
``resp.filters``, which is a whitespace-separated list of the names of
delivery processors to be applied to the client response (see
`vcl(7)`_). It configures the correct list of filters for the current
response, analogous to the default filter settings in Varnish when
sequential ESI is in use. These include the ``gunzip`` VDP for
uncompressed responses, and ``range`` for responses to range
requests. ``pesi.activate()`` checks the conditions for which the VDPs are
required, and arranges them in the correct order.
It is possible to manually set or change ``resp.filters`` to enable
parallel ESI, instead of calling ``pesi.activate()``, but this is
advised only for experts. If you do so, use the string ``pesi`` for
this VDP, and do *not* include ``esi``, Varnish's standard ESI VDP, in
the same list with ``pesi``. As with the ``pesi.activate()`` call -- if
``pesi`` appears in ``resp.filters`` for a response at *any* ESI
level, it MUST be in ``resp.filters`` at *all* ESI levels.
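
For those experts, a minimal sketch follows (the exact filter list
depends on the response; for instance ``gunzip`` or ``range`` may also
be required, which ``pesi.activate()`` would arrange automatically)::

  sub vcl_deliver {
      # Enable parallel ESI by hand; never list "esi" together
      # with "pesi".
      set resp.filters = "pesi";
  }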
Notice that all VCL code affecting ESI (such as setting
``resp.do_esi``), gzip (such as changes to
``req.http.Accept-Encoding``) or range processing (such as changes to
``req.http.Range``) must execute before this function is called in
order to have an effect.
Example::

  vcl 4.1;

  import pesi;

  sub vcl_recv {
      # Disable gzipped responses by removing Accept-Encoding.
      unset req.http.Accept-Encoding;
  }

  sub vcl_backend_response {
      set beresp.do_esi = true;
  }

  sub vcl_deliver {
      # If the request header X-Debug-ESI is present, then disable ESI
      # for the current response.
      if (req.http.X-Debug-ESI) {
          set resp.do_esi = false;
      }
      pesi.activate();
  }
.. _pesi.set():
VOID set(ENUM {serial, thread} parameter, [BOOL bool])
------------------------------------------------------
Set a configuration parameter for the VDP, which holds for the current
(sub)request, as documented below. The parameter to be set is
identified by the ENUM ``parameter``. Currently the parameters can
only be set with a boolean value in ``bool`` (but future versions of
this function may allow for setting other data types).
``pesi.set()`` MUST be called in ``vcl_deliver {}`` only; otherwise VCL
failure is invoked (see `ERRORS`_).
The parameters that can be set are currently ``serial`` and ``thread``:
``serial``
----------
Activates serial mode if ``bool`` is ``true``; default is ``false``.
In serial mode, the ESI subrequests processed for includes in the
current response body are processed in serial, in the current thread.
In other words, all ESI subrequests at the next level will be
processed without requesting threads from the thread pool (which
potentially starts new threads, if necessary). This setting only
affects include processing at the current ESI level, not nested
includes at the next level.
It is strongly recommended *not* to use serial mode at ESI level 0
(the top-level request received from a client), because the ESI level
0 thread can send available data to the client concurrently with other
parallel ESI threads.

Serial mode may sensibly be used to reduce overhead and the number of
threads required, without relevant drawbacks,

* at ESI level > 0 *and*

* when the VCL author knows that all objects included by the current
  request are cacheable, and thus are highly likely to lead to cache
  hits.
Example::

  # Activate serial mode at ESI level > 0, if we know that all includes
  # in the response at this level lead to cacheable responses.
  sub vcl_deliver {
      pesi.activate();
      if (req.esi_level > 0 && req.url ~ "^/all/cacheable/includes") {
          pesi.set(serial, true);
      }
  }
.. _thread:
``thread``
----------

Whether to always request a new thread for includes; the default is
``true``.

* ``false``

  Only use a new thread if one is immediately available; otherwise,
  process the include in the same thread.

* ``true``

  Request a new thread, potentially waiting for one to become
  available.

See `THREADS`_ for a detailed discussion.
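
For example (a sketch; whether this is beneficial depends on how the
thread pools are sized for the current load)::

  sub vcl_deliver {
      pesi.activate();
      # Do not wait for a pool thread: if none is immediately
      # available, process includes in the current thread.
      pesi.set(thread, false);
  }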
.. _pesi.workspace_prealloc():
VOID workspace_prealloc(BYTES min_free, INT max_nodes)
------------------------------------------------------
::

  VOID workspace_prealloc(BYTES min_free=4096, INT max_nodes=32)
Configure the maximum amount of workspace used for pesi internal data
structures.
The pesi VDP builds a structure, whose size is roughly proportional to
the size of the ESI tree -- the conceptual tree with the top-level
response at the root, and its includes and all of their nested
includes as branches. The nodes in this structure have a fixed size,
but the number of nodes used by the VDP varies with the size of the
ESI tree.
For each (sub)request, the VDP pre-allocates a constant number of such
nodes in client workspace, and initially uses the pre-allocation for
child nodes of that (sub)request. If more are needed, they are
obtained from a global memory pool as described below. The use of
pre-allocated nodes from workspace is preferred, since it never
requires new system memory allocations (workspaces themselves are
pre-allocated by Varnish), and because they are local to each request,
so locking is never required to access them (but is required for the
memory pool).
The pre-allocation only uses workspace available after ``vcl_deliver
{}`` returns, keeping at least ``min_free`` bytes free, if
possible. Thus, the number of nodes configured by ``max_nodes`` may
not actually be available, unless the ``workspace_client`` parameter
is set sufficiently high.
``pesi.workspace_prealloc()`` configures the pre-allocation. The default
values of its parameters are defaults used by the VDP; that is, the
configuration if ``pesi.workspace_prealloc()`` is never called.
The ``min_free`` parameter sets the minimum amount of space that the
pre-allocation will always leave free in client workspace; if the
targeted number of pre-allocated nodes would result in less free space
than ``min_free`` bytes in workspace, then fewer nodes are
allocated. This ensures that free workspace is always left over for
other VMODs, VCL usage, and so forth. Note that most of the operations
typically requiring workspace have already finished when VDP pesi
makes the pre-allocation, because it starts after `vcl_deliver
{}`. Thus, the reservation is mostly for other VDPs and VMODs using
`PRIV_TOP`. ``min_free`` defaults to 4 KiB.
If other VDPs or VMODs using `PRIV_TOP` report workspace overflows,
``min_free`` should be increased.
The ``max_nodes`` parameter sets the number of nodes to be allocated,
unless the limit imposed by ``min_free`` is exceeded; ``max_nodes``
defaults to 32. ``max_nodes`` MUST be >= 0; otherwise, VCL failure is
invoked (see `ERRORS`_). If ``max_nodes`` is set to 0, then no nodes
are pre-allocated; they are all taken from the memory pool described
below.
Ideally, ``max_nodes`` matches the number of includes any one ESI
object can have, plus the number of fragments before, after and in
between the includes. For all practical purposes, ``max_nodes`` should
be about twice the number of expected ESI includes. However, if the
number of ESI includes varies substantially across objects, it might
be better to use less memory and set ``max_nodes`` according to the
number of includes of a typical object, so that objects with more
includes use the memory pool.
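
As a worked example with hypothetical numbers: an object with up to 30
includes has up to 31 fragments around and between them, so roughly 61
nodes are needed; rounding up::

  sub vcl_init {
      # ~30 includes per object: 30 includes + 31 fragments = 61 nodes.
      pesi.workspace_prealloc(max_nodes=64);
  }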
When ``pesi.workspace_prealloc()`` is called, its configuration becomes
effective immediately for all new requests processed by the VDP. The
configuration remains valid for all instances of VCL, for as long as
the VDP remains loaded; that is, until the last instance of VCL using
the VDP is discarded.
``pesi.workspace_prealloc()`` can be called in ``vcl_init`` to set the
configuration at VCL load time. But you can also write VCL that calls
the function when a request is received by Varnish, for example using
a special URL for system administrators. This is similar to using the
``param.set`` command for `varnishadm(1)`_ to change a Varnish
parameter at runtime. Such a request should be protected, for example
with an ACL and/or Basic Authentication, so that it can be invoked
only by admins. Remember that as soon as such a request is processed
and ``pesi.workspace_prealloc()`` is executed, the changed configuration is
globally valid.
Examples::

  # Configure workspace pre-allocation at VCL load time.
  sub vcl_init {
      pesi.workspace_prealloc(min_free=8k, max_nodes=64);
  }

  # Change the configuration at runtime, when Varnish receives an
  # admin request.
  import pesi;
  import std;

  sub vcl_recv {
      if (req.url ~ "^/admin/pesi_ws") {
          # Reject the request with "403 Forbidden" unless the client
          # IP matches an ACL for admin requests.
          if (client.ip !~ admin_acl) {
              return (synth(403));
          }
          # Set min_free from a GET parameter, if present.
          if (req.url ~ "\bmin_free=\d+[kmgtp]?") {
              # Extract the BYTES parameter.
              set req.http.Tmp-Bytes
                  = regsub(req.url, "^.+\bmin_free=(\d+[kmgtp]?).*$", "\1");
              pesi.workspace_prealloc(std.bytes(req.http.Tmp-Bytes));
          }
          # Set max_nodes from a GET parameter.
          if (req.url ~ "\bmax_nodes=\d+") {
              # Extract the INT parameter.
              set req.http.Tmp-Nodes
                  = regsub(req.url, "^.+\bmax_nodes=(\d+).*$", "\1");
              pesi.workspace_prealloc(max_nodes=std.integer(req.http.Tmp-Nodes));
          }
          # Return status 204 to indicate success.
          return (synth(204));
      }
  }
.. _pesi.pool():
VOID pool(INT min=10, INT max=100, DURATION max_age=10)
-------------------------------------------------------
Configure the memory pool used by the VDP for internal variable-sized
data structures, when more is needed than is provided by the client
workspace pre-allocation described above. The objects in the memory
pool are the nodes used in structures whose size is proportional to
the size of the ESI tree, as discussed above.
The VDP uses the same mechanism that Varnish uses for its memory
pools, and the configuration values have the same meaning and defaults
as the Varnish runtime parameters ``pool_req``, ``pool_sess`` and
``pool_vbo`` (see `varnishd(1)`_). ``min`` and ``max`` control the
size of the pool -- the number of pre-allocated nodes available for
allocation requests. ``max_age`` is the maximum lifetime for nodes in
the pool -- when there are no pending allocation requests, nodes in
the pool that are older than ``max_age`` are freed, down to the limit
imposed by ``min``.
The values of the parameters MUST fulfill the following requirements,
otherwise VCL failure is invoked (see `ERRORS`_):
* ``min`` and ``max`` MUST be both > 0.
* ``max`` MUST be >= ``min``.
* ``max_age`` MUST be >= 0s (and <= one million seconds).
Note that ``max`` is a soft limit. The memory pool satisfies all
allocation requests, even if ``max`` is exceeded when nodes are
returned to the pool. But the pool size will then be reduced to
``max``, without waiting for ``max_age`` to expire.
As with |pesi.workspace_prealloc()|_: when ``pesi.pool()`` is called, the
changed configuration immediately becomes valid (although it may take
some time for the memory pool to adjust to the new values). It remains
valid for as long as the VDP is still loaded, unless ``pesi.pool()`` is
called again. ``pesi.pool()`` may be called in ``vcl_init`` to set a
configuration at VCL load time, but may also be called elsewhere in
VCL, for example to enable changing configurations at runtime using a
special "admin" request.
Examples::

  # Configure the memory pool at VCL load time.
  sub vcl_init {
      pesi.pool(min=50, max=500, max_age=30s);
  }

  # Change the configuration at runtime, when Varnish receives an
  # admin request.
  import pesi;
  import std;

  sub vcl_recv {
      if (req.url ~ "^/admin/pesi_pool") {
          # Protect the call with an ACL, as in the example above.
          if (client.ip !~ admin_acl) {
              return (synth(403));
          }
          # Set max_age from a GET parameter.
          if (req.url ~ "\bmax_age=\d+(\.\d+)?(ms|s|m|h|d|w|y)") {
              # Extract the DURATION parameter.
              set req.http.Tmp-Duration
                  = regsub(req.url,
                      "^.+\bmax_age=(\d+(?:\.\d+)?(?:ms|s|m|h|d|w|y)).*$",
                      "\1");
              pesi.pool(max_age=std.duration(req.http.Tmp-Duration));
          }
          # Set min from a GET parameter.
          if (req.url ~ "\bmin=\d+") {
              # Extract the INT parameter.
              set req.http.Tmp-Min = regsub(req.url, "^.+\bmin=(\d+).*$", "\1");
              pesi.pool(min=std.integer(req.http.Tmp-Min));
          }
          # Extract max from a GET parameter, the same way as for min,
          # not repeated here ...
          # Status 204 indicates success.
          return (synth(204));
      }
  }
.. _pesi.version():
STRING version()
----------------
Return the version string for this VDP.
Example::

  std.log("Using VDP pesi version: " + pesi.version());
ERRORS
======
As documented above, VCL failure is invoked under some of the error
conditions for functions provided by the VDP. VCL failure has the same
results as if ``return(fail)`` is called from a VCL subroutine:
* If the failure occurs in ``vcl_init``, then the VCL load fails with
an error message.
* If the failure occurs in any other subroutine besides ``vcl_synth``,
then a ``VCL_Error`` message is written to the log, and control is
directed immediately to ``vcl_synth``, with ``resp.status`` set to
503 and ``resp.reason`` set to ``"VCL failed"``.
* If the failure occurs in ``vcl_synth``, then ``vcl_synth`` is
aborted, and the response line "503 VCL failed" is sent.
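
As an illustration (hypothetical VCL), calling ``pesi.activate()``
outside of ``vcl_deliver {}`` triggers such a failure::

  sub vcl_recv {
      # VCL failure: pesi.activate() may only be called in vcl_deliver;
      # a VCL_Error is logged and the client receives "503 VCL failed".
      pesi.activate();
  }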
RESOURCE USAGE, CONFIGURATION AND MONITORING
============================================
.. _Transient storage allocator: https://varnish-cache.org/docs/trunk/users-guide/storage-backends.html#transient-storage
To understand the way computing resources are used by the VDP, and
thus how they can be configured and monitored, first note that
response bodies returned for ESI subrequests running in parallel may
have to be buffered. Consider a response body with two
``<esi:include>`` directives, both of which lead to parallel backend
fetches, and the second fetch is finished before the first one. The
second response body must be retained while Varnish waits for the
first one, since the contents of the top-level client response must be
delivered in correct order.
If the second response is added to the cache, then buffering is not
necessary, because it can be retrieved from the cache (this is also
true if it was a cache hit in the first place). But an uncacheable
response must be buffered, until its contents are delivered. The VDP
uses Varnish's `Transient storage allocator`_ for this
purpose. Transient storage only needs to be used while the VDP is
waiting to deliver response contents; space is returned as soon as the
contents have been sent. The amount of Transient storage needed
depends on the size of all uncacheable included responses being
processed at any one time.
The VDP runs ESI subrequests (for each ``<esi:include>`` directive at
every ESI level) in separate threads, unless instructed not to do so
due to the use of either ``pesi.set(serial, true)`` or ``pesi.set(thread,
false)``, as documented above. The threads are requested from the
thread pools managed by Varnish. This means that in most cases, for
well-configured thread pools, the overhead of starting new threads is
not incurred during request processing -- the VDP obtains a thread
that is immediately ready for use.
The VDP uses client workspace at the top-level request (ESI level 0)
for fixed-size internal metadata. It also uses client workspace to
pre-allocate a constant number of nodes in variable-sized structures,
as described in |pesi.workspace_prealloc()|_ above. Together these
make for a fixed-size demand on client workspace when
|pesi.activate()|_ is invoked. The amount of workspace needed varies
on different systems, and depends on the |pesi.workspace_prealloc()|_
setting, but broadly speaking, it can be expected to be less than 10
KiB.
As described for |pesi.pool()|_, the VDP uses a memory pool for nodes
in its internal reconstruction of the ESI tree, if more are needed
than are pre-allocated in workspace. The same mechanism is employed as
for Varnish's memory pools, so the same considerations apply to the
configuration and monitoring of the pool.
For each top-level ESI request using the VDP, two locks are employed;
one to synchronize access to common data structures, and another to
manage tasks being run in different threads. The VDP uses Varnish's
mechanisms for implementing locks, so they can be observed with
``LCK.*`` statistics.
To summarize, the VDP makes use of the following resources:

* Transient storage
* threads from Varnish's thread pools
* client workspace
* the memory pool created for this VDP
* locks
These resources are configured as follows:
.. _Storage Backend: https://varnish-cache.org/docs/trunk/reference/varnishd.html#storage-backend
.. _Storage backends: https://varnish-cache.org/docs/trunk/users-guide/storage-backends.html
.. _Varnish User's Guide: https://varnish-cache.org/docs/trunk/users-guide/index.html
* A maximum size for Transient storage can be set with the ``-s``
command-line option for varnishd, using the name ``Transient`` for
the storage backend (see `Storage Backend`_ in `varnishd(1)`_, and
`Storage backends`_ in the `Varnish User's Guide`_). If no storage
backend with the name ``Transient`` is specified, then Varnish uses
unlimited malloc storage for Transient. Set ``-sTransient`` to set
an upper bound.
  Example::

    varnishd -sTransient=malloc,500m
* Thread pools are configured with the varnishd parameters
``thread_pools``, ``thread_pool_min`` and ``thread_pool_max``, see
`varnishd(1)`_.
  Example::

    varnishd -p thread_pools=4 -p thread_pool_min=500 -p thread_pool_max=1000
* Client workspace is configured with the varnishd parameter
``workspace_client``, see `varnishd(1)`_. The VDP's use of client
workspace can be configured in part by using the
``workspace_prealloc()`` function described above.
  Example::

    varnishd -p workspace_client=128k

  See also the examples for ``pesi.workspace_prealloc()`` above.
* The VDP's memory pool is configured with the ``pool()`` function
described above.
Statistics counters that are relevant to the resource usage of the VDP
are:
.. _varnish-counters(7): https://varnish-cache.org/docs/trunk/reference/varnish-counters.html
.. _SMA: https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#sma-malloc-stevedore-counters
.. _LCK: https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#lck-lock-counters
.. _MEMPOOL: https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#mempool-memory-pool-counters
* ``SMA.Transient.*`` for the use of Transient storage, see the `SMA`_
section in `varnish-counters(7)`_.
* ``MAIN.threads`` shows the current number of threads in all pools.
``MAIN.threads_limited`` shows the number of times threads were
requested from the pools, but the limit imposed by
``thread_pool_max`` was reached. See `varnish-counters(7)`_.
  You may also want to monitor ``MAIN.thread_queue_len``: the length
  of the queue of tasks (including sessions for new client
  connections) waiting for a thread. A persistently non-zero value is
  a sign that thread pools may be too small.
* ``MAIN.ws_client_overflow`` shows the number of times client
workspace was exhausted (see `varnish-counters(7)`_). Workspace
overflow will also cause ``pesi.activate()`` to invoke VCL failure
(see `ERRORS`_).
* The VDP adds custom counters ``LCK.pesi.buf.*`` and
``LCK.pesi.tasks.*``, so that its locks may be monitored; see the
`LCK`_ section in `varnish-counters(7)`_.
Varnish since version 6.2.0 has the ``lck`` flag for the varnishd
parameter ``debug``. When the flag is set, the
``LCK.pesi.*.dbg_busy`` counters are incremented when there is lock
contention, see `varnishd(1)`_.
  Example::

    varnishd -p debug=+lck
* The VDP also adds the ``MEMPOOL.pesi.*`` counters, to monitor the
memory pool described in the documentation for ``pool()`` above.
See the `MEMPOOL`_ section in `varnish-counters(7)`_.
If the mempool routinely shows a relevant number of `live` objects,
consider increasing ``max_nodes`` via |pesi.workspace_prealloc()|_,
keeping in mind that prealloc requires free workspace, so adjusting
``workspace_client`` might also be required.
* The VDP adds another counter ``PESI.no_thread``, which is
incremented when ``set(thread, false)`` has been set as described
above, and an ESI subrequest had to be processed in serial (in the
same thread as for the including request), because no thread was
available from the thread pools.
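
The pesi-specific counters can be inspected together with
`varnishstat(1)`_; a sketch (the glob patterns passed to ``-f`` assume
a reasonably recent varnishstat)::

  varnishstat -1 -f 'MEMPOOL.pesi.*' -f 'LCK.pesi.*' -f PESI.no_thread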
THREADS
=======
For parallel ESI to work as efficiently as possible, it traverses the
ESI tree *breadth first* by default, processing any ESI object
completely, with new threads scheduled for any includes encountered.
Once the top ESI object is processed, available data from a subtree
(an ESI object and anything below) can be sent to the client while
processing of the remaining tree continues. As soon as ESI object
processing is complete, the respective thread is returned to the
thread pool and becomes available for any other Varnish task (except
for the request at esi_level 0, which *has* to wait for completion of
the entire ESI request anyway, and sends data to the client in the
meantime).
With the `thread`_ setting at ``true`` (the default), this is what
happens. But a thread may not be immediately available if the thread
pool is not sufficiently sized for the current load, and thus the
include request may have to be queued.
With the `thread`_ setting at ``false``, if no new thread is
immediately available, include processing happens in the same thread,
as if ``serial`` mode had been activated. While this may sound like
the more sensible option at first, we did not make this the default
for the following reasons:
* Before completion of ESI processing, the subtree below it is not yet
available for delivery to the client because additional VDPs behind
pesi cannot be called from a different thread.
* While processing of the include may take an arbitrarily long time
(for example because it requires a lengthy backend fetch), we know
that the ESI object is fully available in the stevedore (and usually
in memory already) when we parse an include, because streaming is
not supported for ESI. So we know that completing the processing of
the current ESI object will be quick, while descending into a
subtree may take a long time.
* Except for ESI level 0, the current thread will become available as
soon as ESI processing has completed.
* The thread herder may breed new threads and other threads may
terminate, so queuing a thread momentarily is not a bad thing per
se.
In short, keeping the `thread`_ setting at the default ``true`` should
be the right option, but the alternative exists just in case.
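The fallback can be sketched as follows (assuming an otherwise default
configuration)::

  sub vcl_deliver {
      pesi.activate();
      # Fall back to processing an include in the current thread
      # whenever no thread is immediately available from the pools.
      pesi.set(thread, false);
  }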
LIMITATIONS
===========
As emphasized above, ``pesi.activate()`` must be called at all ESI
levels if it is called at any ESI level (and equivalently, if ``pesi``
is added by hand to ``resp.filters``, it must be present in
``resp.filters`` at all ESI levels). This is similar to the fact that
serial ESI processing in standard Varnish cannot be disabled in the
"middle" of an ESI tree. If ``resp.do_esi`` is set to ``false`` (in
VCL 4.1) after ESI processing has already begun, Varnish knows to
ignore it, and ESI processing continues. But the pesi VDP is unable to
check for this condition -- it can only operate at all if
``activate()`` has been called (or ``pesi`` is present in
``resp.filters``).
If VDP pesi has been activated at ESI level 0 but not at another
level, Varnish is likely to infer that standard serial ESI processing
should be invoked for the subrequest. The standard ESI VDP and the
pesi VDP are not compatible with one another, so that this situation
is very likely to lead to a Varnish panic. There is nothing we can do
to prevent that, other than urgently advise users to activate VDP pesi
at all ESI levels, or not at all.
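The safest pattern is therefore to activate unconditionally, which
guarantees activation at every ESI level. A minimal sketch::

  sub vcl_deliver {
      # Runs for the top-level request and for every ESI
      # subrequest alike, hence at all ESI levels.
      pesi.activate();
  }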
.. _vsl(7): https://varnish-cache.org/docs/trunk/reference/vsl.html
The size of the response body as reported by Varnish log records with
the ``ReqAcct`` tag (see `vsl(7)`_) may be slightly different for
different deliveries of the same ESI tree, even though the responses
as viewed by a client are identical. This has to do with the way
fragments in the response are transmitted on the wire to clients --
chunked encoding for HTTP/1, and sequences of DATA frames for
HTTP/2. The overhead for these transmission methods is included in the
accounting of ``ReqAcct``. The "chunking" of the response may differ
at different times, depending on the order of events, and on whether
or not we use (partial) sequential delivery (for example, when no
threads are available).
REQUIREMENTS
============
All versions of the VDP require strict ABI compatibility with Varnish,
meaning that it must run against the same build version of Varnish as
the version against which the VDP was built. This means that the
"commit id" portion of the Varnish version string (the SHA1 hash) must
be the same at runtime as at build time.
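The revision that a running varnishd was built from is part of its
version output, so the check can be done by hand (the exact wording of
the output differs between Varnish releases)::

  varnishd -V

The SHA1 hash in the ``revision`` part of the output must match the
revision of the Varnish source tree that the VDP was built against.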
INSTALLATION
============
See `INSTALL.rst <INSTALL.rst>`_ in the source repository.
SUPPORT
=======

.. _gitlab.com issues: https://gitlab.com/uplex/varnish/libvdp-pesi/-/issues

To report bugs, use `gitlab.com issues`_.

For enquiries about professional service and support, please contact
info@uplex.de\ .

CONTRIBUTING
============

.. _merge requests on gitlab.com: https://gitlab.com/uplex/varnish/libvdp-pesi/-/merge_requests

To contribute to the project, please use `merge requests on gitlab.com`_.

To support the project's development and maintenance, there are
several options:

.. _paypal: https://www.paypal.com/donate/?hosted_button_id=BTA6YE2H5VSXA
.. _github sponsor: https://github.com/sponsors/nigoroll

* Donate money through `paypal`_. If you wish to receive a commercial
  invoice, please add your details (address, email, any requirements
  on the invoice text) to the message sent with your donation.

* Become a `github sponsor`_.

* Contact info@uplex.de to receive a commercial invoice for SWIFT payment.

.. |pesi.activate()| replace:: ``pesi.activate()``
.. |pesi.set()| replace:: ``pesi.set()``
.. |pesi.workspace_prealloc()| replace:: ``pesi.workspace_prealloc()``
.. |pesi.pool()| replace:: ``pesi.pool()``

SEE ALSO
========

.. _Content composition with Edge Side Includes: https://varnish-cache.org/docs/trunk/users-guide/esi.html

* `varnishd(1)`_
* `vcl(7)`_
* `varnishstat(1)`_
* `varnish-counters(7)`_
* `varnishadm(1)`_
* `Content composition with Edge Side Includes`_ in the `Varnish User's Guide`_
COPYRIGHT
=========
.. _varnishstat(1): https://varnish-cache.org/docs/trunk/reference/varnishstat.html
TL;DR: QUICK START
==================
This documentation is deliberately detailed. It aims to explain
thoroughly how this VMOD works and how its optimizations can be tuned.
We welcome all users to read the documentation, but many users will
neither want to nor need to understand the details. Thus, here is what
you *really* need to know:
* See `INSTALL.rst <INSTALL.rst>`_ in the source repository for
installation instructions.
* To use pESI, add to the top of your VCL::
import pesi;
and to your ``sub vcl_deliver {}``, add::
pesi.activate();
This should be added *after* any modification of ``resp.do_esi``,
``req.http.Accept-Encoding``, ``req.http.Range`` or
``resp.filters``, if these exist.
To be safe, ``pesi.activate()`` can be called before any
``return(deliver)`` in ``sub vcl_deliver {}``.
* If you call ``pesi.activate()``, call it unconditionally and on all
ESI levels. Read this documentation for details.
It is possible that your current configuration of system resources,
such as thread pools, workspaces, memory allocation and so forth, will
suffice after this simple change, and will need no further
optimization.
But that is by no means ensured, since pESI uses system resources
differently from standard ESI. Understanding these differences, and how
to monitor and manage resource usage affected by pESI, is a main focus
of the detailed discussion that follows.
DESCRIPTION
===========
Parallel ESI processing is enabled by invoking |pesi.activate()|_ in
``vcl_deliver {}``::
import pesi;
sub vcl_backend_response {
set beresp.do_esi = true;
}
sub vcl_deliver {
pesi.activate();
}
ACKNOWLEDGEMENTS
================
.. _Otto GmbH & Co KG: https://www.otto.de/
Most of the development work on this VMOD in 2019 and 2020 has been
sponsored by `Otto GmbH & Co KG`_.
.. _BoardGameGeek: https://boardgamegeek.com/
The initial release to the public in 2021 has been supported by
`BoardGameGeek`_.
..
.. NB: This file is machine generated, DO NOT EDIT!
..
.. Edit ./vdp_pesi.vcc and run make instead
..
.. role:: ref(emphasis)
=========
vmod_pesi
=========
----------------------------------------------------
Varnish Delivery Processor for parallel ESI includes
----------------------------------------------------
:Manual section: 3
SYNOPSIS
========
::
import pesi;
# Enable parallel ESI processing in vcl_deliver {}.
VOID pesi.activate()
# Set a boolean configuration parameter.
VOID pesi.set(ENUM, BOOL)
# Configure workspace pre-allocation for internal variable-sized
# data structures.
VOID pesi.workspace_prealloc(BYTES min_free, INT max_nodes)
# Configure the memory pool used when pre-allocated structures
# from the workspace are insufficient.
VOID pesi.pool(INT min, INT max, DURATION max_age)
# VDP version
STRING pesi.version()
.. _varnishd(1): https://varnish-cache.org/docs/trunk/reference/varnishd.html
.. _vcl(7): https://varnish-cache.org/docs/trunk/reference/vcl.html
.. _varnishadm(1): https://varnish-cache.org/docs/trunk/reference/varnishadm.html
.. _varnishstat(1): https://varnish-cache.org/docs/trunk/reference/varnishstat.html
DESCRIPTION
===========
.. _standard ESI processing: https://varnish-cache.org/docs/trunk/users-guide/esi.html
VDP pesi is a Varnish Delivery Processor for parallel Edge Side
Includes (ESI). The VDP implements content composition in client
responses as specified by ``<esi>`` directives in the response body,
just as Varnish does with its `standard ESI processing`_. While
standard Varnish processes ESI subrequests serially, in the order in
which the ``<esi>`` directives appear in the response, the pesi VDP
executes the subrequests in parallel. This can lead to a significant
reduction in latency for the complete response, if Varnish has to wait
for backend fetches for more than one of the included requests.
Backend applications that use ESI includes for standard Varnish can be
expected to work without changes with the VDP, provided that they do
not depend on assumptions about the serialization of ESI subrequests.
Serial ESI requests are processed in a predictable order, one after
the other, but the pesi VDP executes them at roughly the same time. A
backend may conceivably receive a request forwarded for the second
include in a response before the first one. If the logic of ESI
composition in a standard Varnish deployment does not depend on the
serial order, then it will work the same way with VDP pesi.
Parallel ESI processing is enabled by invoking |pesi.activate()|_ in
``vcl_deliver {}``::
import pesi;
sub vcl_backend_response {
set beresp.do_esi = true;
}
sub vcl_deliver {
pesi.activate();
}
Other functions provided by the VDP serve to set configuration
parameters (or return the VDP version string). If your deployment uses
the default configuration, then |pesi.activate()|_ in ``vcl_deliver``
may be the only modification to VCL that you need.
The invocation of |pesi.activate()|_ can of course be subject to
logic in VCL::
sub vcl_deliver {
# Use parallel ESI only if the request header X-PESI is present.
if (req.http.X-PESI) {
pesi.activate();
}
}
But see below for restrictions on the use of |pesi.activate()|_.
All of the computing resources used by the pesi VDP -- threads, storage,
workspace, locks, and so on -- can be configured, either with Varnish
runtime parameters or configuration settings made available by the
pesi VDP. And their usage can be monitored with Varnish statistics. So you
can limit resource usage, and use monitoring tools such as
`varnishstat(1)`_ to ensure efficient parallel ESI processing. For
details see `RESOURCE USAGE, CONFIGURATION AND MONITORING`_ below.
.. _pesi.activate():
VOID activate()
---------------
Enable parallel ESI processing for the client response.
``pesi.activate()`` MUST be called in ``vcl_deliver {}`` only. If it is
called in any other VCL subroutine, VCL failure is invoked (see
`ERRORS`_ below for details).
If ``pesi.activate()`` is called on *any* ESI level (any depth of include
nesting), then it MUST be called on *all* levels of the response. If
``pesi.activate()`` is invoked at some ESI levels but not others, then the
results are undefined, and will very likely lead to a Varnish panic.
The simplest way to meet this requirement is to call
``pesi.activate()`` unconditionally. It is also safe, for instance, to
call it only if a request header is present, as in the example shown
above; since the
same request headers are set for every ESI subrequest, the result is
the same at every ESI level. But that should *not* be done if you have
logic that unsets the header at some ESI levels but not at
others. Under no circumstances should the invocation of ``pesi.activate()``
depend on the value of ``req.esi_level``, or on ``req.url`` (since
URLs are different at different ESI levels).
See |pesi.set()|_ below for a way to choose serial
ESI processing for all of the includes in the response at the current
ESI level. Even then, ``pesi.activate()`` must be called in ``vcl_deliver
{}`` in addition to ``pesi.set()``.
As with standard Varnish, ESI processing can be selectively disabled
for a client response, by setting ``resp.do_esi`` to ``false`` in VCL
since version 4.1, or setting ``req.esi`` to ``false`` in VCL 4.0 (see
`vcl(7)`_). The requirement remains: if ESI processing is enabled and
``pesi.activate()`` is called at any ESI level, then both must happen at
all levels.
``pesi.activate()`` has the effect of setting the VCL string variable
``resp.filters``, which is a whitespace-separated list of the names of
delivery processors to be applied to the client response (see
`vcl(7)`_). It configures the correct list of filters for the current
response, analogous to the default filter settings in Varnish when
sequential ESI is in use. These include the ``gunzip`` VDP for
uncompressed responses, and ``range`` for responses to range
requests. ``pesi.activate()`` checks the conditions for which the VDPs are
required, and arranges them in the correct order.
It is possible to manually set or change ``resp.filters`` to enable
parallel ESI, instead of calling ``pesi.activate()``, but that is
advised only for experts. If you do so, use the string ``pesi`` for this
VDP, and do *not* include ``esi``, for Varnish's standard ESI VDP, in
the same list with ``pesi``. As with the ``pesi.activate()`` call -- if
``pesi`` appears in ``resp.filters`` for a response at *any* ESI
level, it MUST be in ``resp.filters`` at *all* ESI levels.
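For illustration only -- a hand-set filter list might look like this,
assuming an uncompressed response to a non-range request (the required
list depends on the response, which is why |pesi.activate()|_ is
preferred)::

  sub vcl_deliver {
      # Expert use only: select the pesi VDP by hand. Never combine
      # "esi" and "pesi" in the same filter list.
      set resp.filters = "pesi";
  }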
Notice that all VCL code affecting ESI (such as setting
``resp.do_esi``), gzip (such as changes to
``req.http.Accept-Encoding``) or range processing (such as changes to
``req.http.Range``) must execute before this function is called to
have an effect.
Example::
vcl 4.1;
import pesi;
sub vcl_recv {
# Disable gzipped responses by removing Accept-Encoding.
unset req.http.Accept-Encoding;
}
sub vcl_backend_response {
set beresp.do_esi = true;
}
sub vcl_deliver {
# If the request header X-Debug-ESI is present, then disable ESI
# for the current response.
if (req.http.X-Debug-ESI) {
set resp.do_esi = false;
}
pesi.activate();
}
.. _pesi.set():
VOID set(ENUM {serial, thread} parameter, [BOOL bool])
------------------------------------------------------
Set a configuration parameter for the VDP, which holds for the current
(sub)request, as documented below. The parameter to be set is
identified by the ENUM ``parameter``. Currently the parameters can
only be set with a boolean value in ``bool`` (but future versions of
this function may allow for setting other data types).
``pesi.set()`` MUST be called in ``vcl_deliver {}`` only; otherwise VCL
failure is invoked (see `ERRORS`_).
The parameters that can be set are currently ``serial`` and ``thread``:
``serial``
----------
Activates serial mode if ``bool`` is ``true``; default is ``false``.
In serial mode, the ESI subrequests processed for includes in the
current response body are processed in serial, in the current thread.
In other words, all ESI subrequests at the next level will be
processed without requesting threads from the thread pool (which
potentially starts new threads, if necessary). This setting only
affects include processing at the current ESI level, not nested
includes at the next level.
It is strongly recommended to *not* use serial mode from ESI level 0
(the top level request received from a client), because the ESI level
0 thread can send available data to the client concurrently with other
parallel ESI threads.
Serial mode may sensibly be used to reduce overhead and the number of
threads required, without relevant drawbacks:

* at ESI level > 0, *and*
* when the VCL author knows that all objects included by the current
request are cacheable, and thus are highly likely to lead to cache
hits.
Example::
# Activate serial mode at ESI level > 0, if we know that all includes
# in the response at this level lead to cacheable responses.
sub vcl_deliver {
pesi.activate();
if (req.esi_level > 0 && req.url ~ "^/all/cacheable/includes") {
pesi.set(serial, true);
}
}
.. _thread:
``thread``
----------
Whether we always request a new thread for includes, default is
``true``.
* ``false``
Only use a new thread if immediately available, process the include
in the same thread otherwise.
* ``true``
Request a new thread, potentially waiting for one to become
available.
See `THREADS`_ for a detailed discussion.
.. _pesi.workspace_prealloc():
VOID workspace_prealloc(BYTES min_free, INT max_nodes)
------------------------------------------------------
::
VOID workspace_prealloc(BYTES min_free=4096, INT max_nodes=32)
Configure the maximum amount of workspace used for pesi internal data
structures.
The pesi VDP builds a structure, whose size is roughly proportional to
the size of the ESI tree -- the conceptual tree with the top-level
response at the root, and its includes and all of their nested
includes as branches. The nodes in this structure have a fixed size,
but the number of nodes used by the VDP varies with the size of the
ESI tree.
For each (sub)request, the VDP pre-allocates a constant number of such
nodes in client workspace, and initially uses the pre-allocation for
child nodes of that (sub)request. If more are needed, they are
obtained from a global memory pool as described below. The use of
pre-allocated nodes from workspace is preferred, since it never
requires new system memory allocations (workspaces themselves are
pre-allocated by Varnish), and because they are local to each request,
so locking is never required to access them (but is required for the
memory pool).
The pre-allocation only uses workspace available after ``vcl_deliver
{}`` returns, keeping at least ``min_free`` bytes free, if
possible. Thus, the number of nodes configured by ``max_nodes`` may
not actually be available, unless the ``workspace_client`` parameter
is set sufficiently high.
``pesi.workspace_prealloc()`` configures the pre-allocation. The default
values of its parameters are defaults used by the VDP; that is, the
configuration if ``pesi.workspace_prealloc()`` is never called.
The ``min_free`` parameter sets the minimum amount of space that the
pre-allocation will always leave free in client workspace; if the
targeted number of pre-allocated nodes would result in less free space
than ``min_free`` bytes in workspace, then fewer nodes are
allocated. This ensures that free workspace is always left over for
other VMODs, VCL usage, and so forth. Note that most of the operations
typically requiring workspace have already finished when VDP pesi
makes the pre-allocation, because it starts after ``vcl_deliver
{}``. Thus, the reservation is mostly for other VDPs and VMODs using
``PRIV_TOP``. ``min_free`` defaults to 4 KiB.

If other VDPs or VMODs using ``PRIV_TOP`` report workspace overflows,
``min_free`` should be increased.
The ``max_nodes`` parameter sets the number of nodes to be allocated,
unless the limit imposed by ``min_free`` is exceeded; ``max_nodes``
defaults to 32. ``max_nodes`` MUST be >= 0; otherwise, VCL failure is
invoked (see `ERRORS`_). If ``max_nodes`` is set to 0, then no nodes
are pre-allocated; they are all taken from the memory pool described
below.
Ideally, ``max_nodes`` matches the number of includes any one ESI
object can have plus the number of fragments before, after and
in between the includes. For all practical purposes, ``max_nodes``
should match twice the number of expected ESI includes. However, if
the number of ESI includes across objects varies substantially, it
might be better to use less memory and set ``max_nodes`` according to
the number of includes of a typical object, so that objects with
more includes use the memory pool.
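As a worked example with hypothetical numbers: an ESI object with
``n`` includes consists of at most ``n`` include nodes plus ``n + 1``
surrounding fragments, so about ``2n + 1`` nodes in total::

  # Hypothetical sizing: typical responses carry up to 15 includes,
  # so 2 * 15 + 1 = 31 nodes suffice; configure 32 to be safe.
  sub vcl_init {
      pesi.workspace_prealloc(min_free=4k, max_nodes=32);
  }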
When ``pesi.workspace_prealloc()`` is called, its configuration becomes
effective immediately for all new requests processed by the VDP. The
configuration remains valid for all instances of VCL, for as long as
the VDP remains loaded; that is, until the last instance of VCL using
the VDP is discarded.
``pesi.workspace_prealloc()`` can be called in ``vcl_init`` to set the
configuration at VCL load time. But you can also write VCL that calls
the function when a request is received by Varnish, for example using
a special URL for system administrators. This is similar to using the
``param.set`` command for `varnishadm(1)`_ to change a Varnish
parameter at runtime. Such a request should be protected, for example
with an ACL and/or Basic Authentication, so that it can be invoked
only by admins. Remember that as soon as such a request is processed
and ``pesi.workspace_prealloc()`` is executed, the changed configuration is
globally valid.
Examples::
# Configure workspace pre-allocation at VCL load time.
sub vcl_init {
pesi.workspace_prealloc(min_free=8k, max_nodes=64);
}
# Change the configuration at runtime, when Varnish receives an
# admin request.
import pesi;
import std;
sub vcl_recv {
if (req.url ~ "^/admin/pesi_ws") {
# Reject the request with "403 Forbidden" unless the client
# IP matches an ACL for admin requests.
if (client.ip !~ admin_acl) {
return (synth(403));
}
# Set min_free from a GET parameter, if present.
if (req.url ~ "\bmin_free=\d+[kmgtp]?") {
# Extract the BYTES parameter.
set req.http.Tmp-Bytes
= regsub(req.url, "^.+\bmin_free=(\d+[kmgtp]?).*$", "\1");
pesi.workspace_prealloc(std.bytes(req.http.Tmp-Bytes));
}
# Set max_nodes from a GET parameter.
if (req.url ~ "\bmax_nodes=\d+") {
# Extract the INT parameter.
set req.http.Tmp-Nodes
= regsub(req.url, "^.+\bmax_nodes=(\d+).*$", "\1");
pesi.workspace_prealloc(max_nodes=std.integer(req.http.Tmp-Nodes));
}
# Return status 204 to indicate success.
return (synth(204));
}
}
.. _pesi.pool():
VOID pool(INT min=10, INT max=100, DURATION max_age=10)
-------------------------------------------------------
Configure the memory pool used by the VDP for internal variable-sized
data structures, when more is needed than is provided by the client
workspace pre-allocation described above. The objects in the memory
pool are the nodes used in structures whose size is proportional to
the size of the ESI tree, as discussed above.
The VDP uses the same mechanism that Varnish uses for its memory
pools, and the configuration values have the same meaning and defaults
as the Varnish runtime parameters ``pool_req``, ``pool_sess`` and
``pool_vbo`` (see `varnishd(1)`_). ``min`` and ``max`` control the
size of the pool -- the number of pre-allocated nodes available for
allocation requests. ``max_age`` is the maximum lifetime for nodes in
the pool -- when there are no pending allocation requests, nodes in
the pool that are older than ``max_age`` are freed, down to the limit
imposed by ``min``.
The values of the parameters MUST fulfill the following requirements,
otherwise VCL failure is invoked (see `ERRORS`_):
* ``min`` and ``max`` MUST be both > 0.
* ``max`` MUST be >= ``min``.
* ``max_age`` MUST be >= 0s (and <= one million seconds).
Note that ``max`` is a soft limit. The memory pool satisfies all
allocation requests, even if ``max`` is exceeded when nodes are
returned to the pool. But the pool size will then be reduced to
``max``, without waiting for ``max_age`` to expire.
As with |pesi.workspace_prealloc()|_: when ``pesi.pool()`` is called, the
changed configuration immediately becomes valid (although it may take
some time for the memory pool to adjust to the new values). It remains
valid for as long as the VDP is still loaded, unless ``pesi.pool()`` is
called again. ``pesi.pool()`` may be called in ``vcl_init`` to set a
configuration at VCL load time, but may also be called elsewhere in
VCL, for example to enable changing configurations at runtime using a
special "admin" request.
Examples::
# Configure the memory pool at VCL load time.
sub vcl_init {
pesi.pool(min=50, max=500, max_age=30s);
}
# Change the configuration at runtime, when Varnish receives an
# admin request.
import pesi;
import std;
sub vcl_recv {
if (req.url ~ "^/admin/pesi_pool") {
# Protect the call with an ACL, as in the example above.
if (client.ip !~ admin_acl) {
return (synth(403));
}
# Set max_age from a GET parameter.
if (req.url ~ "\bmax_age=\d+(\.\d+)?(ms|s|m|h|d|w|y)") {
# Extract the DURATION parameter.
set req.http.Tmp-Duration
= regsub(req.url,
"^.+\bmax_age=(\d+(?:\.\d+)?(?:ms|s|m|h|d|w|y)).*$",
"\1");
pesi.pool(max_age=std.duration(req.http.Tmp-Duration));
}
# Set min from a GET parameter.
if (req.url ~ "\bmin=\d+") {
# Extract the INT parameter.
set req.http.Tmp-Min = regsub(req.url, "^.+\bmin=(\d+).*$", "\1");
pesi.pool(min=std.integer(req.http.Tmp-Min));
}
# Extract max from a GET parameter, the same way as for min,
# not repeated here ...
# Status 204 indicates success.
return (synth(204));
}
}
.. _pesi.version():
STRING version()
----------------
Return the version string for this VDP.
Example::
std.log("Using VDP pesi version: " + pesi.version());
ERRORS
======
As documented above, VCL failure is invoked under some of the error
conditions for functions provided by the VDP. VCL failure has the same
results as if ``return(fail)`` is called from a VCL subroutine:
* If the failure occurs in ``vcl_init``, then the VCL load fails with
an error message.
* If the failure occurs in any other subroutine besides ``vcl_synth``,
then a ``VCL_Error`` message is written to the log, and control is
directed immediately to ``vcl_synth``, with ``resp.status`` set to
503 and ``resp.reason`` set to ``"VCL failed"``.
* If the failure occurs in ``vcl_synth``, then ``vcl_synth`` is
aborted, and the response line "503 VCL failed" is sent.
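For instance, this deliberately wrong, hypothetical snippet triggers
the failure path, because |pesi.set()|_ may only be called in
``vcl_deliver {}``::

  sub vcl_recv {
      # WRONG: logs a VCL_Error and makes Varnish deliver a
      # synthetic "503 VCL failed" response.
      pesi.set(serial, true);
  }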
RESOURCE USAGE, CONFIGURATION AND MONITORING
============================================
.. _Transient storage allocator: https://varnish-cache.org/docs/trunk/users-guide/storage-backends.html#transient-storage
To understand the way computing resources are used by the VDP, and
thus how they can be configured and monitored, first note that
response bodies returned for ESI subrequests running in parallel may
have to be buffered. Consider a response body with two
``<esi:include>`` directives, both of which lead to parallel backend
fetches, and the second fetch is finished before the first one. The
second response body must be retained while Varnish waits for the
first one, since the contents of the top-level client response must be
delivered in correct order.
If the second response is added to the cache, then buffering is not
necessary, because it can be retrieved from the cache (this is also
true if it was a cache hit in the first place). But an uncacheable
response must be buffered, until its contents are delivered. The VDP
uses Varnish's `Transient storage allocator`_ for this
purpose. Transient storage only needs to be used while the VDP is
waiting to deliver response contents; space is returned as soon as the
contents have been sent. The amount of Transient storage needed
depends on the size of all uncacheable included responses being
processed at any one time.
The VDP runs ESI subrequests (for each ``<esi:include>`` directive at
every ESI level) in separate threads, unless instructed not to do so
due to the use of either ``pesi.set(serial, true)`` or ``pesi.set(thread,
false)``, as documented above. The threads are requested from the
thread pools managed by Varnish. This means that in most cases, for
well-configured thread pools, the overhead of starting new threads is
not incurred during request processing -- the VDP obtains a thread
that is immediately ready for use.
The VDP uses client workspace at the top-level request (ESI level 0)
for fixed-sized internal metadata. It also uses client workspace to
pre-allocate a constant number of nodes in variable-sized structures,
as described in |pesi.workspace_prealloc()|_ above. Together these
make for a fixed-sized demand on client workspace, when
|pesi.activate()|_ is invoked. The size of the space needed from
workspace varies on different systems, and depends on the
|pesi.workspace_prealloc()|_ settings, but broadly speaking, it can be
expected to be less than 10 KiB.
As described for |pesi.pool()|_, the VDP uses a memory pool for
nodes in its internal reconstruction of the ESI tree, if more are
needed than are pre-allocated in workspace. The same mechanism is
employed as for Varnish's memory pools, so the same considerations apply
to the configuration and monitoring of the pool.
For each top-level ESI request using the VDP, two locks are employed:
one to synchronize access to common data structures, and another to
manage tasks being run in different threads. The VDP uses Varnish's
mechanisms for implementing locks, so they can be observed with
``LCK.*`` statistics.
To summarize, the VDP makes use of the following resources:
* Transient storage
* threads from Varnish's thread pools
* client workspace
* the memory pool created for this VDP
* locks
These resources are configured as follows:
.. _Storage Backend: https://varnish-cache.org/docs/trunk/reference/varnishd.html#storage-backend
.. _Storage backends: https://varnish-cache.org/docs/trunk/users-guide/storage-backends.html
.. _Varnish User's Guide: https://varnish-cache.org/docs/trunk/users-guide/index.html
* A maximum size for Transient storage can be set with the ``-s``
command-line option for varnishd, using the name ``Transient`` for
the storage backend (see `Storage Backend`_ in `varnishd(1)`_, and
`Storage backends`_ in the `Varnish User's Guide`_). If no storage
backend with the name ``Transient`` is specified, then Varnish uses
unlimited malloc storage for Transient. Specify ``-sTransient`` with a
size limit to impose an upper bound.
Example::
varnishd -sTransient=malloc,500m
* Thread pools are configured with the varnishd parameters
``thread_pools``, ``thread_pool_min`` and ``thread_pool_max``, see
`varnishd(1)`_.
Example::
varnishd -p thread_pools=4 -p thread_pool_min=500 -p thread_pool_max=1000
* Client workspace is configured with the varnishd parameter
``workspace_client``, see `varnishd(1)`_. The VDP's use of client
workspace can be configured in part by using the
``workspace_prealloc()`` function described above.
Example::
varnishd -p workspace_client=128k
# See also the examples for pesi.workspace_prealloc() above.
* The VDP's memory pool is configured with the ``pool()`` function
described above.
Statistics counters that are relevant to the resource usage of the VDP
are:
.. _varnish-counters(7): https://varnish-cache.org/docs/trunk/reference/varnish-counters.html
.. _SMA: https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#sma-malloc-stevedore-counters
.. _LCK: https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#lck-lock-counters
.. _MEMPOOL: https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#mempool-memory-pool-counters
* ``SMA.Transient.*`` for the use of Transient storage, see the `SMA`_
section in `varnish-counters(7)`_.
* ``MAIN.threads`` shows the current number of threads in all pools.
``MAIN.threads_limited`` shows the number of times threads were
requested from the pools, but the limit imposed by
``thread_pool_max`` was reached. See `varnish-counters(7)`_.
You may also want to monitor ``MAIN.thread_queue_len``, the length of
the queue of sessions that are waiting for a thread before Varnish can
accept new client connections. A persistently non-zero value is a
sign that the thread pools may be too small.
* ``MAIN.ws_client_overflow`` shows the number of times client
workspace was exhausted (see `varnish-counters(7)`_). Workspace
overflow will also cause ``pesi.activate()`` to invoke VCL failure
(see `ERRORS`_).
* The VDP adds custom counters ``LCK.pesi.buf.*`` and
``LCK.pesi.tasks.*``, so that its locks may be monitored; see the
`LCK`_ section in `varnish-counters(7)`_.
Varnish since version 6.2.0 has the ``lck`` flag for the varnishd
parameter ``debug``. When the flag is set, the
``LCK.pesi.*.dbg_busy`` counters are incremented when there is lock
contention, see `varnishd(1)`_.
Example::

  varnishd -p debug=+lck
* The VDP also adds the ``MEMPOOL.pesi.*`` counters, to monitor the
memory pool described in the documentation for ``pool()`` above.
See the `MEMPOOL`_ section in `varnish-counters(7)`_.
If the mempool routinely shows a relevant number of *live* objects,
consider increasing ``max_nodes`` via |pesi.workspace_prealloc()|_,
keeping in mind that prealloc requires free workspace, so adjusting
``workspace_client`` might also be required.
* The VDP adds another counter, ``PESI.no_thread``, which is
  incremented when ``set(thread, false)`` has been called as described
  above and an ESI subrequest had to be processed serially (in the
  same thread as the including request), because no thread was
  available from the thread pools.
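The counters above can be watched at runtime with ``varnishstat``,
which accepts glob patterns for counter names via ``-f``. As a sketch
(the exact counter names depend on the Varnish and VDP versions in
use)::

  # Watch Transient storage use and the pesi-specific counters.
  varnishstat -f SMA.Transient.* -f LCK.pesi.* -f MEMPOOL.pesi.* -f PESI.*

  # One-shot JSON output, e.g. for monitoring scripts.
  varnishstat -1 -j -f PESI.no_thread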
THREADS
=======
For parallel ESI to work as efficiently as possible, it traverses the
ESI tree *breadth first* by default, processing any ESI object
completely, with new threads scheduled for any includes encountered.
Once the top ESI object is processed, available data from a subtree
(an ESI object and anything below) can be sent to the client while
processing of the remaining tree continues. As soon as ESI object
processing is complete, the respective thread will be returned to the
thread pool and become available for any other varnish task (except
for the request for esi_level 0, which *has* to wait for completion of
the entire ESI request anyway and will send data to the client in the
meantime).
With the `thread`_ setting to ``true`` (the default), this is what
happens. But a thread may not be immediately available if the thread
pool is not sufficiently sized for the current load, and thus the
include request may have to be queued.
With the `thread`_ setting at ``false``, if no new thread is
immediately available, the include is processed in the same thread, as
if ``serial`` mode had been activated. While this may sound like the
more sensible option at first, we did not make this the default for
the following reasons:
* Before completion of ESI processing, the subtree below it is not yet
available for delivery to the client because additional VDPs behind
pesi cannot be called from a different thread.
* While processing of the include may take an arbitrarily long time
(for example because it requires a lengthy backend fetch), we know
that the ESI object is fully available in the stevedore (and usually
in memory already) when we parse an include, because streaming is
not supported for ESI. So we know that completing the processing of
the current ESI object will be quick, while descending into a
subtree may take a long time.
* Except for ESI level 0, the current thread will become available as
soon as ESI processing has completed.
* The thread herder may breed new threads and other threads may
terminate, so queuing a thread momentarily is not a bad thing per
se.
In short, keeping the `thread`_ setting at the default ``true`` should
be the right option, but the alternative exists just in case.
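For illustration, opting into the non-default behavior described above
could look like this in VCL (a sketch, using the ``thread`` parameter
shown in the synopsis)::

  vcl 4.1;

  import pesi;

  sub vcl_deliver {
      # Process an include in the current thread, as in serial
      # mode, whenever no new thread is immediately available.
      pesi.set(thread, false);
      pesi.activate();
  }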
LIMITATIONS
===========
As emphasized above, ``pesi.activate()`` must be called at all ESI
levels if it is called at any ESI level (and equivalently, if ``pesi``
is added by hand to ``resp.filters``, it must be present in
``resp.filters`` at all ESI levels). This is similar to the fact that
serial ESI processing in standard Varnish cannot be disabled in the
"middle" of an ESI tree. If ``resp.do_esi`` is set to ``false`` (in
VCL 4.1) after ESI processing has already begun, Varnish knows to
ignore it, and ESI processing continues. But the pesi VDP is unable to
check for this condition -- it can only operate at all if
``activate()`` has been called (or ``pesi`` is present in
``resp.filters``).
If VDP pesi has been activated at ESI level 0 but not at another
level, Varnish is likely to infer that standard serial ESI processing
should be invoked for the subrequest. The standard ESI VDP and the
pesi VDP are not compatible with one another, so this situation is
very likely to lead to a Varnish panic. There is nothing we can do
to prevent that, other than urgently advise users to activate VDP pesi
at all ESI levels, or not at all.
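A simple way to satisfy this requirement is to call ``activate()``
unconditionally in ``vcl_deliver``, which runs for the client request
and for every ESI subrequest; for example::

  vcl 4.1;

  import pesi;

  sub vcl_deliver {
      # vcl_deliver runs at every ESI level, so pesi is
      # activated for the entire ESI tree.
      pesi.activate();
  }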
.. _vsl(7): https://varnish-cache.org/docs/trunk/reference/vsl.html
The size of the response body as reported by Varnish log records with
the ``ReqAcct`` tag (see `vsl(7)`_) may be slightly different for
different deliveries of the same ESI tree, even though the responses
as viewed by a client are identical. This has to do with the way
fragments in the response are transmitted on the wire to clients --
chunked encoding for HTTP/1, and sequences of DATA frames for
HTTP/2. The overhead for these transmission methods is included in the
accounting of ``ReqAcct``. The "chunking" of the response may differ
at different times, depending on the order of events, and on whether
or not we use (partial) sequential delivery (for example, when no
threads are available).
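To observe these differences, the ``ReqAcct`` records of an ESI tree
can be inspected with ``varnishlog``, grouping by request so that the
records of the subrequests appear together with the top request::

  varnishlog -g request -i ReqAcct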
SEE ALSO
========
.. |pesi.activate()| replace:: ``pesi.activate()``
.. |pesi.set()| replace:: ``pesi.set()``
.. |pesi.workspace_prealloc()| replace:: ``pesi.workspace_prealloc()``
.. |pesi.pool()| replace:: ``pesi.pool()``
.. _Content composition with Edge Side Includes: https://varnish-cache.org/docs/trunk/users-guide/esi.html
* `varnishd(1)`_
* `vcl(7)`_
* `varnishstat(1)`_
* `varnish-counters(7)`_
* `varnishadm(1)`_
* `Content composition with Edge Side Includes`_ in the `Varnish User's Guide`_
COPYRIGHT
=========
::

  Copyright 2019 - 2021 UPLEX Nils Goroll Systemoptimierung
  All rights reserved

  Authors: Geoffrey Simmons <geoffrey.simmons@uplex.de>
           Nils Goroll <nils.goroll@uplex.de>

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions
  are met:
  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the distribution.

  THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  SUCH DAMAGE.