Add lru_exponent parameter

parent 9342b7c6
@@ -21,6 +21,11 @@ fellow
.. https://gitlab.com/uplex/varnish/slash/-/commit/
+* To cater for massively parallel systems with dozens of CPUs, the
+parameter ``lru_exponent`` has been introduced to scale the number
+of LRU lists (and corresponding eviction threads) between 1 and 64
+(corresponding to ``lru_exponent = 0`` to ``lru_exponent = 6``).
* The allocation policy for disk regions has been improved. This
should reduce fragmentation and pressure on LRU as well as improve
response times (`a0e8e8f779f4ad8569ccc9c3b7eaee08dc79cfa4`_).
@@ -35,6 +35,12 @@ recommendations for optimal fellow storage performance
Note that a fellow storage using any of the `xxhash`_ hashes can
only be loaded by an instance with `xxhash`_ support compiled in.
+* On big systems with many CPUs, ``lru_exponent`` can be tuned to
+achieve maximum performance with hundreds of thousands of requests per
+second.
+Reasonable values are yet to be determined experimentally.
compiling
~~~~~~~~~
@@ -692,7 +692,6 @@ struct fellow_busy {
struct fellow_cache_lrus {
unsigned magic;
#define FELLOW_CACHE_LRUS_MAGIC 0xadad56fb
-uint8_t exponent;
pthread_mutex_t mtx;
struct fellow_cache_lru *lru[1 << MAX_NLRU_EXPONENT];
};
@@ -750,6 +749,7 @@ fellow_cache_get_lru(struct fellow_cache *fc, uint64_t n)
{
struct fellow_cache_lrus *lrus;
struct fellow_cache_lru *lru;
+struct stvfe_tune *tune;
uint8_t exponent;
pthread_t thr;
size_t i;
@@ -757,8 +757,10 @@ fellow_cache_get_lru(struct fellow_cache *fc, uint64_t n)
CHECK_OBJ_NOTNULL(fc, FELLOW_CACHE_MAGIC);
lrus = fc->lrus;
CHECK_OBJ_NOTNULL(lrus, FELLOW_CACHE_LRUS_MAGIC);
+tune = fc->tune;
+CHECK_OBJ_NOTNULL(tune, STVFE_TUNE_MAGIC);
-exponent = lrus->exponent;
+exponent = tune->lru_exponent;
assert(exponent <= MAX_NLRU_EXPONENT);
i = exponent ? fib(n, exponent) : 0;
@@ -83,6 +83,7 @@ stvfe_tune_check(struct stvfe_tune *tune)
}
sz = tune->memsz >> (tune->chunk_exponent + 3);
+sz >>= tune->lru_exponent;
assert(sz <= UINT_MAX);
l = (unsigned)sz;
if (tune->mem_reserve_chunks > l) {
@@ -42,6 +42,7 @@ TUNE(float, log_rewrite_ratio, 0.5, 0.001, FLT_MAX);
// reserve chunk is the larger of chunk_exponent and result from logbuffer size
TUNE(unsigned, chunk_exponent, 20 /* 1MB*/, 12 /* 4KB */, 30 /* 1GB */);
TUNE(uint8_t, wait_table_exponent, 10, 6, 32);
+TUNE(uint8_t, lru_exponent, 0, 0, 6);
TUNE(unsigned, dsk_reserve_chunks, 4, 2, UINT_MAX);
TUNE(unsigned, mem_reserve_chunks, 1, 0, UINT_MAX);
TUNE(size_t, objsize_hint, 256 * 1024, 4096, SIZE_MAX);
@@ -481,8 +481,8 @@ will be used (which might fail if insufficient memory is available).
.. _xfellow.tune():
-STRING xfellow.tune([INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT wait_table_exponent], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_hint], [BYTES objsize_max], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log])
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+STRING xfellow.tune([INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT wait_table_exponent], [INT lru_exponent], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_hint], [BYTES objsize_max], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log])
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
@@ -493,6 +493,7 @@ STRING xfellow.tune([INT logbuffer_size], [DURATION logbuffer_flush_interval], [
[INT chunk_exponent],
[BYTES chunk_bytes],
[INT wait_table_exponent],
+[INT lru_exponent],
[INT dsk_reserve_chunks],
[INT mem_reserve_chunks],
[BYTES objsize_hint],
@@ -589,6 +590,24 @@ fellow storage can be fine tuned:
disk. Once an object is read, its body data is read in parallel
independent of this limit.
+* *lru_exponent*
+TL;DR: base-2 logarithm of the number of LRU lists
+- unit: number of LRU lists as a power of two
+- default: 0
+- minimum: 0
+- maximum: 6
+On large systems with mostly memory-bound access, the LRU
+list becomes the main point of contention, because segments are
+frequently removed from and re-added to the LRU.
+A single LRU (``lru_exponent=0``) is the fairest: only the
+absolute least recently used segment is ever evicted. More LRUs,
+however, significantly reduce contention on the LRU lists and
+improve the parallelism of evictions.
* *dsk_reserve_chunks*
- unit: scalar
@@ -614,10 +633,10 @@ fellow storage can be fine tuned:
- minimum: 0
- maximum: memsize / 8 / chunk_bytes
-specifies a number of chunks to reserve in memory. The reserve is
-used to provide memory for new objects or objects staged from disk
-to memory when memory is otherwise full. It can help reduce
-latencies in these situations at the expense of some memory
+specifies a number of chunks to reserve in memory per LRU. The
+reserve is used to provide memory for new objects or objects staged
+from disk to memory when memory is otherwise full. It can help
+reduce latencies in these situations at the expense of some memory
unavailable for caching.
The value is capped such that the number of reserved chunks times
@@ -832,8 +851,8 @@ Restricted to: ``vcl_init``.
.. _slash.tune_fellow():
-STRING tune_fellow(STEVEDORE storage, [INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT wait_table_exponent], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_hint], [BYTES objsize_max], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log])
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+STRING tune_fellow(STEVEDORE storage, [INT logbuffer_size], [DURATION logbuffer_flush_interval], [REAL log_rewrite_ratio], [INT chunk_exponent], [BYTES chunk_bytes], [INT wait_table_exponent], [INT lru_exponent], [INT dsk_reserve_chunks], [INT mem_reserve_chunks], [BYTES objsize_hint], [BYTES objsize_max], [INT cram], [INT readahead], [BYTES discard_immediate], [INT io_batch_min], [INT io_batch_max], [ENUM hash_obj], [ENUM hash_log], [ENUM ioerr_obj], [ENUM ioerr_log], [ENUM allocerr_obj], [ENUM allocerr_log])
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
::
@@ -845,6 +864,7 @@ STRING tune_fellow(STEVEDORE storage, [INT logbuffer_size], [DURATION logbuffer_
[INT chunk_exponent],
[BYTES chunk_bytes],
[INT wait_table_exponent],
+[INT lru_exponent],
[INT dsk_reserve_chunks],
[INT mem_reserve_chunks],
[BYTES objsize_hint],
@@ -430,6 +430,7 @@ $Method STRING .tune(
[ INT chunk_exponent ],
[ BYTES chunk_bytes ],
[ INT wait_table_exponent ],
+[ INT lru_exponent ],
[ INT dsk_reserve_chunks ],
[ INT mem_reserve_chunks ],
[ BYTES objsize_hint ],
@@ -525,6 +526,24 @@ fellow storage can be fine tuned:
disk. Once an object is read, its body data is read in parallel
independent of this limit.
+* *lru_exponent*
+TL;DR: base-2 logarithm of the number of LRU lists
+- unit: number of LRU lists as a power of two
+- default: 0
+- minimum: 0
+- maximum: 6
+On large systems with mostly memory-bound access, the LRU
+list becomes the main point of contention, because segments are
+frequently removed from and re-added to the LRU.
+A single LRU (``lru_exponent=0``) is the fairest: only the
+absolute least recently used segment is ever evicted. More LRUs,
+however, significantly reduce contention on the LRU lists and
+improve the parallelism of evictions.
* *dsk_reserve_chunks*
- unit: scalar
@@ -550,10 +569,10 @@ fellow storage can be fine tuned:
- minimum: 0
- maximum: memsize / 8 / chunk_bytes
-specifies a number of chunks to reserve in memory. The reserve is
-used to provide memory for new objects or objects staged from disk
-to memory when memory is otherwise full. It can help reduce
-latencies in these situations at the expense of some memory
+specifies a number of chunks to reserve in memory per LRU. The
+reserve is used to provide memory for new objects or objects staged
+from disk to memory when memory is otherwise full. It can help
+reduce latencies in these situations at the expense of some memory
unavailable for caching.
The value is capped such that the number of reserved chunks times
@@ -761,6 +780,7 @@ $Function STRING tune_fellow(
[ INT chunk_exponent ],
[ BYTES chunk_bytes ],
[ INT wait_table_exponent ],
+[ INT lru_exponent ],
[ INT dsk_reserve_chunks ],
[ INT mem_reserve_chunks ],
[ BYTES objsize_hint ],