Commit 3b72a3bd authored by Nils Goroll

polish documentation

parent c1184875
Pipeline #24 skipped
@@ -8,18 +8,3 @@ DISTCHECK_CONFIGURE_FLAGS = \
 
 EXTRA_DIST = README.rst LICENSE
 doc_DATA = README.rst LICENSE
-dist_man_MANS = vmod_shard.3
-
-MAINTAINERCLEANFILES = $(dist_man_MANS)
-
-vmod_shard.3: README.rst
-
-%.1 %.2 %.3 %.4 %.5 %.6 %.7 %.8 %.9:
-if HAVE_RST2MAN
-	${RST2MAN} $< $@
-else
-	@echo "========================================"
-	@echo "You need rst2man installed to make dist"
-	@echo "========================================"
-	@false
-endif
@@ -25,63 +25,68 @@ Director vmod to implement backend sharding with consistent hashing,
 previously known also as the VSLP (Varnish StateLess Persistence)
 director.
 
-The basic concept behind this director is:
-
-* Generate a load balancing key, which will be used to select the
-  backend. The key values should be as uniformly distributed as
-  possible. For all requests which need to hit the same backend
-  server, the same key must be generated. For strings, a hash
-  function can be used to generate the key.
-
-* Select the preferred backend server using an implementation of
-  consistent hashing (cf. Karger et al, references below), which
-  ensures that the same backends are always chosen for every key (for
-  instance the hash of the incoming URL) in the same order (i.e. if
-  the preferred host is down, then alternative hosts are always chosen
-  in a fixed and deterministic, but seemingly random order).
-
-* The consistent hashing circular data structure gets built from hash
-  values of "ident%d" (default ident being the backend name) for each
-  backend and for a running number from 1 to n (n is the number of
-  "replicas").
-
-* For the load balancing key, find the smallest hash value in the
-  circle that is larger than the key (searching clockwise and wrapping
-  around as necessary).
-
-* If the backend thus selected is down, choose alternative hosts by
-  continuing to search clockwise in the circle.
-
-On consistent hashing see:
-
-* http://www8.org/w8-papers/2a-webserver/caching/paper2.html
-* http://www.audioscrobbler.net/development/ketama/
-* svn://svn.audioscrobbler.net/misc/ketama
-* http://en.wikipedia.org/wiki/Consistent_hashing
-
-This technique allows creating shards of backend servers without
-keeping any state, and, in particular, without the need to synchronize
-state between nodes of a cluster of Varnish servers. Sharding by some
-request property (for instance by URL) may help optimize cache
-efficiency.
-
-One particular application of sharding is to implement persistence of
-backend requests, such that all requests sharing a certain criterion
-(such as an IP address or session ID) get forwarded to the same
-backend server.
+Introduction
+============
+
+The shard director selects backends by a key, which can be provided
+directly or derived from strings. For the same key, the shard director
+will always return the same backend, unless the backend configuration
+or health state changes. Conversely, for differing keys, the shard
+director will likely choose different backends. In the default
+configuration, unhealthy backends are not selected.
+
+The shard director resembles the hash director, but its main advantage
+is that, when the backend configuration or health states change, the
+association of keys to backends remains as stable as possible.
+
+In addition, the rampup and warmup features can help to further
+improve user-perceived response times.
+
+Sharding
+--------
+
+This basic technique allows for numerous applications like optimizing
+backend server cache efficiency, Varnish clustering, or persisting
+sessions to servers without keeping any state, and, in particular,
+without the need to synchronize state between nodes of a cluster of
+Varnish servers:
+
+* Many applications use caches for data objects, so, in a cluster of
+  application servers, requesting similar objects from the same server
+  may help to optimize the efficiency of such caches.
+
+  For example, sharding by URL or by some `id` component of the URL
+  has been shown to drastically improve the efficiency of many content
+  management systems.
+
+* As a special case of the previous example, in clusters of Varnish
+  servers without additional request distribution logic, each cache
+  will need to store all hot objects, so the effective cache size is
+  approximately the smallest cache size of any server in the cluster.
+
+  Sharding allows segregating objects within the cluster such that
+  each object is only cached on one of the servers (or on one primary
+  and one backup, on a primary for long and on others for short
+  etc.). Effectively, this will lead to a cache size on the order of
+  the sum of all individual caches, with the potential to drastically
+  increase efficiency (scales with the number of servers).
+
+* Another application is to implement persistence of backend requests,
+  such that all requests sharing a certain criterion (such as an IP
+  address or session ID) get forwarded to the same backend server.
 
 When used with clusters of varnish servers, the shard director will,
-if otherwise configured equally, make the same shard decision on all
-servers. In other words, requests sharing a common criterion used as
-the shard key will be balanced onto the same backend server(s) no matter
-which Varnish server handles the request.
+if otherwise configured equally, make the same decision on all
+servers. In other words, requests sharing a common criterion used as
+the shard key will be balanced onto the same backend server(s) no
+matter which Varnish server handles the request.
 
 The drawbacks are:
 
-* the distribution of requests depends on the number of requests per key and
-  the uniformity of the distribution of key values. In short, this technique
-  will generally lead to less good load balancing compared to stateful
-  techniques.
+* the distribution of requests depends on the number of requests per
+  key and the uniformity of the distribution of key values. In short,
+  while this technique may lead to much better efficiency overall, it
+  may also lead to less good load balancing for specific cases.
 
 * When a backend server becomes unavailable, every persistence
   technique has to reselect a new backend server, but this technique
@@ -91,6 +96,7 @@ The drawbacks are:
   a selected server for as long as possible (or dictated by a TTL)).
 
+
 INSTALLATION
 ============
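As an illustration of the usage pattern the polished introduction
describes, here is a minimal VCL sketch. It assumes the vmod's
documented interface (shard.director(), .add_backend(),
.reconfigure(), .backend() and .key()); the backend names, addresses
and the X-Session-Id header are placeholders, not part of this
commit::

  vcl 4.0;

  import shard;

  backend s1 { .host = "192.0.2.11"; }
  backend s2 { .host = "192.0.2.12"; }
  backend s3 { .host = "192.0.2.13"; }

  sub vcl_init {
      # Build the director; reconfigure() (re)computes the consistent
      # hashing ring over the backends added so far.
      new vd = shard.director();
      vd.add_backend(s1);
      vd.add_backend(s2);
      vd.add_backend(s3);
      vd.reconfigure();
  }

  sub vcl_recv {
      # Shard by URL: identically configured Varnish servers all map
      # the same URL to the same backend, skipping unhealthy ones.
      set req.backend_hint = vd.backend(by=URL);

      # For session persistence, the key could instead be derived
      # from a request property, e.g. (placeholder header):
      # set req.backend_hint = vd.backend(by=KEY,
      #     key=vd.key(req.http.X-Session-Id));
  }

Because the backend choice is purely a function of the key and the
configured backends, no state needs to be shared between the Varnish
servers of a cluster.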
@@ -22,6 +22,15 @@ nodist_libvmod_shard_la_SOURCES = \
 	parse_vcc_enums.h \
 	parse_vcc_enums.c
 
+dist_man_MANS = vmod_shard.3
+
+vmod_shard.3: vmod_shard.man.rst
+	${RST2MAN} $< $@
+
+vmod_shard.lo: vcc_if.c
+
+vmod_shard.man.rst vcc_if.c: vcc_if.h
+
 parse_vcc_enums.h: parse_vcc_enums.c
 parse_vcc_enums.c: gen_enum_parse.pl