Commit 3b72a3bd authored by Nils Goroll

polish documentation

parent c1184875
Pipeline #24 skipped
@@ -8,18 +8,3 @@ DISTCHECK_CONFIGURE_FLAGS = \
 
 EXTRA_DIST = README.rst LICENSE
 doc_DATA = README.rst LICENSE
-dist_man_MANS = vmod_shard.3
-
-MAINTAINERCLEANFILES = $(dist_man_MANS)
-
-vmod_shard.3: README.rst
-
-%.1 %.2 %.3 %.4 %.5 %.6 %.7 %.8 %.9:
-if HAVE_RST2MAN
-	${RST2MAN} $< $@
-else
-	@echo "========================================"
-	@echo "You need rst2man installed to make dist"
-	@echo "========================================"
-	@false
-endif
@@ -25,63 +25,68 @@ Director vmod to implement backend sharding with consistent hashing,
 previously known also as the VSLP (Varnish StateLess Persistence)
 director.
 
-The basic concept behind this director is:
-
-* Generate a load balancing key, which will be used to select the
-  backend. The key values should be as uniformly distributed as
-  possible. For all requests which need to hit the same backend
-  server, the same key must be generated. For strings, a hash
-  function can be used to generate the key.
-
-* Select the preferred backend server using an implementation of
-  consistent hashing (cf. Karger et al, references below), which
-  ensures that the same backends are always chosen for every key (for
-  instance the hash of the incoming URL) in the same order (i.e. if
-  the preferred host is down, then alternative hosts are always chosen
-  in a fixed and deterministic, but seemingly random order).
-
-* The consistent hashing circular data structure gets built from hash
-  values of "ident%d" (default ident being the backend name) for each
-  backend and for a running number from 1 to n (n is the number of
-  "replicas").
-
-* For the load balancing key, find the smallest hash value in the
-  circle that is larger than the key (searching clockwise and wrapping
-  around as necessary).
-
-* If the backend thus selected is down, choose alternative hosts by
-  continuing to search clockwise in the circle.
-
-On consistent hashing see:
-
-* http://www8.org/w8-papers/2a-webserver/caching/paper2.html
-* http://www.audioscrobbler.net/development/ketama/
-* svn://svn.audioscrobbler.net/misc/ketama
-* http://en.wikipedia.org/wiki/Consistent_hashing
-
-This technique allows creating shards of backend servers without
-keeping any state, and, in particular, without the need to synchronize
-state between nodes of a cluster of Varnish servers. Sharding by some
-request property (for instance by URL) may help optimize cache
-efficiency.
-
-One particular application of sharding is to implement persistence of
-backend requests, such that all requests sharing a certain criterion
-(such as an IP address or session ID) get forwarded to the same
-backend server.
+Introduction
+============
+
+The shard director selects backends by a key, which can be provided
+directly or derived from strings. For the same key, the shard director
+will always return the same backend, unless the backend configuration
+or health state changes. Conversely, for differing keys, the shard
+director will likely choose different backends. In the default
+configuration, unhealthy backends are not selected.
+
+The shard director resembles the hash director, but its main advantage
+is that, when the backend configuration or health states change, the
+association of keys to backends remains as stable as possible.
+
+In addition, the rampup and warmup features can help to further
+improve user-perceived response times.
+
+Sharding
+--------
+
+This basic technique allows for numerous applications like optimizing
+backend server cache efficiency, Varnish clustering, or persisting
+sessions to servers without keeping any state, and, in particular,
+without the need to synchronize state between nodes of a cluster of
+Varnish servers:
+
+* Many applications use caches for data objects, so, in a cluster of
+  application servers, requesting similar objects from the same server
+  may help to optimize the efficiency of such caches.
+
+  For example, sharding by URL or by some `id` component of the URL
+  has been shown to drastically improve the efficiency of many content
+  management systems.
+
+* As a special case of the previous example, in clusters of Varnish
+  servers without additional request distribution logic, each cache
+  will need to store all hot objects, so the effective cache size is
+  approximately the smallest cache size of any server in the cluster.
+
+  Sharding allows segregating objects within the cluster such that
+  each object is only cached on one of the servers (or on one primary
+  and one backup, on a primary for long and on others for short
+  etc.). Effectively, this will lead to a cache size on the order of
+  the sum of all individual caches, with the potential to drastically
+  increase efficiency (scales with the number of servers).
+
+* Another application is to implement persistence of backend requests,
+  such that all requests sharing a certain criterion (such as an IP
+  address or session ID) get forwarded to the same backend server.
 
 When used with clusters of varnish servers, the shard director will,
-if otherwise configured equally, make the same shard decision on all
-servers. In other words, requests sharing a common criterion used as
-the shard key will be balanced onto the same backend server(s) no matter
-which Varnish server handles the request.
+if otherwise configured equally, make the same decision on all
+servers. In other words, requests sharing a common criterion used as
+the shard key will be balanced onto the same backend server(s) no
+matter which Varnish server handles the request.
 
 The drawbacks are:
 
-* the distribution of requests depends on the number of requests per key and
-  the uniformity of the distribution of key values. In short, this technique
-  will generally lead to less good load balancing compared to stateful
-  techniques.
+* the distribution of requests depends on the number of requests per
+  key and the uniformity of the distribution of key values. In short,
+  while this technique may lead to much better efficiency overall, it
+  may also lead to less good load balancing for specific cases.
 
 * When a backend server becomes unavailable, every persistence
   technique has to reselect a new backend server, but this technique
@@ -91,6 +96,7 @@ The drawbacks are:
   a selected server for as long as possible (or dictated by a TTL)).
 
+
 INSTALLATION
 ============
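As an illustration of the usage pattern the polished introduction
describes, here is a minimal VCL sketch. It assumes the vmod's
documented interface (shard.director(), .add_backend(),
.reconfigure(), .backend() and .key()); the backend names, addresses
and the X-Session-Id header are placeholders, not part of this
commit::

  vcl 4.0;

  import shard;

  backend s1 { .host = "192.0.2.11"; }
  backend s2 { .host = "192.0.2.12"; }
  backend s3 { .host = "192.0.2.13"; }

  sub vcl_init {
      # Build the director; reconfigure() (re)computes the consistent
      # hashing ring over the backends added so far.
      new vd = shard.director();
      vd.add_backend(s1);
      vd.add_backend(s2);
      vd.add_backend(s3);
      vd.reconfigure();
  }

  sub vcl_recv {
      # Shard by URL: identically configured Varnish servers all map
      # the same URL to the same backend, skipping unhealthy ones.
      set req.backend_hint = vd.backend(by=URL);

      # For session persistence, the key could instead be derived
      # from a request property, e.g. (placeholder header):
      # set req.backend_hint = vd.backend(by=KEY,
      #     key=vd.key(req.http.X-Session-Id));
  }

Because the backend choice is purely a function of the key and the
configured backends, no state needs to be shared between the Varnish
servers of a cluster.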
@@ -22,6 +22,15 @@ nodist_libvmod_shard_la_SOURCES = \
 	parse_vcc_enums.h \
 	parse_vcc_enums.c
 
+dist_man_MANS = vmod_shard.3
+
+vmod_shard.3: vmod_shard.man.rst
+	${RST2MAN} $< $@
+
+vmod_shard.lo: vcc_if.c
+
+vmod_shard.man.rst vcc_if.c: vcc_if.h
+
 parse_vcc_enums.h: parse_vcc_enums.c
 parse_vcc_enums.c: gen_enum_parse.pl