Commit 3b72a3bd authored by Nils Goroll

polish documentation

parent c1184875
Pipeline #24 skipped
@@ -8,18 +8,3 @@ DISTCHECK_CONFIGURE_FLAGS = \
EXTRA_DIST = README.rst LICENSE
doc_DATA = README.rst LICENSE
dist_man_MANS = vmod_shard.3
MAINTAINERCLEANFILES = $(dist_man_MANS)
vmod_shard.3: README.rst
%.1 %.2 %.3 %.4 %.5 %.6 %.7 %.8 %.9:
if HAVE_RST2MAN
${RST2MAN} $< $@
else
@echo "========================================"
@echo "You need rst2man installed to make dist"
@echo "========================================"
@false
endif
@@ -25,63 +25,68 @@ Director vmod to implement backend sharding with consistent hashing,
previously also known as the VSLP (Varnish StateLess Persistence)
director.
The basic concept behind this director is:
* Generate a load balancing key, which will be used to select the
backend. The key values should be as uniformly distributed as
possible. For all requests which need to hit the same backend
server, the same key must be generated. For strings, a hash
function can be used to generate the key.
* Select the preferred backend server using an implementation of
consistent hashing (cf. Karger et al, references below), which
ensures that the same backends are always chosen for every key (for
instance hash of incoming URL) in the same order (i.e. if the
preferred host is down, then alternative hosts are always chosen in
a fixed and deterministic, but seemingly random order).
* The consistent hashing circular data structure gets built from hash
values of "ident%d" (default ident being the backend name) for each
backend and for a running number from 1 to n (n is the number of
"replicas").
* For the load balancing key, find the smallest hash value in the
circle that is larger than the key (searching clockwise and wrapping
around as necessary).
* If the backend thus selected is down, choose alternative hosts by
continuing to search clockwise in the circle.
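The selection procedure described in the steps above can be sketched in
C. This is only an illustration of the technique, not the vmod's actual
code: the hash function (FNV-1a), the replica count and all names
(``struct point``, ``build_ring()``, ``pick_backend()``) are assumptions
made up for the example::

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REPLICAS 67     /* assumed number of replicas per backend */

    struct point {
        uint32_t hash;      /* position on the circle */
        int backend;        /* index of the backend owning this point */
    };

    /* FNV-1a, standing in for the vmod's real hash function */
    static uint32_t hash32(const char *s)
    {
        uint32_t h = 2166136261u;
        for (; *s != '\0'; s++) {
            h ^= (unsigned char)*s;
            h *= 16777619u;
        }
        return h;
    }

    static int cmp_point(const void *a, const void *b)
    {
        uint32_t ha = ((const struct point *)a)->hash;
        uint32_t hb = ((const struct point *)b)->hash;
        return (ha < hb) ? -1 : (ha > hb);
    }

    /* Build the circle from hashes of "ident%d" for each backend. */
    static struct point *build_ring(const char **idents, int n_backends,
        int *n_points)
    {
        struct point *ring = malloc(sizeof *ring * n_backends * REPLICAS);
        char buf[256];
        int i, r, p = 0;

        for (i = 0; i < n_backends; i++)
            for (r = 1; r <= REPLICAS; r++, p++) {
                snprintf(buf, sizeof buf, "%s%d", idents[i], r);
                ring[p].hash = hash32(buf);
                ring[p].backend = i;
            }
        qsort(ring, p, sizeof *ring, cmp_point);
        *n_points = p;
        return ring;
    }

    /*
     * For a key, find the smallest point on the circle larger than the
     * key (wrapping around), then continue clockwise past unhealthy
     * backends.
     */
    static int pick_backend(const struct point *ring, int n_points,
        uint32_t key, const int *healthy)
    {
        int i, start = 0;

        while (start < n_points && ring[start].hash <= key)
            start++;
        for (i = 0; i < n_points; i++) {
            const struct point *pt = &ring[(start + i) % n_points];
            if (healthy[pt->backend])
                return pt->backend;
        }
        return -1;          /* no healthy backend left */
    }

    int main(void)
    {
        const char *idents[] = { "backend1", "backend2", "backend3" };
        int healthy[] = { 1, 1, 1 };
        int n_points;
        struct point *ring = build_ring(idents, 3, &n_points);
        uint32_t key = hash32("/some/url");    /* key derived from a string */

        printf("key 0x%08x -> backend %d\n", key,
            pick_backend(ring, n_points, key, healthy));
        free(ring);
        return 0;
    }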
On consistent hashing see:
* http://www8.org/w8-papers/2a-webserver/caching/paper2.html
* http://www.audioscrobbler.net/development/ketama/
* svn://svn.audioscrobbler.net/misc/ketama
* http://en.wikipedia.org/wiki/Consistent_hashing
This technique allows creating shards of backend servers without
keeping any state, and, in particular, without the need to synchronize
state between nodes of a cluster of Varnish servers. Sharding by some
request property (for instance by URL) may help optimize cache
efficiency.
One particular application of sharding is to implement persistence of
backend requests, such that all requests sharing a certain criterion
(such as an IP address or session ID) get forwarded to the same
backend server.
Introduction
============
The shard director selects backends by a key, which can be provided
directly or derived from strings. For the same key, the shard director
will always return the same backend, unless the backend configuration
or health state changes. Conversely, for differing keys, the shard
director will likely choose different backends. In the default
configuration, unhealthy backends are not selected.
The shard director resembles the hash director, but its main advantage
is that, when the backend configuration or health states change, the
association of keys to backends remains as stable as possible.
In addition, the rampup and warmup features can help to further
improve user-perceived response times.
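The stability claim can be made concrete with a small experiment. The
snippet below is illustrative only and builds on ``struct point``,
``build_ring()`` and ``pick_backend()`` from the sketch in the concept
section above (replace that sketch's ``main()`` with this one, so it is
not self-contained): when a fourth backend is added, only the keys whose
points now belong to the new backend move to it, roughly a quarter of
them, while all other keys keep their previous backend::

    int main(void)
    {
        const char *three[] = { "backend1", "backend2", "backend3" };
        const char *four[] = { "backend1", "backend2", "backend3",
            "backend4" };
        int healthy[] = { 1, 1, 1, 1 };
        int np3, np4, moved = 0, i;
        struct point *r3 = build_ring(three, 3, &np3);
        struct point *r4 = build_ring(four, 4, &np4);

        /* compare the backend choice for a sample of evenly spread keys */
        for (i = 0; i < 10000; i++) {
            uint32_t key = (uint32_t)i * 2654435761u;
            if (pick_backend(r3, np3, key, healthy) !=
                pick_backend(r4, np4, key, healthy))
                moved++;
        }
        printf("%d of 10000 keys changed backend\n", moved);
        free(r3);
        free(r4);
        return 0;
    }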
Sharding
--------
This basic technique allows for numerous applications like optimizing
backend server cache efficiency, Varnish clustering or persisting
sessions to servers without keeping any state, and, in particular,
without the need to synchronize state between nodes of a cluster of
Varnish servers:
* Many applications use caches for data objects, so, in a cluster of
application servers, requesting similar objects from the same server
may help to optimize efficiency of such caches.
For example, sharding by URL or some `id` component of the URL has
been shown to drastically improve the efficiency of many content
management systems.
* As a special case of the previous example, in clusters of Varnish
servers without additional request distribution logic, each cache
will need to store all hot objects, so the effective cache size is
approximately the smallest cache size of any server in the cluster.
Sharding allows segregating objects within the cluster such that
each object is only cached on one of the servers (or on one primary
and one backup, on a primary for a long time and on others for a
short time, etc.). Effectively, this will lead to a cache size in the order of
the sum of all individual caches, with the potential to drastically
increase efficiency (scales by the number of servers).
* Another application is to implement persistence of backend requests,
such that all requests sharing a certain criterion (such as an IP
address or session ID) get forwarded to the same backend server.
When used with clusters of Varnish servers, the shard director will,
if otherwise configured equally, make the same decision on all
servers. In other words, requests sharing a common criterion used as
the shard key will be balanced onto the same backend server(s) no
matter which Varnish server handles the request.
The drawbacks are:
* the distribution of requests depends on the number of requests per
key and the uniformity of the distribution of key values. In short,
while this technique may lead to much better efficiency overall, it
may also lead to worse load balancing in specific cases.
* When a backend server becomes unavailable, every persistence
technique has to reselect a new backend server, but this technique
@@ -91,6 +96,7 @@ The drawbacks are:
a selected server for as long as possible (or dictated by a TTL)).
INSTALLATION
============
@@ -22,6 +22,15 @@ nodist_libvmod_shard_la_SOURCES = \
parse_vcc_enums.h \
parse_vcc_enums.c
dist_man_MANS = vmod_shard.3
vmod_shard.3: vmod_shard.man.rst
${RST2MAN} $< $@
vmod_shard.lo: vcc_if.c
vmod_shard.man.rst vcc_if.c: vcc_if.h
parse_vcc_enums.h: parse_vcc_enums.c
parse_vcc_enums.c: gen_enum_parse.pl