Give up on reusing shmem files, it is not worth the added code
complexity. Do a -n collision check, and create a new file always.

git-svn-id: http://www.varnish-cache.org/svn/trunk/varnish-cache@4877 d4fa192b-c00b-0410-8231-f00ffab90ce4

Commit c4f8cb74 authored Jun 02, 2010 by Poul-Henning Kamp

Showing with 41 additions and 54 deletions

bin/varnishd/mgt_shmem.c +13 -32
doc/sphinx/reference/shmem.rst +28 -22

--- a/bin/varnishd/mgt_shmem.c
+++ b/bin/varnishd/mgt_shmem.c
@@ -115,12 +115,12 @@ mgt_SHM_Alloc(unsigned size, const char *class, const char *type, const char *id
 }

 /*--------------------------------------------------------------------
- * Try to reuse an existing shmem file, but try to not disturb another
- * varnishd using the file.
+ * Check that we are not started with the same -n argument as an already
+ * running varnishd
 */

-static int
-vsl_goodold(int fd, unsigned size)
+static void
+vsl_n_check(int fd)
 {
 	struct shmloghead slh;
 	int i;
@@ -133,11 +133,11 @@ vsl_goodold(int fd, unsigned size)
 	memset(&slh, 0, sizeof slh);	/* XXX: for flexelint */
 	i = read(fd, &slh, sizeof slh);
 	if (i != sizeof slh)
-		return (0);
+		return;
 	if (slh.magic != SHMLOGHEAD_MAGIC)
-		return (0);
+		return;
 	if (slh.hdrsize != sizeof slh)
-		return (0);
+		return;

 	if (slh.master_pid != 0 && !kill(slh.master_pid, 0)) {
 		fprintf(stderr,
@@ -148,24 +148,6 @@ vsl_goodold(int fd, unsigned size)
 		    "instances.)\n");
 		exit(2);
 	}
-
-	if (slh.child_pid != 0 && !kill(slh.child_pid, 0)) {
-		fprintf(stderr,
-		    "SHMFILE used by orphan varnishd child process (pid=%jd)\n",
-		    (intmax_t)slh.child_pid);
-		fprintf(stderr, "(We assume that process is busy dying.)\n");
-		return (0);
-	}
-
-	/* Sanity checks */
-
-	if (slh.shm_size != size)
-		return (0);
-
-	if (st.st_size != size)
-		return (0);
-
-	return (1);
 }

 /*--------------------------------------------------------------------
@@ -278,14 +260,13 @@ mgt_SHM_Init(const char *fn, const char *l_arg)
 	size &= ~(ps - 1);

 	i = open(fn, O_RDWR, 0644);
-	if (i >= 0 && vsl_goodold(i, size)) {
-		fprintf(stderr, "Using old SHMFILE\n");
-		vsl_fd = i;
-	} else {
-		fprintf(stderr, "Creating new SHMFILE\n");
+	if (i >= 0) {
+		vsl_n_check(i);
 		(void)close(i);
-		vsl_buildnew(fn, size, fill);
-	}
+	} 
+	fprintf(stderr, "Creating new SHMFILE\n");
+	(void)close(i);
+	vsl_buildnew(fn, size, fill);

 	loghead = (void *)mmap(NULL, size,
 	    PROT_READ|PROT_WRITE,
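
For readers unfamiliar with the idiom used in vsl_n_check(), the following is a minimal, self-contained sketch of a -n collision check: read a header from the file, validate it, and probe the recorded master pid with signal 0. The names demo_head, DEMO_MAGIC and demo_n_collision are hypothetical and exist only for this illustration; the real struct shmloghead has more fields. Unlike the code in the diff, which tests only `!kill(pid, 0)`, this sketch also treats EPERM as "process exists", which is an assumption of the sketch and not something this commit does.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical header layout, for illustration only. */
#define DEMO_MAGIC 0x5a5a1234u

struct demo_head {
	uint32_t	magic;		/* identifies the file format */
	uint32_t	hdrsize;	/* must match sizeof(struct demo_head) */
	pid_t		master_pid;	/* pid of the owning manager process */
};

/*
 * Return 1 if fd yields a valid header whose master_pid names a live
 * process (a -n collision), 0 in every other case.
 */
static int
demo_n_collision(int fd)
{
	struct demo_head h;
	ssize_t n;

	memset(&h, 0, sizeof h);
	n = read(fd, &h, sizeof h);
	if (n != (ssize_t)sizeof h)
		return (0);	/* short read: not a usable header */
	if (h.magic != DEMO_MAGIC || h.hdrsize != sizeof h)
		return (0);	/* different version or layout */
	if (h.master_pid <= 0)
		return (0);	/* no owner recorded */

	/*
	 * Signal 0 delivers nothing, it only probes for existence.
	 * EPERM means the process exists but belongs to another user.
	 */
	if (kill(h.master_pid, 0) == 0 || errno == EPERM)
		return (1);
	return (0);
}
```

A caller would open the existing file, call demo_n_collision() on the descriptor, and refuse to start if it returns 1, otherwise close the descriptor once and build a fresh file. Note that in the hunk above, the new code appears to call close(i) a second time after the if block, and to call it with a negative descriptor when open() failed; that looks harmless in practice but may be worth tidying.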

--- a/doc/sphinx/reference/shmem.rst
+++ b/doc/sphinx/reference/shmem.rst
@@ -6,39 +6,45 @@ Varnish uses shared memory for logging and statistics, because it
 is faster and much more efficient.  But it is also tricky in ways
 a regular logfile is not.

-Collision Detection
-------------------
-
 When you open a file in "append" mode, the operating system guarantees
 that whatever you write will not overwrite existing data in the file.
 The neat result of this is that multiple procesess or threads writing
-to the same file does not even need to know about each other it all
+to the same file does not even need to know about each other, it all
 works just as you would expect.

-With shared memory you get no such seatbelts.
+With a shared memory log, we get no help from the kernel, the writers
+need to make sure they do not stomp on each other, and they need to
+make it possible and safe for the readers to access the data.
+
+The "CS101" way, is to introduce locks, and much time is spent examining
+the relative merits of the many kinds of locks available.

-When Varnishd starts, it could find an existing shared memory file,
-being used by another varnishd, either because somebody gave the wrong
-(or no) -n argument, or because the old varnishd was not dead when
-some kind of process-nanny restarted varnishd anew.
+Inside the varnishd (worker) process, we use mutexes to guarantee
+consistency, both with respect to allocations, log entries and stats
+counters.

-If the shared memory file has a different version or layout it will
-be deleted and a new created.
+We do not want a varnishncsa trying to push data through a stalled
+ssh connection to stall the delivery of content, so readers like
+that are purely read-only, they do not get to affect the varnishd
+process and that means no locks for them.

-If the process listed in the "master_pid" field is running,
-varnishd will abort startup, assuming you got a wrong -n argument.
+Instead we use "stable storage" concepts, to make sure the view
+seen by the readers is consistent at all times.

-If the process listed in the "child_pid" field is (still?) running,
-or if the file as a size different from that specified in the -l 
-argument, it will be deleted and a new file created.
+As long as you only add stuff, that is trivial, but taking away
+stuff, such as when a backend is taken out of the configuration,
+we need to give the readers a chance to discover this, a "cooling
+off" period.

-The upshot of this, is that programs subscribing to the shared memory
-file should periodically do a stat(2) on the name, and if the
-st_dev or st_inode fields changed, switch to the new shared memory file.
+When Varnishd starts, if it finds an existing shared memory file,
+and it can safely read the master_pid field, it will check if that
+process is running, and if so, fail with an error message, indicating
+that -n arguments collide.

-Also, the master_pid field should be monitored, if it changes, the
-shared memory file should be "reopened" with respect to the linked
-list of allocations.
+In all other cases, it will delete and create a new shmlog file,
+in order to provide running readers a cooling off period, where
+they can discover that there is a new shmlog file, by doing a
+stat(2) call and checking the st_dev & st_inode fields.

 Allocations
 -----------