Release the fco mtx during "unbusy" log submission

The test case from the next commit exposed a deadlock: the only object in the test would consume all memory and could not be evicted via LRU, because fellow_cache_async_write_complete() held the fco mtx during log submission.
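
The locking pattern behind this fix can be illustrated with a minimal, single-threaded sketch. Everything in it is a hypothetical stand-in and not the fellow_cache code: struct obj, try_evict(), log_submit() and write_complete() are invented for illustration, and try_evict() merely models "LRU needs the object mutex to make progress". The sketch assumes a non-recursive mutex, as on Linux with default attributes.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins, NOT the fellow_cache types */
enum logstate { WANTLOG, INLOG };

struct obj {
	pthread_mutex_t	mtx;
	enum logstate	logstate;
	int		evictions;	/* times the "LRU" got the mutex */
};

/* Models LRU eviction: it can only proceed if it gets the mutex */
static int
try_evict(struct obj *o)
{
	if (pthread_mutex_trylock(&o->mtx) != 0)
		return (0);		/* mutex held: no progress */
	o->evictions++;
	pthread_mutex_unlock(&o->mtx);
	return (1);
}

/* Models log submission, which needs memory and thus the LRU */
static void
log_submit(struct obj *o)
{
	int ok = try_evict(o);

	assert(ok);
	(void)ok;
}

/* Drop the mutex around the submission, then re-check the state */
static void
write_complete(struct obj *o)
{
	pthread_mutex_lock(&o->mtx);
	assert(o->logstate == WANTLOG);
	pthread_mutex_unlock(&o->mtx);

	log_submit(o);			/* runs WITHOUT the mutex */

	pthread_mutex_lock(&o->mtx);
	/* nobody may have changed the state while we were unlocked */
	assert(o->logstate == WANTLOG);
	o->logstate = INLOG;
	pthread_mutex_unlock(&o->mtx);
}

/* Returns 0 if the sketch behaves as described */
int
demo_unlock_during_submit(void)
{
	struct obj o = { .logstate = WANTLOG, .evictions = 0 };
	int blocked, ok;

	pthread_mutex_init(&o.mtx, NULL);

	/* while the mutex is held, eviction cannot proceed */
	pthread_mutex_lock(&o.mtx);
	blocked = !try_evict(&o);
	pthread_mutex_unlock(&o.mtx);

	write_complete(&o);
	ok = blocked && o.logstate == INLOG && o.evictions == 1;
	pthread_mutex_destroy(&o.mtx);
	return (ok ? 0 : 1);
}
```

Calling demo_unlock_during_submit() from a test program should return 0. The re-check after re-acquiring the mutex mirrors the assert on FCOL_WANTLOG in the diff below: once the lock is dropped, any assumption about the protected state has to be either re-validated or guaranteed by some other means.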

Commit 871f2565 authored Sep 19, 2023 by Nils Goroll

Showing 1 changed file with 24 additions and 5 deletions

src/fellow_cache.c +24 -5

--- a/src/fellow_cache.c
+++ b/src/fellow_cache.c
@@ -2812,12 +2812,27 @@ fellow_cache_async_write_complete(struct fellow_cache *fc,

 		switch (fco->logstate) {
 		case FCOL_WANTLOG:
-			/* XXX could probably this outside the lock
-			 * by deleting the obj again if the logstate
-			 * was changed to FCOL_TOOLATE
+			/* fellow_cache_obj_delete() can not race
+			 * us because of wait for FCO_WRITING.
+			 *
+			 * unlock during busy_log_submit because
+			 * of LRU and waiting allocs
+			 *
+			 * sfe_oc_event DOES race us and we may
+			 * lose events between the time we write
+			 * the log here and call stvfe_oc_log_submitted()
+			 *
+			 * the whole event thing is racy anyway, not sure
+			 * how relevant...
 			 */
+			fellow_cache_lru_chgbatch_apply(lcb);
+			AZ(pthread_mutex_unlock(&fco->mtx));
+
 			fellow_busy_log_submit(fbo);
 			stvfe_oc_log_submitted(fco->oc);
+
+			AZ(pthread_mutex_lock(&fco->mtx));
+			assert(fco->logstate == FCOL_WANTLOG);
 			fco->logstate = FCOL_INLOG;
 			break;
 		case FCOL_NOLOG:
@@ -4275,7 +4290,7 @@ fdr_compar(const void *aa, const void *bb)
 }
 #endif

-/* under fco mtx */
+/* NOT fco mtx */
 static void
 fellow_busy_log_submit(const struct fellow_busy *fbo)
 {
@@ -4823,9 +4838,13 @@ fellow_cache_obj_delete(struct fellow_cache *fc,

 	switch (fco->logstate) {
 	case FCOL_DUNNO:
-	case FCOL_WANTLOG:
 		fco->logstate = FCOL_TOOLATE;
 		break;
+	case FCOL_WANTLOG:
+		// SYNC WITH fellow_cache_async_write_complete()
+		// see comment there
+		WRONG("fellow_cache_obj_delete FCOL_WANTLOG - can't race");
+		break;
 	case FCOL_NOLOG:
 		break;
 	case FCOL_TOOLATE: