pkg/controller/service.go · 9cd41fab882f5770107f05371b5877ea7eeb3309 · uplex-varnish / k8s-ingress

Refactor the return status for sync operations in the main loop. · d5c1528b
Geoff Simmons authored Jul 06, 2020
Previously all sync operations (add/update/delete for the resource
types that the controller watches, and the cluster changes brought
about for them if necessary) were only characterized by a Go error
variable. The only conditions that mattered were nil or not-nil.

On a nil return, success was logged, and a SyncSuccess event was
generated for the resource that was synced. But this was done even
if no change was required in the cluster, and the resource had
nothing to do with Ingress or the viking application. This led
to many superfluous Events.

On a non-nil return, a warning Event was generated the sync
operation was re-queued, using the workqueue's rate-limiting delay.
This was done regardless of the type of error. Since the initial delay
is quite rapid (subsequent re-queues begin to back off the delay), it
led to many re-queues, and many Events. But it is very common that a
brief delay is predictable, when not all necessary information is
available to the controller, so that the rapid retries just
generated a lot of noise. In other cases, retries will not improve
the situation -- an invalid config, for example, will still be
invalid on the next attempt.

This commit introduces pkg/update and the type Status, which classifies
the result of a sync operation. All of the sync methods now return
an object of this type, which in turn determines how the controller
handles errors, logs results, and generates Events. The Status types
are:

Success: a cluster change was necessary and was executed successfully.
The result is logged, and an Event with Reason SyncSuccess is generated
(as before).

Noop: no cluster change was necessary. The result is logged, but no
Event is generated. This reduces Event generation considerably.

Fatal: an unrecoverable error (retries won't help). The result is
logged, a SyncFatalError warning Event is generated, but no retries
are attempted.

Recoverable: an error that might do better on retry. The result is
logged, a SyncRecoverableError is generated, and the operation is
re-queued with the rate-limiting delay (as before).

Incomplete: a cluster change is necessary, but some information is
missing. The result is logged, a SyncIncomplete warning Event is
generated, and the operation is re-queued with a delay. The delay
is currently hard-wired to 5s, but will be made configurable.

We'll probably tweak some of the decisions about which status types
are chosen for which results. But this has already improved the
controller's error handling, and has considerably reduced its
verbosity, with respect to both logging and event generation.
d5c1528b
service.go 11.4 KB
Replace service.go