-
Geoff Simmons authored
Previously all sync operations (add/update/delete for the resource types that the controller watches, and the cluster changes brought about for them if necessary) were only characterized by a Go error variable. The only conditions that mattered were nil or not-nil. On a nil return, success was logged, and a SyncSuccess event was generated for the resource that was synced. But this was done even if no change was required in the cluster, and the resource had nothing to do with Ingress or the viking application. This led to many superfluous Events. On a non-nil return, a warning Event was generated the sync operation was re-queued, using the workqueue's rate-limiting delay. This was done regardless of the type of error. Since the initial delay is quite rapid (subsequent re-queues begin to back off the delay), it led to many re-queues, and many Events. But it is very common that a brief delay is predictable, when not all necessary information is available to the controller, so that the rapid retries just generated a lot of noise. In other cases, retries will not improve the situation -- an invalid config, for example, will still be invalid on the next attempt. This commit introduces pkg/update and the type Status, which classifies the result of a sync operation. All of the sync methods now return an object of this type, which in turn determines how the controller handles errors, logs results, and generates Events. The Status types are: Success: a cluster change was necessary and was executed successfully. The result is logged, and an Event with Reason SyncSuccess is generated (as before). Noop: no cluster change was necessary. The result is logged, but no Event is generated. This reduces Event generation considerably. Fatal: an unrecoverable error (retries won't help). The result is logged, a SyncFatalError warning Event is generated, but no retries are attempted. Recoverable: an error that might do better on retry. The result is logged, a SyncRecoverableError is generated, and the operation is re-queued with the rate-limiting delay (as before). Incomplete: a cluster change is necessary, but some information is missing. The result is logged, a SyncIncomplete warning Event is generated, and the operation is re-queued with a delay. The delay is currently hard-wired to 5s, but will be made configurable. We'll probably tweak some of the decisions about which status types are chosen for which results. But this has already improved the controller's error handling, and has considerably reduced its verbosity, with respect to both logging and event generation.
d5c1528b