• Geoff Simmons's avatar
    Refactor the return status for sync operations in the main loop. · d5c1528b
    Geoff Simmons authored
    Previously all sync operations (add/update/delete for the resource
    types that the controller watches, and the cluster changes brought
    about for them if necessary) were only characterized by a Go error
    variable. The only conditions that mattered were nil or not-nil.
    
    On a nil return, success was logged, and a SyncSuccess event was
    generated for the resource that was synced. But this was done even
    if no change was required in the cluster, and the resource had
    nothing to do with Ingress or the viking application. This led
    to many superfluous Events.
    
    On a non-nil return, a warning Event was generated the sync
    operation was re-queued, using the workqueue's rate-limiting delay.
    This was done regardless of the type of error. Since the initial delay
    is quite rapid (subsequent re-queues begin to back off the delay), it
    led to many re-queues, and many Events. But it is very common that a
    brief delay is predictable, when not all necessary information is
    available to the controller, so that the rapid retries just
    generated a lot of noise. In other cases, retries will not improve
    the situation -- an invalid config, for example, will still be
    invalid on the next attempt.
    
    This commit introduces pkg/update and the type Status, which classifies
    the result of a sync operation. All of the sync methods now return
    an object of this type, which in turn determines how the controller
    handles errors, logs results, and generates Events. The Status types
    are:
    
    Success: a cluster change was necessary and was executed successfully.
    The result is logged, and an Event with Reason SyncSuccess is generated
    (as before).
    
    Noop: no cluster change was necessary. The result is logged, but no
    Event is generated. This reduces Event generation considerably.
    
    Fatal: an unrecoverable error (retries won't help). The result is
    logged, a SyncFatalError warning Event is generated, but no retries
    are attempted.
    
    Recoverable: an error that might do better on retry. The result is
    logged, a SyncRecoverableError is generated, and the operation is
    re-queued with the rate-limiting delay (as before).
    
    Incomplete: a cluster change is necessary, but some information is
    missing. The result is logged, a SyncIncomplete warning Event is
    generated, and the operation is re-queued with a delay. The delay
    is currently hard-wired to 5s, but will be made configurable.
    
    We'll probably tweak some of the decisions about which status types
    are chosen for which results. But this has already improved the
    controller's error handling, and has considerably reduced its
    verbosity, with respect to both logging and event generation.
    d5c1528b
Name
Last commit
Last update
charts Loading commit data...
cmd Loading commit data...
container Loading commit data...
deploy Loading commit data...
docs Loading commit data...
examples Loading commit data...
hack Loading commit data...
pkg Loading commit data...
test Loading commit data...
.gitignore Loading commit data...
.gitlab-ci.yml Loading commit data...
.kind-gitlab.yaml Loading commit data...
COPYING Loading commit data...
LICENSE Loading commit data...
Makefile Loading commit data...
README.md Loading commit data...
go.mod Loading commit data...
go.sum Loading commit data...