1. 06 Jul, 2020 1 commit
    • Geoff Simmons's avatar
      Refactor the return status for sync operations in the main loop. · d5c1528b
      Geoff Simmons authored
      Previously all sync operations (add/update/delete for the resource
      types that the controller watches, and the cluster changes brought
      about for them if necessary) were only characterized by a Go error
      variable. The only conditions that mattered were nil or not-nil.
      
      On a nil return, success was logged, and a SyncSuccess event was
      generated for the resource that was synced. But this was done even
      if no change was required in the cluster, and the resource had
      nothing to do with Ingress or the viking application. This led
      to many superfluous Events.
      
      On a non-nil return, a warning Event was generated the sync
      operation was re-queued, using the workqueue's rate-limiting delay.
      This was done regardless of the type of error. Since the initial delay
      is quite rapid (subsequent re-queues begin to back off the delay), it
      led to many re-queues, and many Events. But it is very common that a
      brief delay is predictable, when not all necessary information is
      available to the controller, so that the rapid retries just
      generated a lot of noise. In other cases, retries will not improve
      the situation -- an invalid config, for example, will still be
      invalid on the next attempt.
      
      This commit introduces pkg/update and the type Status, which classifies
      the result of a sync operation. All of the sync methods now return
      an object of this type, which in turn determines how the controller
      handles errors, logs results, and generates Events. The Status types
      are:
      
      Success: a cluster change was necessary and was executed successfully.
      The result is logged, and an Event with Reason SyncSuccess is generated
      (as before).
      
      Noop: no cluster change was necessary. The result is logged, but no
      Event is generated. This reduces Event generation considerably.
      
      Fatal: an unrecoverable error (retries won't help). The result is
      logged, a SyncFatalError warning Event is generated, but no retries
      are attempted.
      
      Recoverable: an error that might do better on retry. The result is
      logged, a SyncRecoverableError is generated, and the operation is
      re-queued with the rate-limiting delay (as before).
      
      Incomplete: a cluster change is necessary, but some information is
      missing. The result is logged, a SyncIncomplete warning Event is
      generated, and the operation is re-queued with a delay. The delay
      is currently hard-wired to 5s, but will be made configurable.
      
      We'll probably tweak some of the decisions about which status types
      are chosen for which results. But this has already improved the
      controller's error handling, and has considerably reduced its
      verbosity, with respect to both logging and event generation.
      d5c1528b
  2. 01 Jul, 2020 5 commits
  3. 30 Jun, 2020 8 commits
  4. 18 Jun, 2020 1 commit
  5. 12 Jun, 2020 6 commits
  6. 11 Jun, 2020 2 commits
  7. 10 Jun, 2020 2 commits
  8. 05 Jun, 2020 1 commit
  9. 04 Jun, 2020 2 commits
  10. 03 Jun, 2020 9 commits
  11. 02 Jun, 2020 2 commits
  12. 28 May, 2020 1 commit