Commit 6af04dd6 authored by Geoff Simmons's avatar Geoff Simmons

Add docs: Logging, Events and the Varnish Service monitor

parent 82c2430e
......@@ -7,6 +7,8 @@ The docs in this folder cover these topics:
* [``VarnishConfig`` Custom Resource](ref-varnish-cfg.md)
* [controller command-line options](ref-cli-options.md)
* [Logging, Events and the Varnish Service monitor](monitor.md)
* [Varnish as a Kubernetes Ingress](varnish-as-ingress.md)
* [Self-sharding Varnish cluster](self-sharding.md): High-level
......
# Logging, Events and the Varnish Service monitor
The controller provides a number of means to report its actions,
report errors, check the state of Varnish instances implementing
Ingress, and ensure that they realize the desired state of the
cluster.
* Ingress controller log
* Kubernetes Events generated by the controller
* Varnish Service monitor
## Ingress controller log
The controller logs to the standard output stream; the log can be viewed
in the usual ways for Kubernetes:
```
# Find the name of the Pod executing varnish-ingress-controller
$ kubectl get pods -n kube-system
[...]
varnish-ingress-controller-64fd4944bd-gph24 1/1 Running 0 1h
[...]
# Tail the log
$ kubectl logs -f -n kube-system varnish-ingress-controller-64fd4944bd-gph24
time="2019-01-03T14:29:12Z" level=info msg="Starting Varnish Ingress controller version:96ccb7f"
time="2019-01-03T14:29:12Z" level=info msg="Starting Varnish controller"
Varnish controller logging at level: info
[...]
```
The log level can be set with the ``-log-level``
[command-line option](/docs/ref-cli-options.md), default ``INFO``.
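For example, to raise the level to ``debug``, the option can be added to
the args of the controller container in its Deployment (a sketch; the
Deployment and container names here are assumptions based on the Pod name
shown above):
```
# Sketch of the controller container spec with a raised log level.
# The container name is an assumption; only the args entry matters.
      containers:
        - name: varnish-ingress-controller
          args:
            - -log-level=debug
```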
The log reports information obtained from watching the cluster API,
actions taken to sync the cluster with the desired state, errors
encountered, and information gathered by the Varnish Service monitor
(see below). Currently the content and format of log output are
intentionally not documented, since they are subject to change (so an
external parser cannot depend on the format staying the same).
## Events
The controller generates Kubernetes Events for the cluster components
that it acts upon, to report successful operations or failures. These
include Ingress, Services (running Varnish to implement Ingress, or
running as Ingress backends), the
[VarnishConfig Custom Resource](/docs/ref-varnish-cfg.md), Secrets and
Endpoints.
Events can be viewed in the usual ways, for example with
``kubectl get events``, ``kubectl describe`` or the Kubernetes dashboard:
```
$ kubectl describe ing cafe-ingress-varnish
[...]
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 varnish-ingress-controller Warning SyncFailure Error, requeueing: [{172.17.0.11:6081: No known admin secret}{172.17.0.12:6081: No known admin secret}]
2m 2m 6 varnish-ingress-controller Normal SyncSuccess Successfully synced
```
The EventSource (in the ``From`` column) is always
``varnish-ingress-controller``, and the Reason for actions taken after
a notification from the cluster API is ``SyncSuccess`` or
``SyncFailure``.
The Message in the case of ``SyncFailure`` reports the errors. The
addresses shown in the error message (such as ``172.17.0.11:6081`` in
the example) are Endpoints for a Service running Varnish, showing the
admin port (to which the controller connects to issue commands). The
controller attempts to sync all Endpoints belonging to a Varnish
Service, even if errors are encountered along the way, and collects
error reports from each Endpoint into a single message, in the "array"
format shown above.
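To see which Pods are behind the addresses in such a message, inspect the
Endpoints of the Varnish Service (the Service name ``varnish-ingress`` is
taken from the monitor example below, and may differ in your deployment):
```
# The targetRef fields in the output name the Pod for each address
$ kubectl get endpoints varnish-ingress -o yaml
```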
``requeueing`` indicates that the work item has been placed back onto
the controller's queues, and will be re-attempted (with backoff delays
for repeated errors). The controller always repeats attempts to sync
the state of the cluster until the sync is successful. If there is an
unrecoverable error, such as an error in your configuration that must
be corrected, then the ``SyncFailure`` event will be repeatedly
generated (with backoff delays, and with the aggregation and de-dup of
Events by Kubernetes), and ``SyncSuccess`` will never appear, until
the problem is fixed. It is a good idea to monitor your Event logs for
this situation.
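One way to do so is to filter Events by type and Reason, for example (a
sketch; field selector support for Events may vary with the Kubernetes
version):
```
# List Warning Events with Reason SyncFailure in all namespaces
$ kubectl get events --all-namespaces --field-selector type=Warning,reason=SyncFailure
```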
It is not unusual for ``SyncFailure`` to be closely followed by
``SyncSuccess``, as in the example above, if the controller has not
received complete information when it first attempts the sync. In the
example, the controller had received the Ingress configuration before
the Secret needed to connect with the Varnish admin ports, so it was
unable to load configurations to realize the Ingress. The Secret was
received shortly afterward, and then the Ingress could be successfully
synced. If the most recent event is ``SyncSuccess`` shortly after
``SyncFailure``, then probably all is well.
The controller is notified about all Services, Ingresses and so on in
the cluster, by default in every namespace, including components that
have nothing to do with Ingress or Varnish. These are ignored -- for
example, Ingresses without the ``ingress.class`` annotation set to
``varnish``, or Secrets that do not have the label
``app: varnish-ingress``. The controller may generate ``SyncSuccess``
Events for such objects, but in fact it has done nothing for them.
The controller log usually contains a message at the ``INFO`` level
that it has ignored information about a component.
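For reference, these are sketches of the markers that the controller looks
for (assuming that ``ingress.class`` refers to the standard
``kubernetes.io/ingress.class`` annotation key):
```
# An Ingress handled by the controller
metadata:
  annotations:
    kubernetes.io/ingress.class: "varnish"

# A Secret considered by the controller
metadata:
  labels:
    app: varnish-ingress
```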
## Varnish Service monitor
When the controller starts, it launches a monitor (a goroutine) that
at regular intervals checks the state of each Endpoint of a Varnish
Service implementing Ingress, and loads the current configuration if
necessary (if for some reason the regular sync operations have not
brought the Varnish instance up to date with the desired state).
By default, the monitor runs every 30 seconds. The interval can be
set with the
[``-monitorintvl`` command-line option](/docs/ref-cli-options.md).
For every Endpoint of a Varnish Service, the monitor does the
following:
* Connect to the admin port
* Issue the
[``ping`` command](https://varnish-cache.org/docs/6.1/reference/varnish-cli.html#ping-timestamp)
* Issue the
[``status`` command](https://varnish-cache.org/docs/6.1/reference/varnish-cli.html#status)
to read the state of the Varnish cache process.
* Check if a
[panic](https://varnish-cache.org/docs/6.1/reference/varnish-cli.html#panic-show)
has occurred
* [Discard](https://varnish-cache.org/docs/6.1/reference/varnish-cli.html#vcl-discard-configname-label)
configurations that were loaded by the controller and have gone cold.
* Update the instance with a configuration for the current desired state,
  if necessary.
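The same checks can be run by hand against a single Endpoint, for example
by exec'ing into the corresponding Varnish Pod (a sketch; the Pod name is
only an example, and it assumes that ``varnishadm`` in the image can locate
the running instance):
```
# Run the monitor's checks by hand (the Pod name is an example)
$ kubectl exec -it varnish-ingress-7f9b8c6d5-x2zqp -- varnishadm ping
$ kubectl exec -it varnish-ingress-7f9b8c6d5-x2zqp -- varnishadm status
$ kubectl exec -it varnish-ingress-7f9b8c6d5-x2zqp -- varnishadm panic.show
$ kubectl exec -it varnish-ingress-7f9b8c6d5-x2zqp -- varnishadm vcl.list
```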
When all of the operations succeed for every Endpoint in the Service,
the controller generates an Event with the Reason ``MonitorGood`` for
the Service:
```
$ kubectl describe svc varnish-ingress
Name: varnish-ingress
Namespace: default
[...]
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 4m 151 varnish-ingress-controller Normal MonitorGood Monitor check good
```
When all is well, many of these accumulate over time. But Kubernetes'
aggregation and de-dup of Events will show the Event series only once,
and the ``count`` and ``lastTimestamp`` fields (reflected in the
"Count" and "Last Seen" columns) update less frequently.
The controller log contains more detailed information about the work
of the monitor, such as the response received for ping, and the
"banner" returned when admin connections are established, which
contains information about the platform, the Varnish version, and a
bit about the Varnish configuration.
If any of the operations listed above fail, then the controller
generates an Event at the ``Warning`` level for the Service, with a
Reason string that may be one of:
* ``NoAdminSecret``: no admin Secret currently known
* ``ConnectFailure``: could not connect to the admin port
* ``PingFailure``: error issuing the ``ping`` command (after the admin
connection succeeded)
* ``StatusFailure``: error issuing the ``status`` command
* ``StatusNotRunning``: the state of the child process was not
``running``. The current state is shown in the Message.
* ``PanicFailure``: error attempting to read the most recent panic
message
* ``Panic``: a panic message was found. The panic is included as an
  unstructured string in the Message.
* ``VCLListFailure``: error issuing the ``vcl.list`` command (to list
configurations, and find any that may be in the cold state)
* ``VCLDiscardFailure``: error attempting to discard a cold
configuration
* ``UpdateFailure``: error attempting to update to the current desired
configuration
For each such Event, the Message contains the address of the Endpoint
at which the error occurred, with the admin port (as in the example
further up), as well as a description of the error.
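Such Warnings can be watched for a specific Varnish Service, for example
(again using the Service name from the example above, and assuming these
field selectors are supported for Events in your cluster):
```
# Watch Warning Events for the Varnish Service
$ kubectl get events -w --field-selector involvedObject.kind=Service,involvedObject.name=varnish-ingress,type=Warning
```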
If the state of the child process is not ``running``, the controller
reports this fact, but does not attempt to restart the process, since
an administrator may have changed the state intentionally. Note that
the Pod running Varnish goes to the not ready state when this happens,
since it no longer answers the HTTP readiness check.
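The not-ready state can be seen with ``kubectl get pods``, for example
(assuming that the Varnish Pods carry the ``app: varnish-ingress`` label):
```
# Pods running Varnish that are not ready show 0/1 in the READY column
$ kubectl get pods -l app=varnish-ingress
```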
When there has been a panic, the Varnish management process attempts
to restart the child process. If successful, Varnish may be running
normally again by the time the panic is detected. So it is a good idea
to monitor your Event logs for this case, and save the panic message
if found. Otherwise the Varnish cluster may appear superficially to be
running normally, when in fact there has been a severe error.
......@@ -23,10 +23,14 @@ Usage of ./k8s-ingress:
log to standard error instead of files
-masterurl string
cluster master URL, for out-of-cluster runs
-monitorintvl duration
interval at which the monitor thread checks and updates
instances of Varnish deployed to implement Ingress (default 30s)
-namespace string
namespace in which to listen for resources (default all)
-readyfile string
path of a file to touch when the controller is ready, for readiness probes
path of a file to touch when the controller is ready,
for readiness probes
-stderrthreshold value
logs at or above this threshold go to stderr
-templatedir string
......@@ -76,6 +80,11 @@ the file at that path immediately at startup, if any exists, and
touches it when it is ready. Readiness probes can then test the file
for existence. By default, no readiness file is created.
``-monitorintvl`` sets the interval for the
[monitor](/docs/monitor.md), default 30 seconds. The monitor sleeps for
this interval between its runs over the Varnish Services. See the
documentation at the link for more details.
``-log-level`` sets the log level for the main controller code,
``INFO`` by default.
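A sketch of how these options might be set together in the controller
Deployment, with a readiness probe that tests the ``-readyfile`` path (the
container name, file path and probe are examples, not taken from this
repository's manifests):
```
      containers:
        - name: varnish-ingress-controller
          args:
            - -log-level=debug
            - -monitorintvl=2m
            - -readyfile=/run/controller-ready
          readinessProbe:
            exec:
              # assumes a shell is available in the controller image
              command: ["/bin/sh", "-c", "test -e /run/controller-ready"]
```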
......