- 03 Aug, 2020 1 commit
Geoff Simmons authored
Ref #36
- 31 Jul, 2020 1 commit
Geoff Simmons authored
The haproxy container now runs the app k8s-crt-dnldr, and no longer runs http-faccess. See https://code.uplex.de/k8s/k8s-crt-dnldr

k8s-crt-dnldr runs a k8s client that reads Secrets, filtered for TLS (type: kubernetes.io/tls). It provides a REST API with which a client can instruct it to write (PUT) or remove (DELETE) a pem file (concatenated crt and key) corresponding to a TLS Secret in the cluster. By default, these are written to /etc/ssl/private, where haproxy reads certificates. After the next haproxy reload following the write or delete, haproxy starts or stops using the certificate.

Once k8s-crt-dnldr has been instructed to store a Secret, it responds to Update and Delete events for the Secret by updating or deleting the file on its own. The controller currently sends commands to do so as well, but in practice k8s-crt-dnldr has already changed the certificate itself (this is not an error).

This means that viking Pods must have RBAC rights to read Secrets (the fact that these are filtered for TLS is not expressible in RBAC). That in turn means that viking Pods must be assigned a service account name, to get the RBAC role binding. The controller no longer needs RBAC write privileges for Secrets, and the "tls-cert" Secret with the hard-wired name is no longer necessary. The Secret volume that projected "tls-cert" into viking Pods has been removed.

The port "faccess" in the headless Service for viking admin has been renamed to "crt-dnldr".

Addresses #36
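To make the write/remove flow concrete, here is a minimal sketch of a client driving such a REST API. This is not code from the repo: the path scheme /certs/{namespace}/{name} and the port are hypothetical placeholders; the real interface is defined by k8s-crt-dnldr itself.

    package main

    import (
        "fmt"
        "net/http"
    )

    // certRequest asks a k8s-crt-dnldr instance to write (PUT) or remove
    // (DELETE) the pem file for the named TLS Secret; haproxy picks up
    // the change after its next reload. The URL layout is a placeholder.
    func certRequest(method, base, namespace, name string) error {
        url := fmt.Sprintf("%s/certs/%s/%s", base, namespace, name)
        req, err := http.NewRequest(method, url, nil)
        if err != nil {
            return err
        }
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode >= 300 {
            return fmt.Errorf("%s %s: %s", method, url, resp.Status)
        }
        return nil
    }

    func main() {
        // Store the pem file for Secret "my-tls" in namespace "default" ...
        _ = certRequest(http.MethodPut, "http://localhost:8080", "default", "my-tls")
        // ... and remove it again.
        _ = certRequest(http.MethodDelete, "http://localhost:8080", "default", "my-tls")
    }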
- 23 Jul, 2020 1 commit
Geoff Simmons authored
- 21 Jul, 2020 2 commits
Geoff Simmons authored
Geoff Simmons authored
- 20 Jul, 2020 1 commit
Lars Fenneberg authored
- 10 Jul, 2020 7 commits
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
In test/e2e.sh, we run the test twice, to confirm that VMOD dynamic picks up the changed IP for the DNS name. For now, just run the test once in the pipeline.
Geoff Simmons authored
These are unfortunately set up to run in a specific order. The order should currently match the order in test/e2e.sh.
Geoff Simmons authored
Since it has been failing consistently in the CI pipeline.
- 09 Jul, 2020 4 commits
Geoff Simmons authored
VCL label and source file names changed.
Geoff Simmons authored
Update comments and fix up whitespace while we're here.
Geoff Simmons authored
Add the port "configured" to the headless Varnish admin Service, which responds with status 200 when an Ingress is configured, and 503 otherwise. This replaces the previous use of the Ready state to determine whether the Pods are currently implementing an Ingress.

This is actually a small change to the Varnish images and the admin Service, but a wide-ranging change for testing, since we now check the configured port before verifying a configuration (rather than waiting for the Ready state). Common test code is now in the bash library test/utils.sh.

This commit also includes a fix for the repeated test of the ExternalName example, which verifies that changed IP addresses for ExternalName Services are picked up by VMOD dynamic. The test waits for the Ready state of the IngressBackends. The second time around, kubectl wait sometimes picked up previous versions of the Pods that were in the process of terminating. These of course never became Ready, and the wait timed out. Now we wait for those Pods to be deleted before proceeding with the second test.
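As a rough sketch of the idea behind the "configured" port (assumed, not the actual image code; isConfigured and the port number are placeholders):

    package main

    import "net/http"

    // isConfigured is a hypothetical stand-in for however the Varnish
    // image determines that an Ingress configuration has been loaded.
    func isConfigured() bool {
        return false
    }

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            if isConfigured() {
                w.WriteHeader(http.StatusOK) // 200: an Ingress is configured
                return
            }
            w.WriteHeader(http.StatusServiceUnavailable) // 503: not configured
        })
        http.ListenAndServe(":8000", nil) // port is a placeholder
    }

Tests can then poll this port until it returns 200 before verifying a configuration.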
Geoff Simmons authored
- 07 Jul, 2020 1 commit
Geoff Simmons authored
- 06 Jul, 2020 3 commits
Tim Leers authored
Geoff Simmons authored
Geoff Simmons authored
Previously, all sync operations (add/update/delete for the resource types that the controller watches, and the cluster changes brought about for them if necessary) were only characterized by a Go error variable. The only conditions that mattered were nil or non-nil.

On a nil return, success was logged, and a SyncSuccess Event was generated for the resource that was synced. But this was done even if no change was required in the cluster, and even if the resource had nothing to do with Ingress or the viking application. This led to many superfluous Events.

On a non-nil return, a warning Event was generated and the sync operation was re-queued, using the workqueue's rate-limiting delay. This was done regardless of the type of error. Since the initial delay is quite rapid (subsequent re-queues begin to back off the delay), it led to many re-queues, and many Events. But it is very common that a brief delay is predictable, when not all of the necessary information is available to the controller yet, so the rapid retries just generated a lot of noise. In other cases, retries will not improve the situation -- an invalid config, for example, will still be invalid on the next attempt.

This commit introduces pkg/update and the type Status, which classifies the result of a sync operation. All of the sync methods now return an object of this type, which in turn determines how the controller handles errors, logs results, and generates Events. The Status types are:

- Success: a cluster change was necessary and was executed successfully. The result is logged, and an Event with Reason SyncSuccess is generated (as before).

- Noop: no cluster change was necessary. The result is logged, but no Event is generated. This reduces Event generation considerably.

- Fatal: an unrecoverable error (retries won't help). The result is logged and a SyncFatalError warning Event is generated, but no retries are attempted.

- Recoverable: an error that might do better on retry. The result is logged, a SyncRecoverableError warning Event is generated, and the operation is re-queued with the rate-limiting delay (as before).

- Incomplete: a cluster change is necessary, but some information is missing. The result is logged, a SyncIncomplete warning Event is generated, and the operation is re-queued with a delay. The delay is currently hard-wired to 5s, but will be made configurable.

We'll probably tweak some of the decisions about which status types are chosen for which results. But this has already improved the controller's error handling, and has considerably reduced its verbosity, with respect to both logging and Event generation.
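A rough sketch of what such a classification type can look like; the five type names are from this commit, but the layout and helper methods are assumptions, not the repo's actual pkg/update:

    // Package update: a sketch under assumptions; the real pkg/update
    // may differ in layout and naming.
    package update

    // Type classifies the result of a sync operation.
    type Type uint8

    const (
        Noop        Type = iota // no cluster change was necessary; log only
        Success                 // change executed; generate a SyncSuccess Event
        Fatal                   // unrecoverable; warn, but do not retry
        Recoverable             // retry with the workqueue's rate-limiting delay
        Incomplete              // information missing; retry after a fixed delay (5s for now)
    )

    // Status is what the sync methods return instead of a bare error.
    type Status struct {
        Type Type
        Err  error // underlying error, if any
    }

    // Retry reports whether the controller should re-queue the operation.
    func (s Status) Retry() bool {
        return s.Type == Recoverable || s.Type == Incomplete
    }

    // IsError reports whether the result should produce a warning Event.
    func (s Status) IsError() bool {
        return s.Type != Success && s.Type != Noop
    }

The controller can then branch on Status.Type to decide between logging only, generating an Event, re-queuing with the rate-limiting delay, or re-queuing after the fixed 5s delay.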
- 01 Jul, 2020 5 commits
Geoff Simmons authored
These correspond to properties that can be set in VMOD dynamic. Currently we have:

- dnsRetryDelay: set as the ttl in the VMOD. Since we get DNS TTLs from the server, it effectively sets the retry delay after a lookup gets negative results. More recent versions of the VMOD have a separate parameter for this purpose, so this should be updated soon.

- domainUsageTimeout: corresponds to the domain_usage_timeout param of the VMOD director.

- firstLookupTimeout: corresponds to the first_lookup_timeout param of the VMOD director.

- resolverTimeout: set with the .set_timeout() method of the VMOD resolver object.

- resolverIdleTimeout: set with the .set_idle_timeout() method of the VMOD resolver object.

- maxDNSQueries: set with the .set_limit_outstanding_queries() method of the VMOD resolver object.

- followDNSRedirects: set with the .set_follow_redirects() method of the VMOD resolver object.
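For illustration, a sketch of how these fields might appear in the Go spec for the BackendConfig resource; this is assumed, not the repo's actual CRD types, and the timeouts are simplified to plain duration strings (e.g. "30s"):

    package v1alpha1

    // BackendConfigSpec (hypothetical layout) carries the properties
    // listed above, mapped onto VMOD dynamic by the controller.
    type BackendConfigSpec struct {
        DNSRetryDelay       string `json:"dnsRetryDelay,omitempty"`
        DomainUsageTimeout  string `json:"domainUsageTimeout,omitempty"`
        FirstLookupTimeout  string `json:"firstLookupTimeout,omitempty"`
        ResolverTimeout     string `json:"resolverTimeout,omitempty"`
        ResolverIdleTimeout string `json:"resolverIdleTimeout,omitempty"`
        MaxDNSQueries       *int32 `json:"maxDNSQueries,omitempty"`
        FollowDNSRedirects  *bool  `json:"followDNSRedirects,omitempty"`
    }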
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
- 30 Jun, 2020 8 commits
Geoff Simmons authored
Ref gitlab issue #20
Geoff Simmons authored
Ref gitlab issue #20
Geoff Simmons authored
Uses VMOD dynamic, and requires that the getdns library is installed in the image running Varnish. This allows us to use dynamic.resolve(), in particular so that TTLs from DNS are honored.

Currently sets ttl to a hard-wired value of 30s. Since the TTLs for lookups are obtained from DNS, this actually sets the delay until lookups are retried after negative results (default 1h).

The next step is to test and extend BackendConfig support to configure properties of VMOD dynamic. That will make it possible to configure the ttl value (although we might stay with a much shorter ttl than 1h).

Partially addresses gitlab issue #20.
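For illustration only, a sketch of the kind of VCL this can generate, embedded here as a Go template string (the controller renders VCL from templates). The template fields and package name are hypothetical; dynamic.director() with a ttl parameter and .backend() are part of the VMOD dynamic interface:

    package vcl

    // externalNameTmpl is a hypothetical, trimmed-down version of the
    // VCL the controller could generate for an ExternalName Service.
    // ttl = 30s is the hard-wired value described above; actual DNS TTLs
    // are honored via the resolver.
    const externalNameTmpl = `
    import dynamic;

    sub vcl_init {
        new {{.Director}} = dynamic.director(port = "{{.Port}}", ttl = 30s);
    }

    sub vcl_backend_fetch {
        set bereq.backend = {{.Director}}.backend("{{.ExternalName}}");
    }
    `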
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
- 18 Jun, 2020 1 commit
Lars Fenneberg authored
- 12 Jun, 2020 5 commits
Geoff Simmons authored
These weren't necessarily being changed after an Endpoints update.
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
Geoff Simmons authored
The Ingress update may have followed an update for Endpoints.