DEFLUX.COMPENSATOR: k8s GitOps like it's 1985

or: Replacing gigabytes of operators with ~100 lines of bash (and helm)

This README attempts to start with just the facts, but it gets more and more opinionated, because the reasons for its existence are opinions.

What is this?

This project aims to provide minimal scaffolding to interact with a kubernetes (k8s) cluster API in a GitOps manner.

It deliberately does not aim to be smart, but it very deliberately aims to be simple and transparent about what it does.

Who is this for?

For people who like to know what they are doing and have issues working with magic tools.

What does the DEFLUX.COMPENSATOR do?

In a k8s container:

  • loop:
    • clone/fetch a git repo
    • run scripts from it

deploy_loop.sh.tpl is really all it does.
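
A minimal sketch of what such a loop could look like follows; this is not the actual deploy_loop.sh.tpl, and the variable names, defaults and repository layout are made up for illustration:

    #!/bin/sh
    # illustrative sketch only; the real loop is deploy_loop.sh.tpl, configured
    # through the chart's values -- names and defaults here are invented
    REPO="https://example.com/ops/deployments.git"   # deflux.repo
    BRANCH="main"                                    # deflux.branch
    DELAY="60"                                       # deflux.delay
    SCRIPTS="deflux.d"                               # deflux.scripts

    while true; do
        # clone on the first iteration, fetch and hard-reset to the branch afterwards
        if [ -d repo/.git ]; then
            git -C repo fetch origin "$BRANCH" && git -C repo reset --hard "origin/$BRANCH"
        else
            git clone --branch "$BRANCH" "$REPO" repo
        fi

        # do nothing while the suspend file exists in the repository
        if [ ! -e repo/.SUSPEND.DEFLUX ]; then
            for script in repo/"$SCRIPTS"/*.sh; do
                [ -x "$script" ] && "$script"
            done
        fi

        sleep "$DELAY"
    done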

There is only a pull mode, because web hooks or any other method of accepting external communication would increase the attack surface.

What is this useful for?

The scripts called by the DEFLUX.COMPENSATOR can do whatever they want, but in particular, they are intended to call helm to install/upgrade cluster resources.

test.chart.deploy.sh is an example of this functionality: the script checks whether there are any changes to deploy and, if so, attempts a helm upgrade or helm install.

If you want automatic reconciliation, you can modify the script to skip the diff step and always run helm upgrade or use plugins like helm-drift.
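
As a rough idea of what such a script might do, here is a sketch; it is not the actual test.chart.deploy.sh, and the release name, chart path and the use of the helm-diff plugin for change detection are assumptions:

    #!/bin/sh
    # illustrative sketch of a deploy script; names are invented and the change
    # check via the helm-diff plugin is just one possible implementation
    set -eu

    RELEASE="test"
    CHART="./charts/test"
    NAMESPACE="default"

    if helm status "$RELEASE" --namespace "$NAMESPACE" >/dev/null 2>&1; then
        # upgrade only if the rendered manifests differ from the installed release
        # (--detailed-exitcode makes helm diff return non-zero when there are changes)
        if ! helm diff upgrade "$RELEASE" "$CHART" --namespace "$NAMESPACE" --detailed-exitcode >/dev/null; then
            helm upgrade "$RELEASE" "$CHART" --namespace "$NAMESPACE"
        fi
    else
        helm install "$RELEASE" "$CHART" --namespace "$NAMESPACE"
    fi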

PROs

The main strengths of DEFLUX.COMPENSATOR are:

  • transparency through extreme simplicity

  • a simple option to suspend any action by creating a file in the watched repository (default: .SUSPEND.DEFLUX); see the example after this list

  • friendly coexistence and consistency in results with manual helm invocations

  • least privileges through the service account used, bound to a role or cluster role

  • easily expandable through shell scripting
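
The suspend mechanism mentioned above needs nothing but an ordinary commit to the watched repository (file name as per the default value):

    # in a checkout of the watched repository
    touch .SUSPEND.DEFLUX
    git add .SUSPEND.DEFLUX
    git commit -m "suspend deflux-compensator"
    git push
    # delete the file and push again to resume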

CONs

  • Neither modern nor fashionable

  • No fancy GUIs

How to use?

This project comes as a helm chart itself. The basic steps to use it are:

  • create a values.yaml based upon charts/deflux-compensator/values.yaml (a rough sketch is shown after this list).

    • most certainly, you will want to change at least deflux.repo, deflux.branch and deflux.delay.

      if you need additional commands to be executed, put them under deflux.setup.

    • The privileges under rbac should be changed to be minimal and sufficient for the deployment.

      The example is for installing the test chart:

      • helm itself creates a secret called sh.helm.release.v1.test.v* with a dynamic version at the end, which we can not properly map to resourceNames, because those are either specific or the catch-all "*". So, unfortunately, we need to allow access to all secrets.

      • for create and list, limiting privileges by resourceNames does not work.

      • all other permissions are only on resources called test.

    • The privileges granted under role.rules are relatively extensive for the helm example chart. You should edit these, and in particular resourceNames (which can also be omitted).

      A good set of privileges is both minimal and sufficient.

  • install the chart by whatever means you usually use. For command-line helm, the steps are:

    helm repo add uplex https://code.uplex.de/api/packages/slink/helm
    helm repo update
    helm install deploy uplex/deflux-compensator -f my.values.yaml
    
  • The above will tell you how to watch the logs of the container running the git fetch / run script loop.

  • In the deflux.repo, create one or more scripts to run the actual job(s) in the directory specified by the deflux.scripts value (default: deflux.d). These scripts need to have the .sh file name suffix and be executable (chmod 755 for good measure).

    For an example, see test.chart.deploy.sh.
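
To make the values part of the steps above more concrete, here is a rough sketch of a custom values file, written as a shell heredoc to match the commands above; only the deflux.* keys named earlier are taken from this README, while their exact nesting, the defaults and everything else (in particular the rbac / role.rules part) should be taken from charts/deflux-compensator/values.yaml:

    # sketch only; check charts/deflux-compensator/values.yaml for the real structure
    cat > my.values.yaml <<'EOF'
    deflux:
      repo: https://example.com/ops/deployments.git   # repository to watch
      branch: main
      delay: 60                                       # pause between iterations
      scripts: deflux.d                               # directory holding the *.sh scripts
      setup: |
        # additional commands to be executed, e.g. adding a chart repository
        helm repo add example https://charts.example.com
    EOF
    # remember to trim role.rules down to a minimal, sufficient set of privileges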

Why?

This small project was motivated by problems experienced in real, day-to-day DevOps life working with modern CI/CD tools. Such tools solve the task of deploying resources to kubernetes from git repositories with massive amounts of overhead.

The problem with modern CI/CD tools

Talking about one tool in particular, which happens to have a name similar to this project's, what basically happens to deploy resources is this¹:

  • The tool somehow found its way onto the cluster and is running with some approximation of root privileges (because it usually has full rights on the k8s API).

  • It gets told git URLs to clone

  • In these repositories it knows about, it looks for yaml files and creates cluster resources from them

  • Some of these cluster resources are themselves deployment instructions.

  • Following these instructions, the tool might do any of the following:

    • bundle helm charts and upload them to a registry, potentially (likely) externally hosted
    • template yaml files into yaml files in ConfigMaps, created on the cluster
    • execute a helm chart with values pulled once again from these ConfigMaps and/or elsewhere
    • maybe, eventually, consolidate a correct and complete deployment (or not)

Real-life problems the author of this little tool has personally experienced are:

  • the whole process taking minutes, sometimes more than 10, to complete, for small changes which would be applied in seconds using helm directly

  • the actions taken by the CI/CD tool being largely opaque

  • human errors taking a very long time to surface (and becoming obvious at just "some point in time", whenever the tool has arrived at the actual deployment), which are then hard to track down because of all the opacity

  • these tools sometimes just doing the wrong thing and templating something different than a local helm setup

  • recovering from errors being largely trial and error

There are two main problems here, besides all the complexity (which is inherently bad): insecurity on the part of the person using these tools, because they are opaque, and, related to that, a total lack of transparency, because the tools cannot be run locally and sometimes produce different results than, for example, a locally invoked helm.

So why use these simple shell scripts?

The reason is transparency: Using such a simple approach gives us verbose logs which clearly and directly show the root cause of errors.

The overarching rant: Why this is all crazy madness

One might have opinions about kubernetes as a whole, but among other properties which can be advantageous for very good reasons, there is one concept which might be convincing: declarative configuration. You tell the cluster what you want and the cluster will try to install/run what is specified, no matter the exact details, and no matter changing circumstances like lost nodes or network failures.

So a k8s API is a highly abstracted meta layer: You fill out an order form and eventually you might get what you want.

Initially, the way to configure the cluster was to create yaml/json and send it to the cluster API. The cluster would then store the resource definitions in its etcd database and keep them there for as long as the cluster stayed alive.

Then people started realizing that maintaining these order forms was tedious, and the helm chart was born, adding another abstraction layer: now we define templates of order forms to be filled in automatically with (hopefully simpler) values and hand the result to the cluster.

But then you still need a tool running somewhere external to the cluster which can install things. Hm, bad, how are we going to make sure that this is not abused? Would it not be better to run stuff from within the cluster?

That's a main advantage of GitOps CI/CD. But now what we get is meta-meta-deployment: We define an order form which defines templating which defines cluster resources.

But that is not all: because the CI/CD tool now runs in the cluster, it also uses cluster resources for everything, including its own state and transient, intermediate results. A whole lot of back and forth and added complexity is the result.

From a security perspective, tiller was removed from helm to avoid the security implications of a server component, and now we are back to an almighty binary with hundreds of components baked in, running with the k8s equivalent of root privileges. What could possibly go wrong...

The Reconciliation argument

Another selling point for these CI/CD tools is the reconciliation argument: that they ensure the state of the cluster matches the state declared in a (git) repository.

Well, wait, isn't that what the cluster itself should do? Didn't we just understand that the cluster creates resources based on a declaration and keeps them until changed?

So why do we need reconciliation for something which already provides reconciliation?

Answer: Because people use kubectl ... to change the running state.

To this problem, there are two answers: If such changes are unintentional, then why do people have accounts privileged enough to apply them? And if they are intentional, it is probably a good idea to avoid some robot intervening, at least for the moment.

So yes, in practice, one might actually want visibility into differences, and maybe even some process to pull things straight once in a while (nightly?), but having big brother constantly enforcing the rules will make agile troubleshooting impossible: while the CI/CD tools do support some way to suspend them, the means of deployment differ from what a developer/administrator uses locally, so there is no (good, clean) way to intervene when things go haywire.


  1. The author takes full responsibility for this description being superficial and incomplete. ↩︎