Flux v2 monorepo experience
Hello, my name is Aleksei, I am a DevOps engineer, and today I want to talk a little bit about an infrastructure solution for my key client.
A little about my work and the responsibility split. At Altenar I provide services for setting up and maintaining cloud infrastructure based on GCP, as well as monitoring, alerting, logs, etc. I am part of the automation team, which also handles release pipelines, improves various processes in the company, and does other important things. CI, builds, networking, and access are handled by other teams, and I touch these parts very little or not at all.
In my team, we had been using flux v1 both for self-hosted clusters and for GKE, in a “one cluster — one repository” format, with “templates” attached as Git submodules. This approach has its drawbacks, one of the most significant being that the first version of flux has been unsupported since the end of 2020.
So we had accumulated technical debt in the form of the first flux, and sooner or later we needed to either upgrade to version 2 or switch to other tools. We chose the first option, and at the same time I decided to improve our structure, getting rid of a bunch of separate repositories and abandoning the “template submodules”.
The peculiarity of the move was that some clusters I could simply recreate with a fresh flux v2 installation, for example my sandbox and dev clusters, while others were in production, where I had to think through DR in case something went wrong and write a migration runbook with minimal downtime.
How I built a new monorepo and why.
The old scheme.
repo-1
└── flux-v1-cluster
    ├── submodule-base
    │   └── charts
    ├── submodule-gke
    │   └── charts
    └── charts

repo-2
└── submodule-base
    └── charts

repo-3
└── submodule-gke
    └── charts
Let’s say I need to see which chart is being deployed to the cluster. I see that it’s managed by flux, so I go to the flux-v1-cluster repo, try to find what I’m looking for, and don’t find it. I search further, remember that there are submodules, go to submodule-base, look there, and it’s not there either… I go to the neighboring repo submodule-gke and voila! There it is! :)
And the next time, remembering that you once found a certain chart in submodule-gke, you go there straight away, and it’s not there… because it’s in the other repo, submodule-base… In short, if you’ve ever worked with linked repositories once or twice a month, you already know what the problem is.
In addition, there is the problem of submodule versioning. At some point I couldn’t figure out what was wrong: a chart was being deployed to the cluster that wasn’t in any submodule. It turned out the chart had been added to a custom branch of a submodule, and that would have been quite hard to guess without scrutinizing the submodule version actually in use.
AI comment (thank you so much, ChatGPT!):
This code organization has several drawbacks:
1. Difficulty managing dependencies: In this structure, managing dependencies between different repositories and their submodules can be quite complex and can lead to dependency conflicts.
2. Increased deployment complexity: This code organization can lead to a more complicated deployment process, as updates need to be coordinated across different repositories and submodules.
3. Limited flexibility: With this code organization, changing one component may require updating dependencies in multiple repositories and submodules, which can limit flexibility and increase the time it takes to make changes to the system.
4. Increased development time: If multiple teams are working on different repositories and submodules, this can lead to increased development time and reduced team productivity overall.
5. Difficulty scaling: If a new component needs to be added or an existing one needs to be changed, this can make scaling the system difficult as changes will need to be coordinated across multiple repositories and submodules.
New scheme.
Although flux v2 also supports “submodules” in the form of separate repositories, I implemented a monorepo based on one of the examples in the documentation:
https://fluxcd.io/flux/guides/repository-structure
.
├── apps
│   ├── common
│   │   └── monitoring
│   │       ├── prometheus
│   │       └── prometheus-rules
│   ├── kafka
│   ├── rabbitmq
│   └── redis
├── clusters
│   ├── cluster-1
│   │   ├── custom-resources
│   │   └── sync-code
│   ├── cluster-2
│   │   ├── custom-resources
│   │   └── sync-code
│   ├── gke-template
│   └── self-hosted-template
└── infrastructure
    ├── controllers
    │   └── nginx-ingress
    └── lb
        └── metallb
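Following the pattern from the linked guide, each cluster directory is wired to the shared directories with Flux Kustomization objects. A minimal sketch of what one such object could look like for cluster-1 (all names, paths, and intervals here are illustrative, not copied from my repository):

# apps.yaml — tells flux to reconcile the shared apps/ directory into this cluster.
# Note: apiVersion may be kustomize.toolkit.fluxcd.io/v1beta2 on older flux versions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure   # controllers like nginx-ingress and metallb go first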
Advantages:
- All the necessary “modules” and applications are in one place, in a single repository.
- It’s easy to add new applications both to the cluster and to the generalized “templates” of clusters.
- When using a custom branch for a cluster, all the applications are also at hand, and their versions will not change accidentally because of colleagues’ work in the master branch.
- When creating PRs against master in the monorepo, unlike PRs in a submodule, you know they can affect all resources, and therefore you will be more careful 🙂
Disadvantages:
- If you thought the cluster structure through poorly and decide to change it significantly on the fly, once several clusters are already using the monorepo, it will be quite painful.
- The flip side of PRs affecting the entire infrastructure is fair to count as a disadvantage as well.
The neural network has its own opinion on the matter, of course; I asked it to compare which is better:
Assessing whether it is better or worse depends on project requirements and team organization.
A single-repository structure has the following advantages:
1. Simplified dependency management: this structure simplifies dependency management between components, as they all reside in a single repository.
2. Simplified deployment process: simplifying dependency management also leads to simplifying the deployment process, as updates can be coordinated within a single repository.
3. Improved flexibility: in case a component needs to be changed, changes can be made within a single repository, improving flexibility and reducing the time for system changes.
4. Improved team performance: a single-repository structure can improve team performance by facilitating coordination of work on different components.
However, a single-repository structure also has some disadvantages:
1. Increased repository size: all components are located in one repository, which can increase its size and complicate version control management.
2. Scaling limitations: if a new component needs to be added or an existing one changed, it can be difficult to scale the system because code changes need to be made within one repository.
3. Increased development time: increased repository size can lead to longer development time and decreased overall team performance.
Thus, both structures have their advantages and disadvantages, and the choice between them depends on the specific requirements of the project and the organization of the development team.
A little bit about packaging applications
Even after reading the documentation several times, for a long time I could not understand what a “packaged application for flux v2” actually is.
In the end, the simplest and most understandable way for me to describe it is this: to package an application for the monorepo, you collect the set of resources the application needs, such as a namespace, policies, and other raw manifests. If the application has a Helm chart, you add a HelmRelease. Then you simply put all of this in one directory and add a kustomization that lists the files used.
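As a rough sketch, such a directory for a hypothetical redis app could look like this (the chart source, names, and versions are made up for illustration; each comment header marks a separate file):

# apps/redis/kustomization.yaml — lists every file that makes up the packaged app
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - release.yaml
---
# apps/redis/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: redis
---
# apps/redis/release.yaml — the HelmRelease, since this app ships as a Helm chart
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: redis
  namespace: redis
spec:
  interval: 10m
  chart:
    spec:
      chart: redis
      version: "17.x"
      sourceRef:
        kind: HelmRepository
        name: bitnami        # assumes a HelmRepository object defined elsewhere
        namespace: flux-system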
Best practices from my experience:
If the application has any configurable parameters (for example, the classic values of a Helm chart), wire them in as a ConfigMap right away and make its use mandatory. If you need to install with default parameters, just add an empty ConfigMap (sketched below).
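A minimal sketch of that convention for the same hypothetical redis app; the ConfigMap is empty by default, but the HelmRelease always references it, so every app has exactly one obvious place for overrides:

# values packaged as a ConfigMap; empty data means “use chart defaults”
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-values
  namespace: redis
data:
  values.yaml: |
    # put chart overrides here
---
# The HelmRelease consumes the ConfigMap unconditionally via valuesFrom
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: redis
  namespace: redis
spec:
  interval: 10m
  chart:
    spec:
      chart: redis
      sourceRef:
        kind: HelmRepository
        name: bitnami
        namespace: flux-system
  valuesFrom:
    - kind: ConfigMap
      name: redis-values
      valuesKey: values.yaml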
If you want to add components to the cluster that require CRDs, think in advance about how you will package them as a separate “application” in flux v2.
Example — kafka-alerts in my repository.
Homework — try adding any Prometheus rules directly to the cluster, and you will immediately understand why I split the cluster code into two directories: custom-resources and sync-code.
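My reading of that homework: resources like PrometheusRule cannot be applied before their CRDs exist, so the CRD-ish things and the rest of the cluster have to reconcile in order. One way to express this split in flux v2 (names and paths are illustrative, not from my repo) is two Kustomizations linked by dependsOn:

# custom-resources.yaml — reconciled first, carries CRDs and the like
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: custom-resources
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/cluster-1/custom-resources
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
---
# sync-code.yaml — the rest of the cluster, held back until the CRDs are ready
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sync-code
  namespace: flux-system
spec:
  dependsOn:
    - name: custom-resources
  interval: 10m
  path: ./clusters/cluster-1/sync-code
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system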
You can see an example of packaging Prometheus in the example repository and even deploy it yourself; more details at the end of the article.
Flux v1 -> Flux v2 Migration Order
Here everything is more or less simple, but there are some nuances.
We follow the practice of preparing a change plan “as if for a five-year-old,” so that any of the available engineers can walk through the checklist items and not forget anything critical (although situations vary).
If you have a severely outdated Prometheus, as I did, it will not reconcile, because its CRDs will not match the new version. You have to manually delete all the old Prometheus resources and custom resource definitions before rolling out the new one.
In general:
- Scale the flux v1 deployment down to 0
- Bootstrap flux v2 in the cluster (there are several ways to do this; the simplest one is described in my repository, and a sketch of what bootstrap commits is shown after this list)
- Apply the new cluster’s code
- Manually change or delete the old annotations on components that were managed by flux v1 (though I prefer to delete the old installation and install the application again, since I can afford that during the maintenance window)
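For orientation, this is roughly the sync configuration that a flux v2 bootstrap commits into clusters/<name>/flux-system (paraphrased; the URL, branch, and path below are placeholders, and the real file also wires up auth via a secretRef):

# gotk-sync.yaml — the GitRepository points flux at the monorepo,
# the Kustomization tells it which cluster directory to reconcile
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  ref:
    branch: master
  url: ssh://git@example.com/org/flux-monorepo.git
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/cluster-1
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system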
Overall, everything should go smoothly. And if you are doing this for the first time, I recommend experimenting on a demo cluster first.
An example of code for creating a sandbox in gke using terraform can be found here:
https://github.com/ksemele/tf-gke-test
An example of code for the flux v2 monorepo for this cluster can be found here: