DevOps is all made up

February 17, 2021 · 6 minutes

DevOps is touted as universally increasing the productivity of teams and the value they deliver. In my opinion, many DevOps implementations are slow and impractical, and many developers will find the average DevOps setup to be less than useful.

No one agrees on what DevOps is actually meant to be

There are about two things that pretty much anyone claiming to practice DevOps will agree are necessary.

  • CI/CD
  • Infrastructure as Code

The one thing pretty much everyone claiming to practice DevOps actually has in common is CI/CD.

The Developer Experience

You think: DevOps is great. You can leverage CI/CD to verify your changes, run your tests, and all of that. You may not even have to bother setting up a development environment with all the matching dependencies!

Your CI builds images very slowly

Docker in Docker and caching are broken

Starting up Docker in Docker is slow. Before you even start building your image, your CI probably needs to spin up a Docker-in-Docker (dind) instance against which you can build. This usually takes a few seconds on GitLab; if your CI isn't well set up, it may mean pulling the dind image on every build, painfully slowing down the whole process.
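To make that cost concrete, here is a GitLab-style job sketch (job name, image tags, and variables are illustrative, not from any real pipeline). Pinning the dind service image at least lets runners reuse a cached copy instead of re-pulling `:latest`:

```yaml
# Hypothetical .gitlab-ci.yml fragment.
build-image:
  image: docker:24.0
  services:
    - docker:24.0-dind            # pinned so runners can cache it
  variables:
    DOCKER_TLS_CERTDIR: "/certs"  # GitLab's documented dind TLS setup
  script:
    # Even with everything cached, waiting for the dind daemon to come
    # up adds seconds before the actual docker build starts.
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
```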

The problems don't end there: the Docker layer cache won't be present in your CI builds if you are using GitHub Actions, GitLab, Tekton, Drone, or pretty much anything else I've looked at. You have the option of pushing layer cache inline to the registry with recent versions of Docker or with kaniko, but that also invariably ends up being slow once your images become a little bigger (maybe you need TensorFlow) or very small (fast Go builds), because the full image has to be pulled at the start of every build. In many situations, building without cache is actually quicker than pulling the cache first. Some CIs have ways to save artifacts, which you can try to use as a layer cache, but in my experience this also ends up being slower and requires abuse to get working. Busting the cache, by say upgrading a dependency, is also very painful: you spend time pulling the inline cache, only to not use it.
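For reference, the inline-cache flow described above looks roughly like this (a sketch with placeholder registry paths, assuming BuildKit is enabled); note the pull of the whole previous image before any building starts:

```yaml
# Sketch of inline layer cache in a CI script (Docker 19.03+).
script:
  - docker pull "$CI_REGISTRY_IMAGE:latest" || true   # slow: full image
  - docker build
      --build-arg BUILDKIT_INLINE_CACHE=1
      --cache-from "$CI_REGISTRY_IMAGE:latest"
      -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
  - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```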

The only quick way to build containers, with instant access to layer cache as you would have with local docker builds, is a long-lived buildkitd instance exposed over TCP, combined with making sure all of your Dockerfiles use RUN with cache mounts. This can take you from builds lasting several minutes to builds taking seconds.
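A minimal sketch of what that looks like, assuming a Go project (image names and paths are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# Cache mounts persist on the buildkitd host between builds, so module
# downloads and the compiler cache survive even when the layer is rebuilt.
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o /out/app .

FROM gcr.io/distroless/static
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Clients then point at the shared daemon instead of a local one, along the lines of `buildctl --addr tcp://buildkit:1234 build --frontend dockerfile.v0 --local context=. --local dockerfile=.` (host and port assumed).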

Your tests run slowly

After you have waited through the arduous process of rebuilding the container, perhaps for a single one-line change, it's time to run the tests. Your CI probably runs all of them, rather than the one unit test you are actually interested in, and the full suite may take several minutes. So to get the result of your one test, you wait multiple minutes for the container to build, only to watch that test fail over ten minutes later. You fix your commit based on the feedback, and exhale as you get ready for another slow wait.
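One mitigation, sketched here for a Go project with a hypothetical earlier `build` stage: a dedicated test stage that takes the test path as a build argument, so that locally you can rebuild and run only the test you actually care about:

```dockerfile
# Hypothetical stage on top of an earlier "build" stage. Locally,
#   docker build --target test --build-arg PKG=./pkg/parser .
# runs just the package you touched instead of the CI's full suite.
FROM build AS test
ARG PKG=./...
RUN --mount=type=cache,target=/root/.cache/go-build \
    go test "$PKG"
```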

Your CI is a pain to use

The average CI feedback loop is not only painfully slow at spitting out results, it also makes those results hard to get at. You commit your fix, you push, then you click around the UI looking for the new CI run you hopefully launched, and wait for it to progress. If it fails, it will probably be at the test stage, you think, but you don't click on it yet, because the build or any stage before the tests could in theory also fail. As the test bubble starts running, you finally click one last time to see your test output, at which point you may have to scroll a bit more to find what you actually want once it comes up.

The end result is a CI no developer wants to use while actually developing. Because your CI pipeline is written specifically for the CI runner and requires push/pull credentials, you probably can't just run it locally. As a developer, you'll likely resort to running docker build locally and then running the tests manually. Those who don't like Docker will probably end up installing all the dependencies the traditional way and hope the tests still pass in the CI at the end of the day.

CD via your CI is a security flaw

At this point you may say that my complaints are in vain: CI/CD testing isn't meant to save the developer from testing manually, it's only there to validate that the production container passes the tests before it is deployed by the CD. But that massively reduces the value of your CI and makes everything slower, hindering you from using it quickly in pull requests and from letting others use it to review the validity of your changes.

CD is often just a fancy name for the final stages of a CI pipeline to which someone has handed all the production keys. If we follow the advice of GitLab, GitHub, Tekton, and other CI products, they promote deploying to production via the CI. This is a massive security risk: anyone who can modify the pipeline effectively holds those keys. On top of that, it is probably not as reliable as claimed, especially if it's deploying YAML to Kubernetes, as the CI has no knowledge of the cluster it is deploying to or of its state. Which ties into my next point: Infrastructure as Code.

Your Infrastructure as Code is flaky

If you follow the ways of the big CI vendors, you probably have a bunch of YAML in git, into which you somehow avoided adding all your production secrets, which you instead stuffed into CI environment variables. You then rely on the CI to create your infrastructure from that code. It's an inflexible solution that isn't very robust, is likely extremely repetitive, and has zero feedback loop with the actual infrastructure. You're bound to end up manually checking that what you've defined in code matches the infrastructure you actually have.
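The pattern in question, as a hedged sketch (job name, image, and variables invented for illustration): manifests applied blindly from the CI, with cluster credentials living in masked variables and nothing ever checking for drift:

```yaml
# A typical CI-driven "IaC" deploy stage. The CI fires kubectl apply at
# a cluster it knows nothing about, and never looks back.
deploy:
  stage: deploy
  image: bitnami/kubectl:1.28
  script:
    - kubectl apply -f k8s/
        --server="$KUBE_SERVER"
        --token="$KUBE_TOKEN"   # production keys, sitting in CI variables
  only:
    - main
```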

Your CI is a unique snowflake of shell scripts sprinkled in YAML

Let's be honest: your average CI is a long YAML file full of shell commands run in various Docker containers. You're not really sure you're running in a fully bash-compatible shell, given the number of Alpine-based containers, and you are probably relying heavily on all sorts of environment variables. Your shell scripts are quite different from everyone else's, and the way you version your code probably is too: do you use git tags, a version in a language library, or distro package versions (assuming you also package your program for a distro)? Every project you pick up does it slightly differently, and you're never sure what you are really meant to do or how it all fits together. You fear having to read the massive shell-plus-YAML soup of the repo just to figure out how to push out a quick fix you have already somehow tested, without waiting for a really long CI run on a feature branch.
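A composite example of the soup, invented for illustration rather than taken from any real repo: an Alpine shell, ad-hoc versioning, and branching logic that only makes sense to whoever wrote it:

```yaml
# The sort of snowflake release job the text describes (hypothetical).
release:
  image: alpine:3.19            # is this shell even bash? (it isn't)
  script:
    - apk add --no-cache git docker-cli
    - VERSION=$(git describe --tags --always)
    - |
      if [ "$CI_COMMIT_BRANCH" = "main" ]; then
        TAG="$VERSION"
      else
        TAG="$VERSION-$CI_COMMIT_SHORT_SHA"
      fi
    - docker build -t "$CI_REGISTRY_IMAGE:$TAG" .
    - docker push "$CI_REGISTRY_IMAGE:$TAG"
```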

What a Good Setup Looks Like

  • Run buildkitd on a machine with lots of disk for cache. Use cache mounts in your Dockerfiles for npm, Go, pip, apt, whatever.
  • Define your K8s manifests with Kustomize, with production, staging, and local-dev overlays.
  • Use Skaffold for a quick local-dev feedback loop, without manual dev environment setup.
  • Use Flux or ArgoCD to reconcile your IaC into the cluster intelligently and securely. Avoid “CIOps”.
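As a sketch of the last bullet (Flux v2 assumed; the repository URL and paths are placeholders): the cluster pulls and reconciles the manifests itself, so the CI never has to hold production credentials.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: app
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/app-manifests   # placeholder repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app-production
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: app
  path: ./overlays/production   # a Kustomize overlay, per the bullets above
  prune: true                   # delete cluster objects removed from git
```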