
Efficient Kubernetes Log Aggregation with Vector

·8 mins

The problem with ElasticSearch and Loki #

While Kubernetes has become the de facto standard for container orchestration, one of the problems that the distributed nature of Kubernetes introduces is the long term storage of logs for containers which are no longer actively running in the cluster.

Typical solutions which one may reach for are ElasticSearch (or its sister project OpenSearch), Grafana Loki, AWS CloudWatch or similar.

While these solutions do solve the problem in a way, they introduce a whole new set of problems. They are a maintenance burden, requiring carefully specced nodes to run on, and can become prohibitively expensive for clusters which generate large amounts of logs. In my experience Loki is unable to deal with the high cardinality logs generated by Kubernetes, and OOMs under extremely simple queries. Stripping away high cardinality labels like container id, pod names or pod template hashes may work, but is a lot of work. OpenSearch isn't really meant for logs and takes an entire datacentre for what I would consider a quite reasonable amount of logs, while everything else is generally too unstable or expensive. Many of the new logging solutions aim to support storing logs in S3, which by all measures and metrics is terribly suited for the constant appending and grepping needed for logs.

Cluster operators may find that these solutions make up a significant fraction of the entire cluster cost, and aren't as reliable as they claim to be under resource constraints. Moreover, one often finds they cannot store as many logs as they would like to without spending an arm and a leg.

The Cheap, Grepable and “Free” Logs of Old #

Traditionally (before systemd), logs in Linux-based systems are stored in the /var/log directory.

To manage log files, old-school systems use a utility called logrotate, which is typically configured to run daily.

By default, logrotate compresses rotated log files using the gzip compression algorithm, which reduces the size of the log files and saves disk space. Compressed log files are typically named using the pattern <log-filename>.<date>.gz, where <log-filename> is the name of the log file, <date> is the date on which the log file was rotated, and .gz indicates that the file has been compressed using gzip.

For example, if the nginx.log file is rotated on January 1, 2022, the rotated and compressed log file would be named nginx.log.20220101.gz. This convention makes it easy to identify and manage log files by date, as compressed log files can be easily sorted and searched using file management tools such as zgrep and zless.
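To illustrate, here is a minimal shell session (file names and contents are made up for the demo) showing how zgrep searches a rotated, compressed log in place:

```shell
# Create a sample rotated log and compress it, mimicking logrotate's output
mkdir -p /tmp/logdemo
printf 'GET /index.html 200\nGET /missing 404\n' > /tmp/logdemo/nginx.log.20220101
gzip -f /tmp/logdemo/nginx.log.20220101

# zgrep searches inside the .gz without decompressing it to disk
zgrep '404' /tmp/logdemo/nginx.log.20220101.gz
```

zless works the same way for paging through an archive interactively.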

This is what I set out to achieve with Kubernetes: I wanted all my logs available in a structured directory tree, gzipped and rotated by day.

The goal is to have logs stored on a persistent volume by namespace and “application instance”:

```
├── ingress-nginx
│   ├── ingress-nginx-20230415.log.gz
│   └── ingress-nginx-external-20230415.log.gz
└── kube-system
    ├── calico-kube-controllers-20230415.log.gz
    ├── calico-node-20230415.log.gz
    ├── kube-apiserver-20230415.log.gz
    ├── kube-controller-manager-20230415.log.gz
    └── snapshot-controller-20230415.log.gz
```

Vector #

Vector is a new, lightweight and efficient log forwarding tool written in Rust that is gaining popularity in the Kubernetes ecosystem. It can grab logs from Kubernetes nodes and forward them on to pretty much anything, including Loki, OpenSearch and more.

In this article, we will discuss how to use two separate configurations of vector, which will result in neatly sorted logs written directly to a Persistent Volume Claim (PVC) without the use of any other aggregation system.

If you don't have Flux installed already, I recommend you install it so you can install the vector helm charts with the HelmRelease custom resource instead of directly running helm. Nevertheless, you may just use the values section and install it directly with the helm CLI if you like.

We first install vector in "agent" mode. This will deploy a daemonset whereby each host node runs a vector agent which grabs local log files and forwards them, along with their metadata, to the vector-aggregator service, which we'll set up in a moment:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: vector-charts
  namespace: flux-system
spec:
  interval: 1h
  url: https://helm.vector.dev
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vector-agent
spec:
  interval: 1h
  chart:
    spec:
      chart: vector
      version: 0.24.1
      sourceRef:
        kind: HelmRepository
        name: vector-charts
        namespace: flux-system
  values:
    role: Agent
    customConfig:
      data_dir: /vector-data-dir
      sources:
        kubernetes_logs:
          type: kubernetes_logs
      sinks:
        vector:
          type: vector
          inputs: [kubernetes_logs]
          address: "vector-aggregator:6000"
```

The config couldn't really be simpler: vector abstracts away all the details of grabbing Kubernetes logs and just does it when the kubernetes_logs source is specified.

Our single sink will be the aggregator, which will receive all the logs and write them to disk. While in theory all agents could mount an NFS volume and write to files directly themselves, we'd run into performance and interleaving issues if we want logs for the same deployment to go into the same file.

Vector as an Aggregator and Writer #

The config for the aggregator is a little bit longer but still relatively compact:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vector-aggregator
spec:
  interval: 1h
  chart:
    spec:
      chart: vector
      version: 0.24.1
      sourceRef:
        kind: HelmRepository
        name: vector-charts
        namespace: flux-system
  values:
    role: Aggregator
    customConfig:
      data_dir: /vector-data-dir
      sources:
        vector:
          type: vector
          address: 0.0.0.0:6000
          version: "2"
      transforms:
        sort:
          type: remap
          inputs: [vector]
          source: |-
            .labels = .kubernetes.pod_labels
            .app = .labels."app.kubernetes.io/instance" || .labels."app" || .labels."k8s-app"
            .filename = .app || .kubernetes.container_name || "unlabeled"

            .folder = .kubernetes.pod_namespace || "unlabeled"

            .pod = .kubernetes.pod_name
            .container = .kubernetes.container_name
      sinks:
        file:
          type: file
          inputs: [sort]
          encoding:
            codec: json
            only_fields:
              - timestamp
              - message
              - stream
              - pod
              - container
          path: /var/log/k8s/{{ "{{" }} .folder {{ "}}" }}/{{ "{{" }} .filename {{ "}}" }}-%Y%m%d.log
    extraVolumes:
      - name: logging-pvc
        persistentVolumeClaim:
          claimName: logging-pvc
    extraVolumeMounts:
      - name: logging-pvc
        mountPath: /var/log/k8s
```

We use “vector” as the source, telling the aggregator to listen on port 6000 for logs sent in by the agents running on each node.

The transforms section is where the magic happens. We create an additional filename field equal to the app.kubernetes.io/instance label if it exists. If it doesn't exist, we then try the app label and fall back to container_name.

We then set the folder field to the namespace of the pod which logged the line.

For some very short-lived containers (like jobs which complete in seconds), vector is unable to grab the metadata associated with the log line, so the Kubernetes metadata is missing. In that case we set both the folder and the filename to unlabeled so we don't lose the log line. This problem isn't limited to vector and happens with fluent-bit as well.

Finally, we configure the sink, which is a PVC mounted into the aggregator. The manifest above assumes a separately created logging-pvc, which can simply be mounted at /var/log/k8s in the vector-aggregator pod.

We then configure the log path to use the folder and filename (namespace and app instance) via yaml templating. Note that in our example we had to escape the template braces to work around helm.

Lastly, we use only_fields to reduce the size of the log. By default each log line is massive, as it includes pod labels, the container hash, the replicaset id and more. As we already sort logs into files by namespace and app instance, the only valuable labels are the pod name and container name, which let you filter log lines down a bit more. Of course we also keep the timestamp and the actual log line message.

You can actually configure vector to log straight to gzip or zstd files, but note that this introduces delays and a risk of log file corruption in case of restarts or interruptions. Furthermore, it's often useful to tail -f today's file in real time, which you can't do as easily with buffered and compressed files.

Let's have a look at the logs that the aggregator is writing:

```
▶ k -n default get deploy blog
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
blog   3/3     3            3           219d
```

All three of its pods are written to the same log file:

```
% tail -1 blog-20230909.log | jq
{
  "container": "blog",
  "message": " - - [09/Sep/2023:18:49:21 +0100] \"GET /favicon-32x32.png HTTP/1.1\" 200 1457 \"\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36\" \"\"",
  "pod": "blog-7d75d75449-llpmc",
  "stream": "stdout",
  "timestamp": "2023-09-09T17:49:21.836351874Z"
}
```
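Because each line is self-contained JSON, plain grep is enough to slice these files. A small sketch, using fabricated sample lines in the same format:

```shell
# Write two sample log lines in the aggregator's JSON format
cat > /tmp/blog-20230909.log <<'EOF'
{"container":"blog","message":"GET / 200","pod":"blog-7d75d75449-llpmc","stream":"stdout","timestamp":"2023-09-09T17:49:21Z"}
{"container":"blog","message":"GET /missing 404","pod":"blog-7d75d75449-xyz12","stream":"stdout","timestamp":"2023-09-09T17:50:02Z"}
EOF

# Narrow down to a single pod with plain grep
grep '"pod":"blog-7d75d75449-llpmc"' /tmp/blog-20230909.log
```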

Archiving #

As the vector aggregator creates new files daily, we'll want to set up some archiving in order to compress old log files and eventually delete them. The following CronJob comes to the rescue:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: archiver
spec:
  successfulJobsHistoryLimit: 3
  suspend: false
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  # Every hour, at 5 minutes past
  schedule: '5 * * * *'
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: archive
              image: busybox # any small image with find, gzip and ln will do
              command: [sh, -c]
              # Delete all files older than 30 days
              # Compress all files older than 1 day
              # Symlink latest foo-$date.log to foo.log
              args:
                - >
                  find /var/logs/k8s -type f -mtime '+30' -delete -print &&
                  find /var/logs/k8s -type f -mtime '+0' -name '*.log' -exec gzip {} \; &&
                  find /var/logs/k8s -type f -name "*$(date +%Y%m%d).log" | while read latest; do
                    target="${latest%-*}.log";
                    ln -s -f $(basename "$latest") "$target";
                  done
              resources:
                requests:
                  cpu: 10m
                  memory: 10Mi
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop: ["ALL"]
              volumeMounts:
                - name: logging-pvc
                  mountPath: /var/logs/k8s
          restartPolicy: Never
          volumes:
            - name: logging-pvc
              persistentVolumeClaim:
                claimName: logging-pvc
```

The cron will:

  • Delete files older than 30 days
  • Compress all but today's logs
  • Symlink foo-$today.log to foo.log for convenience
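The symlink step hinges on deriving foo.log from foo-YYYYMMDD.log. One way to do that is with shell parameter expansion (the sample path here is illustrative):

```shell
# Strip the trailing -YYYYMMDD.log suffix and re-append .log
latest="/var/logs/k8s/default/blog-20230909.log"
target="${latest%-*}.log"   # %-* removes the shortest trailing "-..." match
echo "$target"              # prints: /var/logs/k8s/default/blog.log

# The archiver would then link the dated file to the stable name:
# ln -s -f "$(basename "$latest")" "$target"
```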

Note that we can't actually use the real logrotate, as we need to use vector's date-based templating. At present vector does not recreate the file if it's moved and keeps writing to the old file handle. While you could use copytruncate, it's not recommended, as you'll likely lose data and have broken json lines at the time of rotation.

Conclusion #

That's it! Assuming /var/log/k8s on the vector aggregator side is persistent, your logs will be retained in an easily searchable and viewable fashion, allowing you to use all the CLI tools you are used to. No clunky web frontend or fat processing nodes necessary.

You can find all the manifests for this post in my GitHub repository.