
Kubernetes & Logging

When you are running services in a Kubernetes cluster, they are probably writing log statements that contain all sorts of useful information you need to have a look at now and then. Being a good developer, you made sure all logging statements are written to the console. This gives you the possibility to use kubectl logs -f <pod-name> -c <container> to tail the logs of a specific pod and container. More than enough while you are developing the services, but not so useful once you get to production. A more robust logging solution is needed.

Elasticsearch, Fluentd, Kibana

The combination of these three open source projects keeps popping up more and more when it comes to centralized log collection and analysis. Elasticsearch is an open source search engine known for its ease of use. Fluentd is an open source data collector, which lets you unify data collection and consumption for a better use and understanding of data. Kibana is an open source web UI that makes Elasticsearch user friendly for marketers, engineers and data scientists alike. By combining these three tools into the EFK stack (Elasticsearch + Fluentd + Kibana), we get a scalable, flexible and easy to use log collection and analytics pipeline.

Collecting Logs

When gathering the logs of a Kubernetes cluster, there are two types of logs you want to collect. On one side we have the system logs of the cluster itself, and on the other side we have the container logs. Before Fluentd starts collecting the logs, we need to tell it where to find them by updating the fluent.conf file.

System Logs

Kubernetes components not running in a container write log information to files on the host system. They can mostly be found in the /var/log directory. To mark a host system log file for collection you simply add a <source> section to the configuration file for each file you want to collect. The <source> section for kubelet would look something like this.
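
Something along these lines should do the trick. This is a sketch on my part: it assumes the kubelet writes to /var/log/kubelet.log (on systemd-based nodes it logs to journald instead), and the pos_file location and the kubelet tag are arbitrary choices:

    <source>
      @type tail                                  # follow the file like tail -f
      path /var/log/kubelet.log                   # assumed location of the kubelet log
      pos_file /var/log/fluentd-kubelet.log.pos   # remembers how far we got between restarts
      tag kubelet                                 # tag used to route these events later on
      <parse>
        @type none                                # keep each line as-is, no parsing
      </parse>
    </source>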

Container Logs

Containers by default log their stdout and stderr to a file on the host file system. This is taken care of by the Docker daemon. The files are located in /var/lib/docker/containers/. Each container has its own directory, where the directory name is the container id and the log file is named <container-id>-json.log. For example:
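
A made-up example (the container id is shortened and fictional); each line in the file is a JSON object written by Docker's json-file logging driver:

    /var/lib/docker/containers/4f1d2c8e9a7b/4f1d2c8e9a7b-json.log

    {"log":"Started DemoApplication in 6.742 seconds\n","stream":"stdout","time":"2018-05-01T09:12:34.567890123Z"}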

To have Fluentd collect the container logs we can add a <source> section like this:
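
Again a sketch, using the tail input with the JSON parser; the pos_file location and the docker.* tag are my own choices:

    <source>
      @type tail
      path /var/lib/docker/containers/*/*-json.log   # one log file per container
      pos_file /var/log/fluentd-docker.log.pos
      tag docker.*                                    # the * is replaced with the file path
      read_from_head true
      <parse>
        @type json                                    # the json-file driver writes JSON lines
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>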

This is pretty neat and gives us all containers, but since we are running Kubernetes we can do better. The Kubernetes kubelet creates a symbolic link to this file in the /var/log/containers directory on the host machine, with a name that includes the pod name and the Kubernetes container name:
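
Continuing the made-up example, such a symlink follows the pattern <pod-name>_<namespace>_<container-name>-<container-id>.log:

    /var/log/containers/demo-service-5c9f6d7b4-x2kqp_default_demo-service-4f1d2c8e9a7b.log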

The filename contains metadata we can use in combination with the Fluentd Kubernetes Metadata plugin to add metadata to our log events. To achieve this we have to change the container <source> section to:
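
This is a sketch along the lines of the commonly used Fluentd-on-Kubernetes configurations; the kubernetes.* tag prefix is what we match on further down:

    <source>
      @type tail
      path /var/log/containers/*.log                  # the symlinks created by the kubelet
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*                                # the * is replaced with the file path
      read_from_head true
      <parse>
        @type json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>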

This source will produce log events tagged like this:
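
For the made-up symlink above, the tag would look like this (the tail plugin substitutes the * with the path of the file that was read, with slashes turned into dots):

    kubernetes.var.log.containers.demo-service-5c9f6d7b4-x2kqp_default_demo-service-4f1d2c8e9a7b.log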

If we then apply the Fluentd Kubernetes Metadata filter by adding the following to the Fluentd config:
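
A minimal sketch using the fluent-plugin-kubernetes_metadata_filter plugin, matching everything our container source produces:

    <filter kubernetes.**>
      @type kubernetes_metadata    # looks up pod metadata via the Kubernetes API
    </filter>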

We get a log event with Kubernetes metadata embedded:
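
Roughly like this, to stick with the made-up example (the exact set of fields depends on the plugin version):

    {
      "log": "2018-05-01 09:12:34.567  INFO [demo-service,903c1bdc73a1921f,d86cd51f05892aaf,false] 1 --- [nio-8080-exec-1] n.a.demo.DemoController : Handling request\n",
      "stream": "stdout",
      "kubernetes": {
        "namespace_name": "default",
        "pod_name": "demo-service-5c9f6d7b4-x2kqp",
        "container_name": "demo-service",
        "host": "worker-node-1",
        "labels": {
          "app": "demo-service",
          "fluentd-log-format": "spring-boot"
        }
      }
    }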

So far so good. The Kubernetes metadata plugin added metadata for our container. The only thing missing is proper parsing of the log attribute of the event. The sample service is a Spring Boot application using Spring Cloud Sleuth for tracing, and it outputs log information using a very specific format.
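
A line in the default Spring Boot/Sleuth layout looks roughly like this, where the bracketed block contains the application name, trace id, span id and exportable flag (all values here are made up):

    2018-05-01 09:12:34.567  INFO [demo-service,903c1bdc73a1921f,d86cd51f05892aaf,false] 1 --- [nio-8080-exec-1] n.a.demo.DemoController : Handling request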

Before we write our log event to Elasticsearch, we would like to parse this line to extract the different fields. As we are reading a lot of log files, not every event originates from a Spring Boot application. We only want to parse the log attribute when the event is generated by a Spring Boot application. We can achieve this by using the rewrite tag filter plugin to mark events as spring-boot parsable.
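
A sketch using the fluent-plugin-rewrite-tag-filter plugin (reading the nested label with the record accessor syntax requires version 2.1 or later; the unparsed. prefix in the catch-all rule is my own choice):

    <match kubernetes.**>
      @type rewrite_tag_filter
      # first rule: prefix the tag with the value of the fluentd-log-format pod label
      <rule>
        key $['kubernetes']['labels']['fluentd-log-format']
        pattern ^(.+)$
        tag $1.${tag}
      </rule>
      # second rule: catch everything else so those events are re-emitted instead of dropped
      <rule>
        key log
        pattern ^(.*)$
        tag unparsed.${tag}
      </rule>
    </match>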

This will effectively prefix the tag of the event with the value of the fluentd-log-format label, if you specified it in the labels section of the container metadata in the Kubernetes descriptor for the pod. The second rule makes sure we also match all other events, or else they would disappear.

Now we can use a filter, which triggers on the spring-boot tag prefix, to parse the log attribute and extract all the different fields.
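
A sketch using Fluentd's built-in parser filter; the regular expression assumes the default Spring Boot/Sleuth pattern shown above, and the field names (level, service, trace, span and so on) are my own naming:

    <filter spring-boot.**>
      @type parser
      key_name log                 # parse the raw log line
      reserve_data true            # keep the other attributes (stream, kubernetes metadata)
      remove_key_name_field true   # drop the raw line once it has been parsed
      <parse>
        @type regexp
        expression /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(?<level>\S+)\s+\[(?<service>[^,]*),(?<trace>[^,]*),(?<span>[^,]*),(?<exportable>[^\]]*)\]\s+(?<pid>\d+)\s+---\s+\[\s*(?<thread>[^\]]+)\]\s+(?<class>\S+)\s*:\s+(?<message>.*)/
        time_format %Y-%m-%d %H:%M:%S.%L
      </parse>
    </filter>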

When the events go through the filter we end up with:
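
Continuing the made-up example, something like this (the parsed time has become the event timestamp and is no longer a separate attribute):

    {
      "stream": "stdout",
      "kubernetes": {
        "namespace_name": "default",
        "pod_name": "demo-service-5c9f6d7b4-x2kqp",
        "container_name": "demo-service",
        "host": "worker-node-1",
        "labels": {
          "app": "demo-service",
          "fluentd-log-format": "spring-boot"
        }
      },
      "level": "INFO",
      "service": "demo-service",
      "trace": "903c1bdc73a1921f",
      "span": "d86cd51f05892aaf",
      "exportable": "false",
      "pid": "1",
      "thread": "nio-8080-exec-1",
      "class": "n.a.demo.DemoController",
      "message": "Handling request"
    }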

When we send this to Elasticsearch, we have a lot of indexed fields we can use to create awesome dashboards in Kibana.
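
Shipping the events is typically done with the fluent-plugin-elasticsearch output. A minimal sketch, assuming Elasticsearch is reachable inside the cluster under the hostname elasticsearch:

    <match **>
      @type elasticsearch
      host elasticsearch           # assumed service name of the Elasticsearch cluster
      port 9200
      logstash_format true         # write to daily logstash-YYYY.MM.DD style indices
    </match>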

This is a cross-post from my personal blog. Follow me on @acuriouscoder if you like what you’re reading or subscribe to my blog on https://acuriouscoder.net.