centralized logging with loki

I have a lot of hosts to manage, from VMs to LXC containers to remote servers. When something goes wrong, it’s tedious to figure out what went wrong on which server. Checking each one by hand just isn’t feasible and means a lot of manual debugging every time.

I had a basic setup of bash scripts that managed the syslog of every host, but I wanted something more organized. So I looked at different options and naturally stumbled upon the most popular log stack: loki, promtail and grafana.

The most important thing was to find a lightweight log stack and host it natively in an alpine LXC container, which takes minimal resources.

loki and promtail are definitely lightweight, but not as lightweight as VictoriaLogs. The benchmarks in the VictoriaLogs GitHub repository show that loki takes way more RAM than VL.

setup

We won’t be using docker or any other “orchestrator” for this one. We will run everything in an alpine LXC container, which you can create on proxmox.

Luckily, alpine’s community repository includes the latest builds for loki, promtail and grafana so we will use that.

I have allocated 2 vCPUs, 2GiB of RAM and 40GiB of disk space. We will run some benchmarks later on to see whether everything runs smoothly with these resources.

Let’s install ssh and enable it:

apk add openssh
service sshd start
rc-update add sshd

We will also install supervisord, which you can think of as a lightweight systemd (without being privileged, of course) for running daemon processes, which is exactly what we are gonna do.

apk add supervisor

configuration

Vector is an optional program, which we aren’t gonna use for this post.

Install loki, grafana and promtail:

apk add loki loki-promtail grafana

Alright, let’s start with the basic loki config:

auth_enabled: false

server:
  http_listen_port: 3100

# Configuration for common settings across Loki components
common:
  # Prefix for paths used by Loki, useful for running multiple instances
  path_prefix: /loki

  # Storage configuration for Loki
  storage:
    filesystem:
      # Directory where chunk data is stored
      chunks_directory: /loki/chunks
      # Directory where rule data is stored
      rules_directory: /loki/rules

  # Number of replicas for each log stream
  replication_factor: 1

  # Ring configuration for Loki's distributed hash table
  ring:
    kvstore:
      # Store type for the ring's key-value store
      store: inmemory

# Cache configuration to improve performance
# (Loki configures its embedded caches per component)
chunk_store_config:
  # Enable the in-process cache for chunks
  chunk_cache_config:
    embedded_cache:
      enabled: true
      max_size_mb: 1024  # Maximum size of the cache in megabytes
      ttl: 1h            # How long to keep cache entries before expiring them

query_range:
  # Enable the in-process cache for query results
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 1024  # Maximum size of the cache in megabytes
        ttl: 1h            # How long to keep cache entries before expiring them

# Schema configuration for Loki's data storage
schema_config:
  configs:
    - from: 2024-10-24  # Date from which this schema configuration is valid
      store: tsdb       # Storage type for the schema
      object_store: filesystem  # Object store type for the schema
      schema: v13       # Schema version
      index:
        prefix: index_  # Prefix for index files
        period: 24h     # Time period for index files

The ring uses the inmemory key-value store, which is all a single-instance setup needs, and cache entries expire after 1h. Loki will listen on port 3100.
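Before handing the config to a process manager, it can help to check that the file parses at all. Here is a minimal sketch, assuming the config is saved at /root/logs/loki/config.yaml (the path used in the supervisord section further down). The function name verify_loki_config is made up for illustration; -verify-config is a flag of the loki binary that loads the config and exits instead of serving.

```shell
#!/bin/sh
# Hypothetical helper: check that a Loki config file parses before starting Loki.
verify_loki_config() {
    config="$1"
    if [ ! -r "$config" ]; then
        echo "config not readable: $config" >&2
        return 1
    fi
    # -verify-config makes loki load the config and exit instead of serving.
    loki -verify-config -config.file="$config"
}

# Usage (path is an assumption, adjust to where you saved the file):
#   verify_loki_config /root/logs/loki/config.yaml && echo "config OK"
```

A non-zero exit status means the file is missing or loki rejected it, and loki's own error message usually names the offending field.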

Now we can write the promtail config. If you don’t plan on using the Ansible playbook further down, set the instance label to your hostname yourself; the playbook fills it in from ansible_hostname. For now, we are gonna focus on getting syslog only:

server:
  http_listen_address: 0.0.0.0
  http_listen_port: 9080
  grpc_listen_port: 9096

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://<loki-ip-or-hostname>:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          instance: "{{ ansible_hostname }}"
          __path__: /var/log/*

The scrape_configs section contains a single job named system. The __path__ label tells promtail which files to tail, and the instance label names the host the logs came from.

Replace <loki-ip-or-hostname> with the hostname or IP of your loki server.
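If you are setting up a host by hand rather than through Ansible, the two substitutions can be scripted. The sketch below prints the syslog config from above with both values filled in. The function name render_promtail_config is hypothetical, and the instance label falls back to the output of hostname when no second argument is given.

```shell
#!/bin/sh
# Hypothetical helper: print the syslog Promtail config from this post with
# the Loki address and the instance label filled in.
render_promtail_config() {
    loki_host="$1"                 # IP or hostname of the Loki server
    instance="${2:-$(hostname)}"   # defaults to this machine's hostname
    cat <<EOF
server:
  http_listen_address: 0.0.0.0
  http_listen_port: 9080
  grpc_listen_port: 9096

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://${loki_host}:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          instance: ${instance}
          __path__: /var/log/*
EOF
}

# Usage (example address and output path, adjust both to your setup):
#   render_promtail_config 192.168.31.55 > /root/logs/promtail/syslog.yaml
#   promtail -dry-run -config.file=/root/logs/promtail/syslog.yaml
```

The -dry-run flag makes promtail print the entries it would send instead of pushing them to loki, which is a cheap way to see what the __path__ glob actually picks up.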

On systemd distros, you will need the journal scrape config to read logs from journald:

server:
  http_listen_address: 0.0.0.0
  http_listen_port: 9080
  grpc_listen_port: 9096

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://<loki-ip-or-hostname>:3100/loki/api/v1/push

scrape_configs:
  - job_name: systemd-journal
    journal:
      path: /var/log/journal
      max_age: 12h
      labels:
        instance: "{{ ansible_hostname }}"
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__hostname']
        target_label: 'hostname'

To install promtail on other hosts, you can use the following playbook. The templates it copies fill in the instance label with each host’s ansible_hostname fact:

---
- name: Install and configure Promtail
  hosts: all
  become: yes
  tasks:
    - name: Download Promtail
      get_url:
        url: https://github.com/grafana/loki/releases/download/v3.4.2/promtail-linux-amd64.zip
        dest: /tmp/promtail-linux-amd64.zip

    - name: Install unzip
      apt:
        name: unzip
        state: present
      when: ansible_os_family == 'Debian'

    - name: Extract Promtail
      unarchive:
        src: /tmp/promtail-linux-amd64.zip
        dest: /usr/local/bin
        remote_src: yes

    - name: Remove zip file
      file:
        path: /tmp/promtail-linux-amd64.zip
        state: absent

    - name: Rename Promtail binary
      command: mv /usr/local/bin/promtail-linux-amd64 /usr/local/bin/promtail

    - name: Ensure /usr/local/bin is in PATH
      lineinfile:
        path: /etc/environment
        state: present
        regexp: '^PATH='
        line: 'PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"'

    - name: Source /etc/environment
      shell: source /etc/environment
      args:
        executable: /bin/bash

    - name: Create Promtail config directory
      file:
        path: /etc/promtail
        state: directory
        owner: root
        group: root
        mode: '0755'

    - name: Check if systemd is present
      command: cat /proc/1/comm
      register: init_system
      changed_when: false
      ignore_errors: true

    - name: Set fact for systemd presence
      set_fact:
        is_systemd: "{{ init_system.stdout == 'systemd' }}"

    - name: Copy Promtail config for systemd
      template:
        src: ./promtail/systemd.yaml.j2
        dest: /etc/promtail/promtail-config.yaml
        owner: root
        group: root
        mode: '0644'
      when: is_systemd

    - name: Copy Promtail config for non-systemd systems
      template:
        src: ./promtail/syslog.yaml.j2
        dest: /etc/promtail/promtail-config.yaml
        owner: root
        group: root
        mode: '0644'
      when: not is_systemd

You can tweak it accordingly for non-debian distros.

Now let’s configure supervisord on the alpine container. The default config is at /etc/supervisord.conf. Feel free to read through it; otherwise, you can paste the following config:

[unix_http_server]
file=/run/supervisord.sock  ; the path to the socket file
;chmod=0700                 ; socket file mode (default 0700)
;chown=nobody:nogroup       ; socket file uid:gid owner
username=user
password=super-secret-password

[supervisord]
logfile=/var/log/supervisord.log ; main log file; default $CWD/supervisord.log

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///run/supervisord.sock
;serverurl=http://127.0.0.1:9001
username=user
password=same-password-from-unix-http-server

[program:grafana]
command=grafana-server --homepath "/usr/share/grafana" --config="/root/logs/grafana/config.yaml"
autostart=true
autorestart=true
stderr_logfile=/var/log/grafana.err.log
stdout_logfile=/var/log/grafana.out.log

[program:loki]
command=loki --config.file="/root/logs/loki/config.yaml"
autostart=true
autorestart=true
stderr_logfile=/var/log/loki.err.log
stdout_logfile=/var/log/loki.out.log

[program:promtail]
command=promtail -config.file="/root/logs/promtail/syslog.yaml"
autostart=true
autorestart=true
stderr_logfile=/var/log/promtail.err.log
stdout_logfile=/var/log/promtail.out.log

[include]
files = /etc/supervisor.d/*.ini

Most of it is self-explanatory. The programs that we are gonna run also have a very simple configuration.

We specify the name after program: and then the command, which supervisord starts and keeps running for us.

Enable supervisord at boot and start it:

rc-update add supervisord
service supervisord start

Running rc-status shows which services run at boot:

logs:~# rc-status
Runlevel: default
 supervisord       [  started  ]
 crond             [  started  ]
 sshd              [  started  ]
 networking        [  started  ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
 localmount        [  started  ]
Dynamic Runlevel: manual

Sweet. If the service isn’t running yet, start it:

service supervisord start

If /var/log/supervisord.log shows the following, with every program entering the RUNNING state, your processes have started just fine:

2025-03-05 04:57:03,210 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2025-03-05 04:57:03,220 WARN No file matches via include "/etc/supervisor.d/*.ini"
2025-03-05 04:57:03,233 INFO RPC interface 'supervisor' initialized
2025-03-05 04:57:03,233 INFO supervisord started with pid 519
2025-03-05 04:57:04,243 INFO spawned: 'grafana' with pid 529
2025-03-05 04:57:04,264 INFO spawned: 'loki' with pid 530
2025-03-05 04:57:04,265 INFO spawned: 'promtail' with pid 531
2025-03-05 04:57:05,266 INFO success: grafana entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-05 04:57:05,266 INFO success: loki entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-05 04:57:05,266 INFO success: promtail entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

If a process exits or keeps restarting, take a look at its error log, which is the stderr_logfile we set in its [program:...] section of the supervisord config.
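Instead of reading the log, you can also ask supervisord directly: supervisorctl status prints one line per program with its name in the first column and its state in the second. Here is a small sketch that filters that output down to the programs that need attention. The function name not_running is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `supervisorctl status` output on stdin and print
# the name of every program whose state (second column) is not RUNNING.
not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1 }'
}

# Usage:
#   supervisorctl status | not_running
# For anything it prints, inspect the stderr log and restart, e.g.:
#   supervisorctl tail loki stderr
#   supervisorctl restart loki
```

supervisorctl picks up the socket path and credentials from the [supervisorctl] section of the config, so no extra flags should be needed here.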

Grafana runs on port 3000. Let’s use the following simple dashboard, which assumes a Loki data source named loki and gives you a nice little dropdown for choosing the instance label:

{
  "uid": "loki-dashboard",
  "title": "Loki Dashboard",
  "rows": [
    {
      "title": "Logs",
      "panels": [
        {
          "id": 1,
          "title": "Logs",
          "type": "logs",
          "datasource": "loki",
          "targets": [
            {
              "expr": "{instance=~\"$instance\"}",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "options": {
            "showLabels": true,
            "scrollToBottom": true
          },
          "gridPos": {
            "h": 20,
            "w": 24,
            "x": 0,
            "y": 0
          }
        }
      ]
    }
  ],
  "templating": {
    "list": [
      {
        "name": "instance",
        "query": "label_values(instance)",
        "multi": false,
        "includeAll": true
      }
    ]
  }
}

Now, you can run the promtail playbook to install the agents on other hosts.

ansible-playbook -i inventory logs.yaml
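Once the playbook has run and promtail is running on the hosts, a quick way to confirm that they are shipping logs is to ask loki which instance values it has seen. This is the same information the dashboard dropdown is built from. The sketch below uses loki's label values endpoint; the function name loki_label_values is hypothetical, and it assumes curl is installed (apk add curl) and loki is reachable on port 3100.

```shell
#!/bin/sh
# Hypothetical helper: ask Loki which values it has seen for a label.
# Assumes curl is installed and Loki listens on port 3100 as configured above.
loki_label_values() {
    loki_host="${1:-localhost}"
    label="${2:-instance}"
    # -f: fail on HTTP errors, -sS: quiet but still show errors
    curl -fsS "http://${loki_host}:3100/loki/api/v1/label/${label}/values"
}

# Usage, run on the loki container:
#   loki_label_values                  # values of the instance label
#   loki_label_values localhost unit   # values of the unit label (journal hosts)
```

The response is a JSON document whose data array lists the label values. A host missing from that list most likely has a promtail that is not running or cannot reach the push URL.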