Building a Complete Monitoring Stack with Grafana Alloy: From Zero to Production

Introduction

Managing observability across a growing infrastructure is challenging. The traditional approach involves deploying multiple agents per server:

Prometheus for metrics collection
Node Exporter for host metrics
MySQL Exporter for database metrics
Apache Exporter for web server metrics
Promtail for log shipping
OpenTelemetry Collector for traces

That's up to six separate agents per server, each with its own configuration, update cycle, and failure modes.

Grafana Alloy changes this paradigm by providing a single, unified agent that handles metrics, logs, and traces collection. In this guide, I'll walk you through building a production-ready monitoring stack from scratch.

Why Grafana Alloy?

Grafana Alloy (the successor to Grafana Agent) is a vendor-agnostic OpenTelemetry Collector distribution with programmable pipelines. Here's what it can replace:

Traditional Component	Alloy Equivalent
Prometheus (scraping only)	Built-in scraping
Node Exporter	`prometheus.exporter.unix`
MySQL Exporter	`prometheus.exporter.mysql`
Apache Exporter	`prometheus.exporter.apache`
Redis Exporter	`prometheus.exporter.redis`
Promtail	`loki.source.file`, `loki.source.docker`
OpenTelemetry Collector	Native OTLP support

Key Benefits

Single Binary: One agent to deploy, configure, and maintain
Alloy Configuration Syntax (formerly River): Intuitive, declarative configuration language
Built-in Service Discovery: Automatic target discovery for Kubernetes, Docker, EC2, etc.
Lower Resource Footprint: Optimized for edge deployment
Native Remote Write: Push metrics directly to Prometheus, Mimir, or Grafana Cloud
Programmable Pipelines: Transform, filter, and route telemetry data

When NOT to Use Alloy

You need local PromQL querying (use Prometheus directly)
You're already running a well-optimized stack with no issues
You need exporters that Alloy doesn't have built-in (though you can still scrape external exporters)

Data Flow

Alloy runs on each target server
Metrics are scraped locally and remote-written to Prometheus
Logs are tailed and pushed to Loki
Grafana queries both Prometheus and Loki for visualization

Prerequisites

Monitoring Server

Ubuntu 22.04/24.04 LTS
Docker and Docker Compose installed
Minimum 2 CPU, 4GB RAM (for small deployments)
50GB+ disk for metrics and logs storage

Target Servers

Ubuntu 20.04/22.04/24.04 LTS
Network access to monitoring server (ports 9090, 3100)
Root or sudo access

Network Requirements

Source	Destination	Port	Protocol	Purpose
Alloy agents	Prometheus	9090	TCP	Metrics remote write
Alloy agents	Loki	3100	TCP	Log push
Admin	Grafana	3000	TCP	Web UI

Part 1: Setting Up the Central Monitoring Server

1.1 Create Directory Structure

```
mkdir -p /opt/monitoring/{prometheus,loki,grafana/provisioning/datasources,alloy}
cd /opt/monitoring
```

1.2 Docker Compose Configuration

Create docker-compose.yml:
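
The Prometheus flags listed under "Key settings" in section 1.3 and the admin/changeme login used in section 1.7 imply a compose file with three services. A minimal sketch of such a file, wrapped in a small shell function, might look like the following. The image tags, named volumes, container mount paths, and the function name write_compose are assumptions made for illustration rather than the author's original file; pin image versions and change the Grafana password before real use.

```shell
# Sketch: write a docker-compose.yml matching the settings this guide describes.
# Image tags, volume names, and mount paths are assumptions; adjust before use.
write_compose() {
  dir="${1:-.}"
  cat > "$dir/docker-compose.yml" <<'EOF'
services:
  prometheus:
    image: prom/prometheus:latest   # assumption: pin a specific version in production
    restart: unless-stopped
    # Overriding "command" replaces the image defaults, so the config file
    # and data path are restated alongside the flags from section 1.3.
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.enable-remote-write-receiver'
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.retention.size=40GB'
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    ports:
      - '9090:9090'

  loki:
    image: grafana/loki:latest
    restart: unless-stopped
    command: -config.file=/etc/loki/loki-config.yml
    volumes:
      - ./loki/loki-config.yml:/etc/loki/loki-config.yml:ro
      - loki_data:/loki
    ports:
      - '3100:3100'

  grafana:
    image: grafana/grafana:latest
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - grafana_data:/var/lib/grafana
    ports:
      - '3000:3000'
    depends_on:
      - prometheus
      - loki

volumes:
  prometheus_data:
  loki_data:
  grafana_data:
EOF
}
```

Calling `write_compose /opt/monitoring` writes the file next to the prometheus, loki, and grafana directories created in step 1.1.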

1.3 Prometheus Configuration

Create prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

rule_files: []

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

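Key settings (Prometheus command-line flags, passed in the command section of the prometheus service in docker-compose.yml):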

--web.enable-remote-write-receiver: Allows Alloy to push metrics
--storage.tsdb.retention.time=30d: Keep 30 days of data
--storage.tsdb.retention.size=40GB: Cap storage at 40GB

1.4 Loki Configuration

Create loki/loki-config.yml:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_streams_per_user: 10000
  max_line_size: 256kb

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

analytics:
  reporting_enabled: false

Important: The delete_request_store: filesystem line is required when retention_enabled: true. Without it, Loki will fail to start.

1.5 Grafana Datasource Provisioning

Create grafana/provisioning/datasources/datasources.yml:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: false
    jsonData:
      maxLines: 1000

1.6 Start the Stack

cd /opt/monitoring
docker compose up -d

1.7 Verify Installation

# Check all containers are running
docker compose ps

# Test Prometheus
curl -s http://localhost:9090/-/ready
# Expected: Prometheus Server is Ready.

# Test Loki (may take 15-30 seconds on first start)
curl -s http://localhost:3100/ready
# Expected: ready

# Test Grafana
curl -s http://localhost:3000/api/health
# Expected: {"commit":"...","database":"ok","version":"..."}

Access Grafana at http://YOUR_SERVER_IP:3000 with admin/changeme.
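
Since Loki can take 15-30 seconds to report ready on first start, a small retry loop saves re-running the checks by hand. The following is a sketch, assuming curl is installed; the function name wait_ready and its defaults of 12 tries at 5-second intervals are illustrative choices.

```shell
# wait_ready URL [TRIES] [DELAY]: poll URL until it answers with an HTTP
# success status, giving up after TRIES attempts spaced DELAY seconds apart.
wait_ready() {
  url="$1"
  tries="${2:-12}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors (e.g. 503 while starting)
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    echo "waiting ($i/$tries): $url" >&2
    i=$((i + 1))
    if [ "$i" -le "$tries" ]; then
      sleep "$delay"
    fi
  done
  echo "not ready: $url" >&2
  return 1
}
```

For the stack above, this could be chained as `wait_ready http://localhost:9090/-/ready && wait_ready http://localhost:3100/ready && wait_ready http://localhost:3000/api/health`.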

Part 2: Monitoring MySQL Database Servers

2.1 Install Alloy

Run on each MySQL server:

# Add Grafana repository
curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list

# Install Alloy
apt update && apt install alloy -y

2.2 Create MySQL Monitoring User

Connect to MySQL and create a dedicated monitoring user:

CREATE USER 'alloy'@'localhost' IDENTIFIED BY 'your_secure_password_here';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'alloy'@'localhost';
FLUSH PRIVILEGES;

Permissions explained:

PROCESS: View running queries and connections
REPLICATION CLIENT: View replication status
SELECT: Read table statistics

2.3 Configure Alloy

Create /etc/alloy/config.alloy, replacing 192.168.0.23 with the address of your monitoring server:

// =============================================================================
// ALLOY CONFIGURATION FOR MYSQL SERVER
// Server: mysql-01 (change for each server)
// =============================================================================

// -----------------------------------------------------------------------------
// NODE/HOST METRICS
// Replaces: node_exporter
// -----------------------------------------------------------------------------
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
  
  scrape_interval = "15s"
}

// -----------------------------------------------------------------------------
// MYSQL METRICS
// Replaces: mysqld_exporter
// -----------------------------------------------------------------------------
prometheus.exporter.mysql "database" {
  data_source_name = "alloy:your_secure_password_here@(localhost:3306)/"
}

prometheus.scrape "mysql" {
  targets    = prometheus.exporter.mysql.database.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
  
  scrape_interval = "15s"
}

// -----------------------------------------------------------------------------
// LABELS
// Add consistent labels to all metrics
// -----------------------------------------------------------------------------
prometheus.relabel "add_labels" {
  rule {
    action       = "replace"
    target_label = "server"
    replacement  = "mysql-01"  // CHANGE THIS FOR EACH SERVER
  }
  
  rule {
    action       = "replace"
    target_label = "environment"
    replacement  = "production"
  }
  
  forward_to = [prometheus.remote_write.default.receiver]
}

// -----------------------------------------------------------------------------
// REMOTE WRITE TO PROMETHEUS
// -----------------------------------------------------------------------------
prometheus.remote_write "default" {
  endpoint {
    url = "http://192.168.0.23:9090/api/v1/write"
    
    queue_config {
      max_samples_per_send = 1000
      batch_send_deadline  = "5s"
      min_backoff          = "30ms"
      max_backoff          = "5s"
    }
  }
}

// -----------------------------------------------------------------------------
// LOG COLLECTION
// Replaces: promtail
// -----------------------------------------------------------------------------
loki.source.file "mysql_logs" {
  targets = [
    {
      __path__  = "/var/log/mysql/error.log",
      job       = "mysql-error",
      server    = "mysql-01",
      component = "mysql",
    },
    {
      __path__  = "/var/log/mysql/mysql-slow.log",
      job       = "mysql-slow",
      server    = "mysql-01",
      component = "mysql",
    },
  ]
  forward_to = [loki.write.default.receiver]
}

loki.source.file "system_logs" {
  targets = [
    {
      __path__ = "/var/log/syslog",
      job      = "syslog",
      server   = "mysql-01",
    },
    {
      __path__ = "/var/log/auth.log",
      job      = "authlog",
      server   = "mysql-01",
    },
  ]
  forward_to = [loki.write.default.receiver]
}

// -----------------------------------------------------------------------------
// LOKI WRITE
// -----------------------------------------------------------------------------
loki.write "default" {
  endpoint {
    url = "http://192.168.0.23:3100/loki/api/v1/push"
    
    batch_wait = "1s"
    batch_size = "1MiB"  // size is given as a string with a unit
  }
}
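
The server name appears in several places in this file (the relabel rule and every log target), so editing it by hand for each host is error-prone. One possible approach, sketched below, keeps the mysql-01 version as a template and substitutes the name with sed. The function name render_config and the template file name in the usage note are hypothetical.

```shell
# render_config TEMPLATE SERVER_NAME: print TEMPLATE with every occurrence of
# "mysql-01" replaced by SERVER_NAME. Names are restricted to letters, digits,
# dots, underscores, and hyphens so they are safe inside the sed expression.
render_config() {
  template="$1"
  server="$2"
  case "$server" in
    ''|*[!A-Za-z0-9._-]*)
      echo "invalid server name: $server" >&2
      return 1
      ;;
  esac
  sed "s/mysql-01/$server/g" "$template"
}
```

On a second database host, this might be used as `render_config config.alloy.tpl mysql-02 > /etc/alloy/config.alloy`, followed by `alloy fmt /etc/alloy/config.alloy` to check the syntax.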

2.4 Configure Permissions

# Allow Alloy to read log files
usermod -aG adm alloy
usermod -aG mysql alloy

# Verify permissions (prints the first line of the error log if access works)
su - alloy -s /bin/bash -c "head -n 1 /var/log/mysql/error.log"

2.5 Start Alloy

systemctl enable alloy
systemctl start alloy

# Check status
systemctl status alloy

# View logs
journalctl -u alloy -f --no-pager

2.6 Verify Data Collection

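# Check the MySQL exporter's metrics (port 12345/metrics itself only reports Alloy's internal metrics)
curl -s http://localhost:12345/api/v0/component/prometheus.exporter.mysql.database/metrics | grep mysql_up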

# Check data is reaching Prometheus (run from monitoring server)
curl -s 'http://192.168.0.23:9090/api/v1/query?query=mysql_up' | jq '.data.result[].metric.server'

Part 3: Monitoring Apache + PHP-FPM Web Servers

3.1 Install Alloy

curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list
apt update && apt install alloy -y

3.2 Enable Apache Server Status

Enable the status module:

a2enmod status

Create a dedicated vhost for status endpoints. This is important if you're running Laravel or any framework that catches all routes.

Create /etc/apache2/sites-available/000-localhost-status.conf:


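<VirtualHost *:80>
    ServerName 127.0.0.1

    # Apache server status
    <Location /server-status>
        SetHandler server-status
        Require local
    </Location>

    # PHP-FPM status (optional)
    <Location /status>
        SetHandler "proxy:unix:/run/php/php8.1-fpm.sock|fcgi://localhost/status"
        Require local
    </Location>
</VirtualHost>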

Enable the vhost:

a2enmod proxy proxy_fcgi
a2ensite 000-localhost-status
systemctl restart apache2

The 000- prefix ensures this vhost loads first, before your application vhosts.

Verify:

curl 'http://127.0.0.1/server-status?auto'

3.3 Enable PHP-FPM Status (Optional)

Edit /etc/php/8.1/fpm/pool.d/www.conf:

pm.status_path = /status

Restart PHP-FPM:

systemctl restart php8.1-fpm

Verify:

curl http://127.0.0.1/status

3.4 Configure Alloy

Create /etc/alloy/config.alloy:

// =============================================================================
// ALLOY CONFIGURATION FOR WEB SERVER
// Server: web-server
// =============================================================================

// -----------------------------------------------------------------------------
// NODE/HOST METRICS
// -----------------------------------------------------------------------------
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
}

// -----------------------------------------------------------------------------
// APACHE METRICS
// Replaces: apache_exporter
// -----------------------------------------------------------------------------
prometheus.exporter.apache "web" {
  scrape_uri = "http://127.0.0.1/server-status?auto"
}

prometheus.scrape "apache" {
  targets    = prometheus.exporter.apache.web.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
}

// -----------------------------------------------------------------------------
// LABELS
// -----------------------------------------------------------------------------
prometheus.relabel "add_labels" {
  rule {
    action       = "replace"
    target_label = "server"
    replacement  = "web-server"
  }
  
  rule {
    action       = "replace"
    target_label = "environment"
    replacement  = "production"
  }
  
  forward_to = [prometheus.remote_write.default.receiver]
}

// -----------------------------------------------------------------------------
// REMOTE WRITE
// -----------------------------------------------------------------------------
prometheus.remote_write "default" {
  endpoint {
    url = "http://192.168.0.23:9090/api/v1/write"
  }
}

// -----------------------------------------------------------------------------
// LOG COLLECTION
// -----------------------------------------------------------------------------
loki.source.file "system_logs" {
  targets = [
    {__path__ = "/var/log/syslog", job = "syslog", server = "web-server"},
    {__path__ = "/var/log/auth.log", job = "authlog", server = "web-server"},
  ]
  forward_to = [loki.write.default.receiver]
}

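// loki.source.file does not expand wildcards itself, so local.file_match
// resolves the glob patterns and hands the matching files to it.
local.file_match "apache_logs" {
  path_targets = [
    {__path__ = "/var/log/apache2/access.log", job = "apache-access", server = "web-server"},
    {__path__ = "/var/log/apache2/error.log", job = "apache-error", server = "web-server"},
    {__path__ = "/var/log/apache2/*-access.log", job = "apache-access", server = "web-server"},
    {__path__ = "/var/log/apache2/*-error.log", job = "apache-error", server = "web-server"},
  ]
}

loki.source.file "apache_logs" {
  targets    = local.file_match.apache_logs.targets
  forward_to = [loki.write.default.receiver]
}

local.file_match "php_logs" {
  path_targets = [
    {__path__ = "/var/log/php*.log", job = "php-fpm", server = "web-server"},
  ]
}

loki.source.file "php_logs" {
  targets    = local.file_match.php_logs.targets
  forward_to = [loki.write.default.receiver]
}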

// -----------------------------------------------------------------------------
// LOKI WRITE
// -----------------------------------------------------------------------------
loki.write "default" {
  endpoint {
    url = "http://192.168.0.23:3100/loki/api/v1/push"
  }
}

3.5 Start Alloy

usermod -aG adm alloy
usermod -aG www-data alloy
systemctl enable alloy
systemctl start alloy

Part 4: Monitoring Docker Container Hosts

4.1 Install Alloy

curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list
apt update && apt install alloy -y

4.2 Deploy cAdvisor

Alloy's built-in cAdvisor exporter can have permission issues with Docker's overlay filesystem. The reliable solution is running cAdvisor as a privileged container:

docker run -d \
  --name=cadvisor \
  --restart=unless-stopped \
  --privileged \
  -p 8081:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest

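Note: The container's port 8080 is published on host port 8081 here because 8080 is often taken by another service. If 8080 is free on your host, you can use -p 8080:8080 and adjust the Alloy scrape target to match.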

Verify:

curl -s http://localhost:8081/metrics | grep container_cpu

4.3 Configure Alloy

Create /etc/alloy/config.alloy:

// =============================================================================
// ALLOY CONFIGURATION FOR DOCKER HOST
// Server: docker-host
// =============================================================================

// -----------------------------------------------------------------------------
// NODE/HOST METRICS
// -----------------------------------------------------------------------------
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
}

// -----------------------------------------------------------------------------
// CADVISOR (CONTAINER METRICS)
// -----------------------------------------------------------------------------
prometheus.scrape "cadvisor" {
  targets = [
    {"__address__" = "localhost:8081", "job" = "cadvisor"},
  ]
  forward_to = [prometheus.relabel.add_labels.receiver]
  
  scrape_interval = "15s"
  scrape_timeout  = "10s"
}

// -----------------------------------------------------------------------------
// LABELS
// -----------------------------------------------------------------------------
prometheus.relabel "add_labels" {
  rule {
    action       = "replace"
    target_label = "server"
    replacement  = "docker-host"
  }
  
  forward_to = [prometheus.remote_write.default.receiver]
}

// -----------------------------------------------------------------------------
// REMOTE WRITE
// -----------------------------------------------------------------------------
prometheus.remote_write "default" {
  endpoint {
    url = "http://192.168.0.23:9090/api/v1/write"
  }
}

// -----------------------------------------------------------------------------
// SYSTEM LOGS
// -----------------------------------------------------------------------------
loki.source.file "system_logs" {
  targets = [
    {__path__ = "/var/log/syslog", job = "syslog", server = "docker-host"},
    {__path__ = "/var/log/auth.log", job = "authlog", server = "docker-host"},
  ]
  forward_to = [loki.write.default.receiver]
}

// -----------------------------------------------------------------------------
// DOCKER CONTAINER LOGS
// Automatically discovers and collects logs from all containers
// -----------------------------------------------------------------------------
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  labels     = {server = "docker-host", job = "docker"}
  forward_to = [loki.write.default.receiver]
  
  refresh_interval = "5s"
}

// -----------------------------------------------------------------------------
// LOKI WRITE
// -----------------------------------------------------------------------------
loki.write "default" {
  endpoint {
    url = "http://192.168.0.23:3100/loki/api/v1/push"
  }
}

4.4 Start Alloy

usermod -aG docker alloy
usermod -aG adm alloy
systemctl enable alloy
systemctl start alloy

Part 5: Creating Grafana Dashboards

5.1 Dashboard Variables

Before creating panels, set up a server variable for filtering:

Open your dashboard
Go to Settings → Variables → New variable
Configure:
- Name: server
- Type: Query
- Data source: Prometheus
- Query: label_values(up, server)
- Multi-value: Enable
- Include All option: Enable
Click Apply

Now use {server=~"$server"} in all your queries.

5.2 Key Metrics Reference

Host Metrics (Node Exporter)

# CPU Usage (%)
100 - (avg by(server) (rate(node_cpu_seconds_total{mode="idle", server=~"$server"}[5m])) * 100)

# Memory Usage (%)
(node_memory_MemTotal_bytes{server=~"$server"} - node_memory_MemAvailable_bytes{server=~"$server"}) / node_memory_MemTotal_bytes{server=~"$server"} * 100

# Disk Usage (%)
100 - (node_filesystem_avail_bytes{server=~"$server", mountpoint="/"} / node_filesystem_size_bytes{server=~"$server", mountpoint="/"} * 100)

# Load Average
node_load1{server=~"$server"}
node_load5{server=~"$server"}
node_load15{server=~"$server"}

# Network Traffic
rate(node_network_receive_bytes_total{server=~"$server", device!="lo"}[5m])
rate(node_network_transmit_bytes_total{server=~"$server", device!="lo"}[5m])

# Disk I/O
rate(node_disk_read_bytes_total{server=~"$server"}[5m])
rate(node_disk_written_bytes_total{server=~"$server"}[5m])

MySQL Metrics

# MySQL Up/Down
mysql_up{server=~"$server"}

# Connections
mysql_global_status_threads_connected{server=~"$server"}
mysql_global_variables_max_connections{server=~"$server"}

# Queries per Second
rate(mysql_global_status_queries{server=~"$server"}[5m])

# Slow Queries per Second
rate(mysql_global_status_slow_queries{server=~"$server"}[5m])

# Buffer Pool Hit Rate
1 - (rate(mysql_global_status_innodb_buffer_pool_reads{server=~"$server"}[5m]) / rate(mysql_global_status_innodb_buffer_pool_read_requests{server=~"$server"}[5m]))

# InnoDB Buffer Pool Usage
mysql_global_status_innodb_buffer_pool_bytes_data{server=~"$server"}
mysql_global_variables_innodb_buffer_pool_size{server=~"$server"}
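
To make the buffer pool hit rate formula concrete, the same arithmetic can be reproduced outside Prometheus. The helper below is only an illustration with made-up input numbers; hit_rate is a hypothetical name, and its two inputs stand for the per-second rates of innodb_buffer_pool_reads (reads that had to go to disk) and innodb_buffer_pool_read_requests (all logical reads).

```shell
# hit_rate READS REQUESTS: fraction of logical reads served from the buffer
# pool, i.e. 1 - (disk reads / read requests), printed with four decimals.
hit_rate() {
  awk -v reads="$1" -v requests="$2" 'BEGIN {
    if (requests <= 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.4f\n", 1 - reads / requests
  }'
}
```

For example, `hit_rate 50 1000` prints 0.9500: 50 disk reads out of 1000 requests means 95% were served from memory. With zero requests the helper prints n/a, whereas the PromQL expression evaluates to NaN.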

Apache Metrics

# Apache Up/Down
apache_up{server=~"$server"}

# Requests per Second
rate(apache_accesses_total{server=~"$server"}[5m])

# Traffic (Bytes/sec)
rate(apache_sent_kilobytes_total{server=~"$server"}[5m]) * 1024

# Workers
apache_workers{server=~"$server", state="busy"}
apache_workers{server=~"$server", state="idle"}

# Uptime
apache_uptime_seconds_total{server=~"$server"}

Docker/Container Metrics

# Container CPU Usage
rate(container_cpu_usage_seconds_total{server=~"$server", name!=""}[5m]) * 100

# Container Memory Usage
container_memory_usage_bytes{server=~"$server", name!=""}

# Container Network I/O
rate(container_network_receive_bytes_total{server=~"$server", name!=""}[5m])
rate(container_network_transmit_bytes_total{server=~"$server", name!=""}[5m])

# Running Containers Count
count(container_memory_usage_bytes{server=~"$server", name!=""})

5.3 Sample Dashboard JSON

Here's a minimal MySQL dashboard you can import and extend:

{
  "title": "MySQL Overview",
  "uid": "mysql-overview",
  "templating": {
    "list": [{
      "name": "server",
      "type": "query",
      "datasource": "Prometheus",
      "query": "label_values(mysql_up, server)",
      "refresh": 2,
      "multi": true,
      "includeAll": true
    }]
  },
  "panels": [
    {
      "title": "MySQL Status",
      "type": "stat",
      "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
      "targets": [{"expr": "mysql_up{server=~\"$server\"}", "legendFormat": "{{server}}"}],
      "fieldConfig": {
        "defaults": {
          "mappings": [
            {"options": {"0": {"text": "DOWN", "color": "red"}}, "type": "value"},
            {"options": {"1": {"text": "UP", "color": "green"}}, "type": "value"}
          ]
        }
      }
    },
    {
      "title": "Queries per Second",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 4, "y": 0},
      "targets": [{"expr": "rate(mysql_global_status_queries{server=~\"$server\"}[5m])", "legendFormat": "{{server}}"}]
    },
    {
      "title": "Connections",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 8, "x": 16, "y": 0},
      "targets": [
        {"expr": "mysql_global_status_threads_connected{server=~\"$server\"}", "legendFormat": "{{server}} - Connected"},
        {"expr": "mysql_global_variables_max_connections{server=~\"$server\"}", "legendFormat": "{{server}} - Max"}
      ]
    }
  ],
  "schemaVersion": 38,
  "time": {"from": "now-1h", "to": "now"},
  "refresh": "30s"
}

Part 6: Alerting Configuration

6.1 Prometheus Alerting Rules

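Create prometheus/alerts.yml, mount it into the Prometheus container, and list its in-container path under rule_files in prometheus.yml (the configuration in section 1.3 leaves rule_files empty):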

groups:
  - name: host
    rules:
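      # With remote write, a host that is completely down stops sending samples, so `up`
      # goes stale instead of dropping to 0; this rule only catches failed scrapes.
      - alert: HostDown
        expr: up == 0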
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.server }} is down"
          description: "{{ $labels.server }} has been unreachable for more than 1 minute."
      
      - alert: HighCPU
        expr: 100 - (avg by(server) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.server }}"
          description: "CPU usage is above 80% (current: {{ $value | printf \"%.1f\" }}%)"
      
      - alert: HighMemory
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.server }}"
      
      - alert: DiskSpaceLow
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.server }}"

  - name: mysql
    rules:
      - alert: MySQLDown
        expr: mysql_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL is down on {{ $labels.server }}"
      
      - alert: MySQLTooManyConnections
        expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL connections above 80% on {{ $labels.server }}"
      
      - alert: MySQLSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow queries detected on {{ $labels.server }}"

  - name: apache
    rules:
      - alert: ApacheDown
        expr: apache_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Apache is down on {{ $labels.server }}"
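
Because Alloy pushes metrics, a server that is powered off or unreachable produces no `up` samples at all, so a rule based on `up == 0` cannot fire for it. One common workaround is an absence rule per expected server. The sketch below generates such rules from a list of server names; the group name heartbeat, the alert name HostSilent, and the 5-minute window are illustrative assumptions, while absent_over_time is a standard PromQL function.

```shell
# absent_rules SERVER...: print a Prometheus rule group with one alert per
# server that fires when no `up` samples arrived from it in the last 5 minutes.
absent_rules() {
  echo "groups:"
  echo "  - name: heartbeat"
  echo "    rules:"
  for server in "$@"; do
    # Unquoted heredoc: $server is expanded by the shell for each rule.
    cat <<EOF
      - alert: HostSilent
        expr: absent_over_time(up{server="$server"}[5m])
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No metrics received from $server for 5 minutes"
EOF
  done
}
```

Running `absent_rules mysql-01 web-server docker-host > prometheus/heartbeat.yml` (a hypothetical file name) produces a second rules file that can be listed under rule_files alongside alerts.yml.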

Troubleshooting Guide

Alloy Won't Start

Check config syntax:

alloy fmt /etc/alloy/config.alloy

Run manually to see errors:

alloy run /etc/alloy/config.alloy

Common errors:

component "xxx" does not exist: The exporter isn't available in Alloy
permission denied: Add the alloy user to the group that owns the file (usermod -aG <group> alloy) and restart Alloy

No Data in Grafana

Check that Alloy's HTTP server is up and exposing metrics (this endpoint reports Alloy's own internal metrics):

curl -s http://localhost:12345/metrics | grep -c "^[a-z]"

Check if Prometheus is receiving:

curl -s 'http://PROMETHEUS_IP:9090/api/v1/query?query=up' | jq '.data.result[].metric.server'

Check Alloy logs:

journalctl -u alloy -f --no-pager

Loki "Ingester Not Ready"

This is normal on first start. Wait 15-30 seconds:

watch -n 5 'curl -s http://localhost:3100/ready'

Permission Denied on Logs

# Check current groups
groups alloy

# Add to required groups
usermod -aG adm alloy      # System logs
usermod -aG mysql alloy    # MySQL logs
usermod -aG www-data alloy # Apache logs
usermod -aG docker alloy   # Docker socket

# Restart
systemctl restart alloy
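
To see exactly which log files are still unreadable after changing groups, a short loop over the configured paths can help. This is a sketch with a hypothetical function name, check_readable; it tests access for whichever user runs it, so it is meant to be run from a shell started as the alloy user.

```shell
# check_readable FILE...: report whether the current user can read each file.
# Returns non-zero if at least one file is missing or unreadable.
check_readable() {
  rc=0
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "ok          $f"
    else
      echo "unreadable  $f"
      rc=1
    fi
  done
  return "$rc"
}
```

One way to use it is to start a shell as the service user (for example `sudo -u alloy sh`), paste the function, and call `check_readable /var/log/syslog /var/log/auth.log /var/log/mysql/error.log`. Group changes only apply to new processes, so the running Alloy service still needs the restart shown above.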

High Memory Usage

Alloy can consume significant memory with many targets. Tune the config:

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
    
    queue_config {
      max_samples_per_send = 500    // Reduce from default 2000
      capacity             = 2500   // Reduce from default 10000
      max_shards           = 10     // Limit parallelism
    }
  }
}

Performance Tuning

Prometheus Storage

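For longer retention or higher cardinality:

command:
  - '--storage.tsdb.retention.time=90d'
  - '--storage.tsdb.retention.size=100GB'
  - '--storage.tsdb.wal-compression'
  # Caps blocks at 2h, which stops compaction into larger blocks. This mainly suits
  # setups that upload blocks elsewhere (e.g. a Thanos sidecar); omit it otherwise.
  - '--storage.tsdb.max-block-duration=2h'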

Loki Optimization

For high-volume log ingestion:

limits_config:
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 15MB

Alloy Resource Limits

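Create /etc/systemd/system/alloy.service.d/limits.conf (create the alloy.service.d directory first if it does not exist):

[Service]
MemoryMax=512M
CPUQuota=50%

Reload systemd and restart Alloy to apply the limits:

systemctl daemon-reload
systemctl restart alloy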

Conclusion

We've built a complete monitoring stack that:

Collects metrics from hosts, MySQL, Apache, and Docker containers
Aggregates logs from all sources into Loki
Visualizes everything in Grafana dashboards
Uses a single agent (Alloy) per server instead of multiple exporters

Key Takeaways

Alloy simplifies operations by consolidating multiple agents
Remote write eliminates the need for Prometheus to scrape targets
Consistent labeling (server, environment) enables powerful filtering
Docker Compose makes the central stack portable and reproducible

Next Steps

Add Alertmanager for alert routing and notifications
Implement Tempo for distributed tracing
Scale to Mimir for long-term metrics storage
Set up recording rules for complex/expensive queries
