
Introduction
Managing observability across a growing infrastructure is challenging. The traditional approach involves deploying multiple agents per server:
- Prometheus for metrics collection
- Node Exporter for host metrics
- MySQL Exporter for database metrics
- Apache Exporter for web server metrics
- Promtail for log shipping
- OpenTelemetry Collector for traces
That's up to six separate agents per server, each with its own configuration, update cycle, and failure modes.
Grafana Alloy changes this paradigm by providing a single, unified agent that handles metrics, logs, and traces collection. In this guide, I'll walk you through building a production-ready monitoring stack from scratch.
Why Grafana Alloy?
Grafana Alloy (the successor to Grafana Agent) is a vendor-agnostic OpenTelemetry Collector distribution with programmable pipelines. Here's what it replaces:
| Traditional Component | Alloy Equivalent |
|---|---|
| Prometheus (scraping only) | Built-in scraping |
| Node Exporter | prometheus.exporter.unix |
| MySQL Exporter | prometheus.exporter.mysql |
| Apache Exporter | prometheus.exporter.apache |
| Redis Exporter | prometheus.exporter.redis |
| Promtail | loki.source.file, loki.source.docker |
| OpenTelemetry Collector | Native OTLP support |
Key Benefits
- Single Binary: One agent to deploy, configure, and maintain
- Declarative Configuration: The Alloy configuration syntax (formerly called River) is intuitive and declarative
- Built-in Service Discovery: Automatic target discovery for Kubernetes, Docker, EC2, etc.
- Lower Resource Footprint: Optimized for edge deployment
- Native Remote Write: Push metrics directly to Prometheus, Mimir, or Grafana Cloud
- Programmable Pipelines: Transform, filter, and route telemetry data
When NOT to Use Alloy
- You need local PromQL querying (use Prometheus directly)
- You're already running a well-optimized stack with no issues
- You need exporters that Alloy doesn't have built-in (though you can still scrape external exporters)
Data Flow
- Alloy runs on each target server
- Metrics are scraped locally and remote-written to Prometheus
- Logs are tailed and pushed to Loki
- Grafana queries both Prometheus and Loki for visualization
Prerequisites
Monitoring Server
- Ubuntu 22.04/24.04 LTS
- Docker and Docker Compose installed
- Minimum 2 CPU, 4GB RAM (for small deployments)
- 50GB+ disk for metrics and logs storage
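As a rough cross-check of the disk figure, the Prometheus documentation gives a sizing rule of thumb: disk needed is roughly retention time in seconds × ingested samples per second × bytes per sample, with about 1-2 bytes per sample after compression. The sketch below applies that rule. The function name `estimate_tsdb_bytes` and the example numbers (10,000 active series, 2 bytes per sample) are illustrative assumptions, not measurements from this stack, and Loki's log storage comes on top.

```shell
# Rough Prometheus TSDB size estimate (rule of thumb, not a guarantee).
# Usage: estimate_tsdb_bytes <active_series> <scrape_interval_s> <retention_days> <bytes_per_sample>
estimate_tsdb_bytes() {
  series=$1
  interval=$2
  days=$3
  bytes_per_sample=$4
  # samples per second = series / interval; retention in seconds = days * 86400
  echo $(( series * days * 86400 * bytes_per_sample / interval ))
}

# Assumed example: 10,000 series scraped every 15s, kept 30 days, ~2 bytes per sample
estimate_tsdb_bytes 10000 15 30 2   # prints 3456000000 (about 3.2 GiB)
```

With numbers in this range, most of a 50GB disk is headroom for logs and for growth in series count.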
Target Servers
- Ubuntu 20.04/22.04/24.04 LTS
- Network access to monitoring server (ports 9090, 3100)
- Root or sudo access
Network Requirements
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| Alloy agents | Prometheus | 9090 | TCP | Metrics remote write |
| Alloy agents | Loki | 3100 | TCP | Log push |
| Admin | Grafana | 3000 | TCP | Web UI |
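Before installing anything on a target server, it can help to confirm that the ports in the table are reachable. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the target server. The helper name `check_port` is hypothetical, and 192.168.0.23 is simply the monitoring server address used in the configs later in this guide.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout.
check_port() {
  host=$1
  port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage from a target server (replace the address with your monitoring server):
#   check_port 192.168.0.23 9090 && echo "Prometheus port reachable"
#   check_port 192.168.0.23 3100 && echo "Loki port reachable"
```

A failure here usually points to a firewall rule or a service that is not listening yet, which is easier to diagnose now than after Alloy starts retrying in the background.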
Part 1: Setting Up the Central Monitoring Server
1.1 Create Directory Structure
```
mkdir -p /opt/monitoring/{prometheus,loki,grafana/provisioning/datasources,alloy}
cd /opt/monitoring
```
1.2 Docker Compose Configuration
Create docker-compose.yml:
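A minimal sketch of what this file could look like, wrapped in a shell function so it can be written in one step. The ports, Prometheus flags, config file paths, and the admin/changeme login come from elsewhere in this guide. The image names (left untagged), the volume names, the container mount paths, and the `render_compose` helper itself are assumptions, so pin versions you have tested and adjust as needed.

```shell
# Sketch only: prints a docker-compose.yml consistent with the settings this guide refers to.
# Image names/tags, volume names, and container paths are assumptions.
render_compose() {
  cat <<'EOF'
services:
  prometheus:
    image: prom/prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.retention.size=40GB'
      - '--web.enable-remote-write-receiver'

  loki:
    image: grafana/loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki-config.yml:/etc/loki/loki-config.yml:ro
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yml

  grafana:
    image: grafana/grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
      - loki

volumes:
  prometheus_data:
  loki_data:
  grafana_data:
EOF
}

# Usage (run from /opt/monitoring):
#   render_compose > docker-compose.yml
```

Because the `command` list replaces the image's default arguments, the sketch restates `--config.file` and `--storage.tsdb.path` alongside the retention and remote-write flags.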
1.3 Prometheus Configuration
Create prometheus/prometheus.yml:
```
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

rule_files: []

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```
Key settings (Prometheus command-line flags, set in docker-compose.yml):
- --web.enable-remote-write-receiver: Allows Alloy to push metrics
- --storage.tsdb.retention.time=30d: Keep 30 days of data
- --storage.tsdb.retention.size=40GB: Cap storage at 40GB
1.4 Loki Configuration
Create loki/loki-config.yml:
```
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_streams_per_user: 10000
  max_line_size: 256kb

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

analytics:
  reporting_enabled: false
```
Important: The delete_request_store: filesystem line is required when retention_enabled: true. Without it, Loki will fail to start.
1.5 Grafana Datasource Provisioning
Create grafana/provisioning/datasources/datasources.yml:
```
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: false
    jsonData:
      maxLines: 1000
```
1.6 Start the Stack
```
cd /opt/monitoring
docker compose up -d
```
1.7 Verify Installation
```
# Check all containers are running
docker compose ps

# Test Prometheus
curl -s http://localhost:9090/-/ready
# Expected: Prometheus Server is Ready.

# Test Loki (may take 15-30 seconds on first start)
curl -s http://localhost:3100/ready
# Expected: ready

# Test Grafana
curl -s http://localhost:3000/api/health
# Expected: {"commit":"...","database":"ok","version":"..."}
```
Access Grafana at http://YOUR_SERVER_IP:3000 with admin/changeme.
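Since Loki can take a while to report ready, the one-off checks can also be wrapped in a small polling loop. This is a minimal sketch, assuming curl is installed. The helper name `wait_ready` and its default of 12 attempts spaced 5 seconds apart are illustrative choices, not part of any of the tools involved.

```shell
# Hypothetical helper: poll a readiness URL until it answers with an HTTP success status.
# Usage: wait_ready <url> [attempts] [delay_seconds]
wait_ready() {
  url=$1
  attempts=${2:-12}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors, such as a 503 during startup
    if curl -fsS --max-time 3 -o /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    # Only pause between attempts, not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}

# Usage on the monitoring server:
#   wait_ready http://localhost:9090/-/ready
#   wait_ready http://localhost:3100/ready
#   wait_ready http://localhost:3000/api/health
```

The non-zero return code on timeout makes the helper usable in provisioning scripts that should stop when a service never comes up.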
Part 2: Monitoring MySQL Database Servers
2.1 Install Alloy
Run on each MySQL server:
```
# Add Grafana repository
curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list

# Install Alloy
apt update && apt install alloy -y
```
2.2 Create MySQL Monitoring User
Connect to MySQL and create a dedicated monitoring user:
```
CREATE USER 'alloy'@'localhost' IDENTIFIED BY 'your_secure_password_here';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'alloy'@'localhost';
FLUSH PRIVILEGES;
```
Permissions explained:
- PROCESS: View running queries and connections
- REPLICATION CLIENT: View replication status
- SELECT: Read table statistics
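The placeholder password has to be replaced with a real one, and the same value ends up inside the data source name in the Alloy config. The sketch below generates an alphanumeric password so that it needs no escaping in that string, then prints the matching SQL. Both helper names (`gen_password`, `render_monitoring_sql`) are hypothetical, and the 32-character default length is an arbitrary choice.

```shell
# Hypothetical helper: generate a random alphanumeric password.
# Alphanumeric only, so it needs no escaping inside user:password@(host:port)/
gen_password() {
  length=${1:-32}
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
}

# Hypothetical helper: print the SQL from this section with the given password filled in.
render_monitoring_sql() {
  password=$1
  cat <<EOF
CREATE USER 'alloy'@'localhost' IDENTIFIED BY '${password}';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'alloy'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Usage:
#   PW=$(gen_password)
#   render_monitoring_sql "$PW" | mysql -u root -p
#   echo "$PW"   # store it safely; it is needed in the Alloy config
```

Keeping the password free of characters such as @, : and / avoids ambiguity when it is embedded in the connection string.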
2.3 Configure Alloy
Create /etc/alloy/config.alloy:
```
// =============================================================================
// ALLOY CONFIGURATION FOR MYSQL SERVER
// Server: mysql-01 (change for each server)
// =============================================================================

// -----------------------------------------------------------------------------
// NODE/HOST METRICS
// Replaces: node_exporter
// -----------------------------------------------------------------------------
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets         = prometheus.exporter.unix.node.targets
  forward_to      = [prometheus.relabel.add_labels.receiver]
  scrape_interval = "15s"
}

// -----------------------------------------------------------------------------
// MYSQL METRICS
// Replaces: mysqld_exporter
// -----------------------------------------------------------------------------
prometheus.exporter.mysql "database" {
  data_source_name = "alloy:your_secure_password_here@(localhost:3306)/"
}

prometheus.scrape "mysql" {
  targets         = prometheus.exporter.mysql.database.targets
  forward_to      = [prometheus.relabel.add_labels.receiver]
  scrape_interval = "15s"
}

// -----------------------------------------------------------------------------
// LABELS
// Add consistent labels to all metrics
// -----------------------------------------------------------------------------
prometheus.relabel "add_labels" {
  rule {
    action       = "replace"
    target_label = "server"
    replacement  = "mysql-01" // CHANGE THIS FOR EACH SERVER
  }

  rule {
    action       = "replace"
    target_label = "environment"
    replacement  = "production"
  }

  forward_to = [prometheus.remote_write.default.receiver]
}

// -----------------------------------------------------------------------------
// REMOTE WRITE TO PROMETHEUS
// -----------------------------------------------------------------------------
prometheus.remote_write "default" {
  endpoint {
    url = "http://192.168.0.23:9090/api/v1/write"

    queue_config {
      max_samples_per_send = 1000
      batch_send_deadline  = "5s"
      min_backoff          = "30ms"
      max_backoff          = "5s"
    }
  }
}

// -----------------------------------------------------------------------------
// LOG COLLECTION
// Replaces: promtail
// -----------------------------------------------------------------------------
loki.source.file "mysql_logs" {
  targets = [
    {
      __path__  = "/var/log/mysql/error.log",
      job       = "mysql-error",
      server    = "mysql-01",
      component = "mysql",
    },
    {
      __path__  = "/var/log/mysql/mysql-slow.log",
      job       = "mysql-slow",
      server    = "mysql-01",
      component = "mysql",
    },
  ]
  forward_to = [loki.write.default.receiver]
}

loki.source.file "system_logs" {
  targets = [
    {
      __path__ = "/var/log/syslog",
      job      = "syslog",
      server   = "mysql-01",
    },
    {
      __path__ = "/var/log/auth.log",
      job      = "authlog",
      server   = "mysql-01",
    },
  ]
  forward_to = [loki.write.default.receiver]
}

// -----------------------------------------------------------------------------
// LOKI WRITE
// -----------------------------------------------------------------------------
loki.write "default" {
  endpoint {
    url        = "http://192.168.0.23:3100/loki/api/v1/push"
    batch_wait = "1s"
    batch_size = "1MiB"
  }
}
```
2.4 Configure Permissions
```
# Allow Alloy to read log files
usermod -aG adm alloy
usermod -aG mysql alloy

# Verify permissions
su - alloy -s /bin/bash -c "head -n 1 /var/log/mysql/error.log"
```
2.5 Start Alloy
```
systemctl enable alloy
systemctl start alloy

# Check status
systemctl status alloy

# View logs
journalctl -u alloy -f --no-pager
```
2.6 Verify Data Collection
Alloy's own /metrics endpoint on port 12345 reports Alloy's internal metrics; the embedded exporters are served under a per-component path.
```
# Check the MySQL exporter is producing metrics
curl -s http://localhost:12345/api/v0/component/prometheus.exporter.mysql.database/metrics | grep mysql_up

# Check data is reaching Prometheus (run from monitoring server)
curl -s 'http://192.168.0.23:9090/api/v1/query?query=mysql_up' | jq '.data.result[].metric.server'
```
Part 3: Monitoring Apache + PHP-FPM Web Servers
3.1 Install Alloy
```
curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list
apt update && apt install alloy -y
```
3.2 Enable Apache Server Status
Enable the status module:
```
a2enmod status
```
Create a dedicated vhost for status endpoints. This is important if you're running Laravel or any framework that catches all routes.
Create /etc/apache2/sites-available/000-localhost-status.conf:
```
<VirtualHost *:80>
    ServerName 127.0.0.1

    # Apache server status
    <Location "/server-status">
        SetHandler server-status
        Require local
    </Location>

    # PHP-FPM status (optional)
    <Location "/status">
        SetHandler "proxy:unix:/run/php/php8.1-fpm.sock|fcgi://localhost/status"
        Require local
    </Location>
</VirtualHost>
```
Enable the vhost:
```
a2enmod proxy proxy_fcgi
a2ensite 000-localhost-status
systemctl restart apache2
```
The 000- prefix ensures this vhost loads first, before your application vhosts.
Verify:
```
curl http://127.0.0.1/server-status?auto
```
3.3 Enable PHP-FPM Status (Optional)
Edit /etc/php/8.1/fpm/pool.d/www.conf:
```
pm.status_path = /status
```
Restart PHP-FPM:
```
systemctl restart php8.1-fpm
```
Verify:
```
curl http://127.0.0.1/status
```
3.4 Configure Alloy
Create /etc/alloy/config.alloy:
```
// =============================================================================
// ALLOY CONFIGURATION FOR WEB SERVER
// Server: web-server
// =============================================================================

// -----------------------------------------------------------------------------
// NODE/HOST METRICS
// -----------------------------------------------------------------------------
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
}

// -----------------------------------------------------------------------------
// APACHE METRICS
// Replaces: apache_exporter
// -----------------------------------------------------------------------------
prometheus.exporter.apache "web" {
  scrape_uri = "http://127.0.0.1/server-status?auto"
}

prometheus.scrape "apache" {
  targets    = prometheus.exporter.apache.web.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
}

// -----------------------------------------------------------------------------
// LABELS
// -----------------------------------------------------------------------------
prometheus.relabel "add_labels" {
  rule {
    action       = "replace"
    target_label = "server"
    replacement  = "web-server"
  }

  rule {
    action       = "replace"
    target_label = "environment"
    replacement  = "production"
  }

  forward_to = [prometheus.remote_write.default.receiver]
}

// -----------------------------------------------------------------------------
// REMOTE WRITE
// -----------------------------------------------------------------------------
prometheus.remote_write "default" {
  endpoint {
    url = "http://192.168.0.23:9090/api/v1/write"
  }
}

// -----------------------------------------------------------------------------
// LOG COLLECTION
// loki.source.file tails exact paths only; wildcard paths go through
// local.file_match, which expands them into concrete targets.
// -----------------------------------------------------------------------------
loki.source.file "system_logs" {
  targets = [
    {__path__ = "/var/log/syslog", job = "syslog", server = "web-server"},
    {__path__ = "/var/log/auth.log", job = "authlog", server = "web-server"},
  ]
  forward_to = [loki.write.default.receiver]
}

local.file_match "apache_logs" {
  path_targets = [
    {__path__ = "/var/log/apache2/access.log", job = "apache-access", server = "web-server"},
    {__path__ = "/var/log/apache2/error.log", job = "apache-error", server = "web-server"},
    {__path__ = "/var/log/apache2/*-access.log", job = "apache-access", server = "web-server"},
    {__path__ = "/var/log/apache2/*-error.log", job = "apache-error", server = "web-server"},
  ]
}

loki.source.file "apache_logs" {
  targets    = local.file_match.apache_logs.targets
  forward_to = [loki.write.default.receiver]
}

local.file_match "php_logs" {
  path_targets = [
    {__path__ = "/var/log/php*.log", job = "php-fpm", server = "web-server"},
  ]
}

loki.source.file "php_logs" {
  targets    = local.file_match.php_logs.targets
  forward_to = [loki.write.default.receiver]
}

// -----------------------------------------------------------------------------
// LOKI WRITE
// -----------------------------------------------------------------------------
loki.write "default" {
  endpoint {
    url = "http://192.168.0.23:3100/loki/api/v1/push"
  }
}
```
3.5 Start Alloy
```
usermod -aG adm alloy
usermod -aG www-data alloy
systemctl enable alloy
systemctl start alloy
```
Part 4: Monitoring Docker Container Hosts
4.1 Install Alloy
```
curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list
apt update && apt install alloy -y
```
4.2 Deploy cAdvisor
Alloy's built-in cAdvisor exporter can have permission issues with Docker's overlay filesystem. The reliable solution is running cAdvisor as a privileged container:
```
docker run -d \
  --name=cadvisor \
  --restart=unless-stopped \
  --privileged \
  -p 8081:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest
```
Note: The command publishes cAdvisor on host port 8081 because 8080 is often already in use; pick another host port if 8081 is taken too.
Verify:
```
curl -s http://localhost:8081/metrics | grep container_cpu
```
4.3 Configure Alloy
Create /etc/alloy/config.alloy:
```
// =============================================================================
// ALLOY CONFIGURATION FOR DOCKER HOST
// Server: docker-host
// =============================================================================

// -----------------------------------------------------------------------------
// NODE/HOST METRICS
// -----------------------------------------------------------------------------
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.relabel.add_labels.receiver]
}

// -----------------------------------------------------------------------------
// CADVISOR (CONTAINER METRICS)
// -----------------------------------------------------------------------------
prometheus.scrape "cadvisor" {
  targets = [
    {"__address__" = "localhost:8081", "job" = "cadvisor"},
  ]
  forward_to      = [prometheus.relabel.add_labels.receiver]
  scrape_interval = "15s"
  scrape_timeout  = "10s"
}

// -----------------------------------------------------------------------------
// LABELS
// -----------------------------------------------------------------------------
prometheus.relabel "add_labels" {
  rule {
    action       = "replace"
    target_label = "server"
    replacement  = "docker-host"
  }

  forward_to = [prometheus.remote_write.default.receiver]
}

// -----------------------------------------------------------------------------
// REMOTE WRITE
// -----------------------------------------------------------------------------
prometheus.remote_write "default" {
  endpoint {
    url = "http://192.168.0.23:9090/api/v1/write"
  }
}

// -----------------------------------------------------------------------------
// SYSTEM LOGS
// -----------------------------------------------------------------------------
loki.source.file "system_logs" {
  targets = [
    {__path__ = "/var/log/syslog", job = "syslog", server = "docker-host"},
    {__path__ = "/var/log/auth.log", job = "authlog", server = "docker-host"},
  ]
  forward_to = [loki.write.default.receiver]
}

// -----------------------------------------------------------------------------
// DOCKER CONTAINER LOGS
// Automatically discovers and collects logs from all containers
// -----------------------------------------------------------------------------
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host             = "unix:///var/run/docker.sock"
  targets          = discovery.docker.containers.targets
  labels           = {server = "docker-host", job = "docker"}
  forward_to       = [loki.write.default.receiver]
  refresh_interval = "5s"
}

// -----------------------------------------------------------------------------
// LOKI WRITE
// -----------------------------------------------------------------------------
loki.write "default" {
  endpoint {
    url = "http://192.168.0.23:3100/loki/api/v1/push"
  }
}
```
4.4 Start Alloy
```
usermod -aG docker alloy
usermod -aG adm alloy
systemctl enable alloy
systemctl start alloy
```
Part 5: Creating Grafana Dashboards
5.1 Dashboard Variables
Before creating panels, set up a server variable for filtering:
- Open your dashboard
- Go to Settings → Variables → New variable
- Configure:
  - Name: server
  - Type: Query
  - Data source: Prometheus
  - Query: label_values(up, server)
  - Multi-value: Enable
  - Include All option: Enable
- Click Apply
Now use {server=~"$server"} in all your queries.
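To try a filtered query outside Grafana, the same selector can be sent to the Prometheus HTTP API. The following is a minimal sketch, assuming curl is available. The helper name `prom_query` is hypothetical, and the example replaces `$server` by hand with a regex alternation of the kind a multi-value selection produces.

```shell
# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# curl's -G with --data-urlencode takes care of the braces and quotes in PromQL.
# Usage: prom_query <base_url> <promql>
prom_query() {
  base_url=$1
  expr=$2
  curl -fsS -G --max-time 5 "${base_url}/api/v1/query" --data-urlencode "query=${expr}"
}

# Usage (server names taken from the configs in this guide):
#   prom_query http://192.168.0.23:9090 'up{server=~"mysql-01|web-server"}' \
#     | jq '.data.result[].metric.server'
```

If the query returns an empty result here, the panel in Grafana will be empty too, which narrows the problem down to labels or ingestion rather than the dashboard.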
5.2 Key Metrics Reference
Host Metrics (Node Exporter)
```
# CPU Usage (%)
100 - (avg by(server) (rate(node_cpu_seconds_total{mode="idle", server=~"$server"}[5m])) * 100)

# Memory Usage (%)
(node_memory_MemTotal_bytes{server=~"$server"} - node_memory_MemAvailable_bytes{server=~"$server"}) / node_memory_MemTotal_bytes{server=~"$server"} * 100

# Disk Usage (%)
100 - (node_filesystem_avail_bytes{server=~"$server", mountpoint="/"} / node_filesystem_size_bytes{server=~"$server", mountpoint="/"} * 100)

# Load Average
node_load1{server=~"$server"}
node_load5{server=~"$server"}
node_load15{server=~"$server"}

# Network Traffic
rate(node_network_receive_bytes_total{server=~"$server", device!="lo"}[5m])
rate(node_network_transmit_bytes_total{server=~"$server", device!="lo"}[5m])

# Disk I/O
rate(node_disk_read_bytes_total{server=~"$server"}[5m])
rate(node_disk_written_bytes_total{server=~"$server"}[5m])
```
MySQL Metrics
```
# MySQL Up/Down
mysql_up{server=~"$server"}

# Connections
mysql_global_status_threads_connected{server=~"$server"}
mysql_global_variables_max_connections{server=~"$server"}

# Queries per Second
rate(mysql_global_status_queries{server=~"$server"}[5m])

# Slow Queries per Second
rate(mysql_global_status_slow_queries{server=~"$server"}[5m])

# Buffer Pool Hit Rate
1 - (rate(mysql_global_status_innodb_buffer_pool_reads{server=~"$server"}[5m]) / rate(mysql_global_status_innodb_buffer_pool_read_requests{server=~"$server"}[5m]))

# InnoDB Buffer Pool Usage
mysql_global_status_innodb_buffer_pool_bytes_data{server=~"$server"}
mysql_global_variables_innodb_buffer_pool_size{server=~"$server"}
```
Apache Metrics
```
# Apache Up/Down
apache_up{server=~"$server"}

# Requests per Second
rate(apache_accesses_total{server=~"$server"}[5m])

# Traffic (Bytes/sec)
rate(apache_sent_kilobytes_total{server=~"$server"}[5m]) * 1024

# Workers
apache_workers{server=~"$server", state="busy"}
apache_workers{server=~"$server", state="idle"}

# Uptime
apache_uptime_seconds_total{server=~"$server"}
```
Docker/Container Metrics
```
# Container CPU Usage
rate(container_cpu_usage_seconds_total{server=~"$server", name!=""}[5m]) * 100

# Container Memory Usage
container_memory_usage_bytes{server=~"$server", name!=""}

# Container Network I/O
rate(container_network_receive_bytes_total{server=~"$server", name!=""}[5m])
rate(container_network_transmit_bytes_total{server=~"$server", name!=""}[5m])

# Running Containers Count
count(container_memory_usage_bytes{server=~"$server", name!=""})
```
5.3 Sample Dashboard JSON
Here's a complete MySQL dashboard you can import:
```
{
  "title": "MySQL Overview",
  "uid": "mysql-overview",
  "templating": {
    "list": [{
      "name": "server",
      "type": "query",
      "datasource": "Prometheus",
      "query": "label_values(mysql_up, server)",
      "refresh": 2,
      "multi": true,
      "includeAll": true
    }]
  },
  "panels": [
    {
      "title": "MySQL Status",
      "type": "stat",
      "gridPos": {"h": 4, "w": 4, "x": 0, "y": 0},
      "targets": [{"expr": "mysql_up{server=~\"$server\"}", "legendFormat": "{{server}}"}],
      "fieldConfig": {
        "defaults": {
          "mappings": [
            {"options": {"0": {"text": "DOWN", "color": "red"}}, "type": "value"},
            {"options": {"1": {"text": "UP", "color": "green"}}, "type": "value"}
          ]
        }
      }
    },
    {
      "title": "Queries per Second",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 12, "x": 4, "y": 0},
      "targets": [{"expr": "rate(mysql_global_status_queries{server=~\"$server\"}[5m])", "legendFormat": "{{server}}"}]
    },
    {
      "title": "Connections",
      "type": "timeseries",
      "gridPos": {"h": 8, "w": 8, "x": 16, "y": 0},
      "targets": [
        {"expr": "mysql_global_status_threads_connected{server=~\"$server\"}", "legendFormat": "{{server}} - Connected"},
        {"expr": "mysql_global_variables_max_connections{server=~\"$server\"}", "legendFormat": "{{server}} - Max"}
      ]
    }
  ],
  "schemaVersion": 38,
  "time": {"from": "now-1h", "to": "now"},
  "refresh": "30s"
}
```
Part 6: Alerting Configuration
6.1 Prometheus Alerting Rules
Create prometheus/alerts.yml. For Prometheus to load it, list the file under rule_files in prometheus/prometheus.yml (the example in section 1.3 leaves rule_files empty) and make sure it is mounted into the Prometheus container:
```
groups:
  - name: host
    rules:
      - alert: HostDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.server }} is down"
          description: "{{ $labels.server }} has been unreachable for more than 1 minute."

      - alert: HighCPU
        expr: 100 - (avg by(server) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.server }}"
          description: "CPU usage is above 80% (current: {{ $value | printf \"%.1f\" }}%)"

      - alert: HighMemory
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.server }}"

      - alert: DiskSpaceLow
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.server }}"

  - name: mysql
    rules:
      - alert: MySQLDown
        expr: mysql_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL is down on {{ $labels.server }}"

      - alert: MySQLTooManyConnections
        expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL connections above 80% on {{ $labels.server }}"

      - alert: MySQLSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow queries detected on {{ $labels.server }}"

  - name: apache
    rules:
      - alert: ApacheDown
        expr: apache_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Apache is down on {{ $labels.server }}"
```
Note that with remote write, a server that is completely down stops sending samples, so its up series disappears rather than dropping to 0. HostDown as written catches a failing scrape on a server where Alloy is still running; to catch a fully unreachable server, pair it with an absent()-based rule per server.
Troubleshooting Guide
Alloy Won't Start
Check config syntax:
```
alloy fmt /etc/alloy/config.alloy
```
Run manually to see errors:
```
alloy run /etc/alloy/config.alloy
```
Common errors:
- component "xxx" does not exist: The exporter isn't available in Alloy
- permission denied: Run usermod -aG and restart alloy
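The syntax check and the restart can be combined so that a broken file never takes down a working agent. This is a minimal sketch, assuming the `alloy` binary is on the PATH and Alloy runs as the systemd unit `alloy`. The helper name `safe_restart_alloy` is hypothetical, and since `alloy fmt` only parses the file, it catches syntax errors but not problems such as unknown components or unreachable endpoints.

```shell
# Hypothetical helper: restart Alloy only if the config file is readable and parses.
# Usage: safe_restart_alloy [config_path]
safe_restart_alloy() {
  config="${1:-/etc/alloy/config.alloy}"
  if [ ! -r "$config" ]; then
    echo "cannot read $config" >&2
    return 1
  fi
  # alloy fmt prints the formatted file on success and exits non-zero on a syntax error
  if ! alloy fmt "$config" >/dev/null; then
    echo "syntax error in $config, not restarting" >&2
    return 1
  fi
  systemctl restart alloy
}

# Usage:
#   safe_restart_alloy
#   safe_restart_alloy /etc/alloy/config.alloy
```

After a successful restart, `journalctl -u alloy` remains the place to look for runtime errors the syntax check cannot see.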
No Data in Grafana
Check that Alloy is running and exposing its own metrics:
```
curl -s http://localhost:12345/metrics | grep -c "^[a-z]"
```
Check if Prometheus is receiving:
```
curl -s 'http://PROMETHEUS_IP:9090/api/v1/query?query=up' | jq '.data.result[].metric.server'
```
Check Alloy logs:
```
journalctl -u alloy -f --no-pager
```
Loki "Ingester Not Ready"
This is normal on first start. Wait 15-30 seconds:
```
watch -n 5 'curl -s http://localhost:3100/ready'
```
Permission Denied on Logs
```
# Check current groups
groups alloy

# Add to required groups
usermod -aG adm alloy       # System logs
usermod -aG mysql alloy     # MySQL logs
usermod -aG www-data alloy  # Apache logs
usermod -aG docker alloy    # Docker socket

# Restart
systemctl restart alloy
```
High Memory Usage
Alloy can consume significant memory with many targets. Tune the config:
```
prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"

    queue_config {
      max_samples_per_send = 500   // Reduce from default 2000
      capacity             = 2500  // Reduce from default 10000
      max_shards           = 10    // Limit parallelism
    }
  }
}
```
Performance Tuning
Prometheus Storage
For longer retention or higher cardinality:
```
command:
  - '--storage.tsdb.retention.time=90d'
  - '--storage.tsdb.retention.size=100GB'
  - '--storage.tsdb.wal-compression'
  - '--storage.tsdb.max-block-duration=2h'
```
Loki Optimization
For high-volume log ingestion:
```
limits_config:
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 15MB
```
Alloy Resource Limits
Create /etc/systemd/system/alloy.service.d/limits.conf:
```
[Service]
MemoryMax=512M
CPUQuota=50%
```
Then reload systemd and restart Alloy:
```
systemctl daemon-reload
systemctl restart alloy
```
Conclusion
We've built a complete monitoring stack that:
- Collects metrics from hosts, MySQL, Apache, and Docker containers
- Aggregates logs from all sources into Loki
- Visualizes everything in Grafana dashboards
- Uses a single agent (Alloy) per server instead of multiple exporters
Key Takeaways
- Alloy simplifies operations by consolidating multiple agents
- Remote write eliminates the need for Prometheus to scrape targets
- Consistent labeling (server, environment) enables powerful filtering
- Docker Compose makes the central stack portable and reproducible
Next Steps
- Add Alertmanager for alert routing and notifications
- Implement Tempo for distributed tracing
- Scale to Mimir for long-term metrics storage
- Set up recording rules for complex/expensive queries