
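Monitoring Service Deployment Guide

Overview

This guide describes how to deploy a standalone Prometheus + Grafana monitoring stack on monitor-storage (192.168.1.162).

Architecture highlights:

  • Standalone deployment with no dependency on the K8s cluster
  • Services managed with Docker Compose
  • Access over the EasyTier network
  • Data persisted to local storage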

1. Environment Preparation

1.1 System Requirements

Item              Specification
CPU               4 cores
Memory            4 GB
System disk       32 GB
Data disk         200 GB (/dev/sdb)
Operating system  Ubuntu 24.04

1.2 Install Docker

bash
# Update the system
sudo apt update && sudo apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com | sh

# Add the current user to the docker group
sudo usermod -aG docker $USER
newgrp docker

# Verify the installation
docker --version
docker compose version
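
Before moving on, it can help to confirm that the command-line tools used in later steps are on the PATH. The sketch below is a hypothetical helper (`missing_cmds` is not part of Docker or of this guide) that prints every command from its argument list that cannot be found.

```bash
# missing_cmds: print each named command that is not available on PATH.
# Hypothetical helper for illustration only.
missing_cmds() {
    for cmd in "$@"; do
        # command -v exits non-zero when the name cannot be resolved
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example: list whichever of the tools used in this guide are missing.
# missing_cmds docker curl wget tar
```

Empty output means everything was found. Subcommands such as `docker compose` are not covered by this check and still need the `docker compose version` call above.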

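2. Directory Layout

bash
# Create service directories
sudo mkdir -p /opt/{prometheus,grafana,nfs-exports}
sudo mkdir -p /opt/nfs-exports/{backups,configs,shared}

# Create data directories
sudo mkdir -p /opt/prometheus/{data,conf}
sudo mkdir -p /opt/grafana/{data,dashboards,datasources}

# Set ownership
sudo chown -R $USER:$USER /opt/prometheus /opt/grafana /opt/nfs-exports

# The official images run as non-root users, so the data directories must be writable by them:
# prom/prometheus runs as nobody (UID 65534), grafana/grafana as UID 472
sudo chown -R 65534:65534 /opt/prometheus/data
sudo chown -R 472:472 /opt/grafana/data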

3. Deploy Prometheus

3.1 Create the Configuration File

bash
cat << 'EOF' > /opt/prometheus/conf/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'home-lab'
    environment: 'monitoring'

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

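rule_files:
  # Alert rules created in section 7; the file must be visible inside the container at this path
  - /etc/prometheus/alert_rules.yml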

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    metrics_path: /metrics

  # Node Exporter - host monitoring
  - job_name: 'node-exporter'
    static_configs:
      - targets: 
        - '192.168.1.161:9100'  # easytier-gateway
        - '192.168.1.162:9100'  # monitor-storage
        - '192.168.1.163:9100'  # gitea-artifact
        - '192.168.1.165:9100'  # k8s-master
        - '192.168.1.166:9100'  # k8s-worker-1
EOF
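
To double-check which endpoints Prometheus will scrape, the targets can be pulled back out of the file. This is a text-based sketch, not a YAML parser: the hypothetical `list_targets` helper assumes targets are written as single-quoted 'host:port' strings, as in the configuration above.

```bash
# list_targets: print every host:port scrape target found in a Prometheus
# config file. Assumes targets are single-quoted 'host:port' strings;
# it matches text and does not parse YAML.
list_targets() {
    grep -oE "'[A-Za-z0-9.-]+:[0-9]+'" "$1" | tr -d "'"
}

# Example:
# list_targets /opt/prometheus/conf/prometheus.yml
```

For a semantic check of the whole file, the prom/prometheus image also ships `promtool`, whose `check config` subcommand validates a configuration file.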

3.2 Create the Docker Compose File

bash
cat << 'EOF' > /opt/prometheus/docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.50.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      # Mount the whole conf directory so rule files placed next to prometheus.yml are visible in the container
      - ./conf:/etc/prometheus:ro
      - ./data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    networks:
      - monitoring
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3

  node-exporter:
    image: prom/node-exporter:v1.7.0
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/host'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/host:ro
    networks:
      - monitoring

networks:
  monitoring:
    name: monitoring
EOF

3.3 Start Prometheus

bash
cd /opt/prometheus

# Pull the images
docker compose pull

# Start the services
docker compose up -d

# Check status
docker compose ps

# Follow the logs
docker compose logs -f prometheus

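3.4 Verify

Open http://192.168.1.162:9090 in a browser.

Open the Targets page and confirm that every node is UP. Targets other than monitor-storage stay DOWN until Node Exporter is installed on them (section 5).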
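
The same check can be done from the command line through the Prometheus HTTP API. A minimal sketch: the hypothetical `count_down_targets` helper reads the JSON from `/api/v1/targets` on stdin and counts targets whose health is reported as down. It relies on plain text matching of the compact JSON that Prometheus emits; a JSON-aware tool such as jq would be more robust.

```bash
# count_down_targets: read the /api/v1/targets response on stdin and print
# how many targets report "health":"down". Text match, not a JSON parser.
count_down_targets() {
    grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Example:
# curl -s http://localhost:9090/api/v1/targets | count_down_targets
```

A result of 0 means no target is currently reported as down.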


4. Deploy Grafana

4.1 Create the Data Source Configuration

bash
cat << 'EOF' > /opt/grafana/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
    uid: prometheus
EOF

4.2 Create the Dashboard Provider Configuration

bash
cat << 'EOF' > /opt/grafana/dashboards/dashboards.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /etc/grafana/provisioning/dashboards
EOF

4.3 Create the Docker Compose File

bash
cat << 'EOF' > /opt/grafana/docker-compose.yml
version: '3.8'

services:
  grafana:
    image: grafana/grafana:10.4.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - ./data:/var/lib/grafana
      - ./dashboards:/etc/grafana/provisioning/dashboards
      - ./datasources:/etc/grafana/provisioning/datasources
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://monitor.hoseahu.cn/
      - GF_SERVER_DOMAIN=monitor.hoseahu.cn
    networks:
      - monitoring

networks:
  monitoring:
    name: monitoring
    external: true
EOF

4.4 Start Grafana

bash
cd /opt/grafana

# Pull the image
docker compose pull

# Start the service
docker compose up -d

# Check status
docker compose ps

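4.5 Verify

  1. Open http://192.168.1.162:3000 in a browser
  2. Log in with the credentials set in docker-compose.yml: admin / admin123
  3. Change the password after the first login. Grafana only prompts for this automatically when the password is still the built-in default (admin).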
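
Grafana can take a few seconds to become ready after `docker compose up -d`. The following is a sketch of a generic retry loop (the `retry` function is a hypothetical helper, not a Grafana or Docker command) that can wrap a health probe until it succeeds or the attempts run out.

```bash
# retry: run a command up to MAX times, sleeping DELAY seconds between
# attempts; returns 0 as soon as the command succeeds, 1 otherwise.
# usage: retry MAX DELAY command [args...]
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while ! "$@"; do
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example: wait up to about a minute for Grafana's health endpoint.
# retry 30 2 curl -fsS -o /dev/null http://localhost:3000/api/health
```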

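5. Configure Node Exporter

Install Node Exporter on every other server that needs monitoring. monitor-storage already runs it as a container (section 3.2), and a second instance on that host would conflict on port 9100.

bash
# Run on each monitored node (except monitor-storage)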
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
sudo cp node_exporter /usr/local/bin/

# Create the systemd service
cat << 'EOF' | sudo tee /etc/systemd/system/node-exporter.service
[Unit]
Description=Node Exporter
After=network-online.target

[Service]
User=root
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Start the service
sudo systemctl daemon-reload
sudo systemctl enable node-exporter
sudo systemctl start node-exporter

# Check status
sudo systemctl status node-exporter
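
Once the service is running, a quick look at a single metric confirms that the exporter is serving data. A small sketch: the hypothetical `metric_value` helper reads the Prometheus text exposition format on stdin and prints the value of the first label-free sample with the given name.

```bash
# metric_value: print the value of the first sample whose name matches
# exactly and that carries no labels. Comment lines (# HELP, # TYPE) are
# skipped because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example: 1-minute load average of a monitored node.
# curl -s http://192.168.1.161:9100/metrics | metric_value node_load1
```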

6. Import Official Dashboards

6.1 Node Exporter Full

Grafana Dashboard ID: 1860

Import steps:

  1. Log in to Grafana
  2. Click the left-hand menu → Dashboards → Import
  3. Enter Dashboard ID: 1860
  4. Select the Prometheus data source
  5. Click Import

6.2 Commonly Used Dashboards

Dashboard ID  Name                             Purpose
1860          Node Exporter Full               Host monitoring
3662          Prometheus Stats                 Prometheus status
179           Docker and Container Monitoring  Docker monitoring

7. Configure Alerting

7.1 Create Alert Rules

bash
cat << 'EOF' > /opt/prometheus/conf/alert_rules.yml
groups:
  - name: host_alerts
    interval: 30s
    rules:
      - alert: HostHighCpuLoad
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load on {{ $labels.instance }}"

      - alert: HostHighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: HostDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} is down"
EOF
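
The HostHighCpuLoad expression turns the average idle rate into a busy percentage and compares it with 80. As a worked example of that arithmetic (the helper names are hypothetical, and the idle rate is supplied by hand rather than queried from Prometheus):

```bash
# cpu_busy_pct: mirror the arithmetic of the HostHighCpuLoad expression.
# $1 is the average idle rate across a host's CPUs (0..1), i.e. the value of
# avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])).
cpu_busy_pct() {
    awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle * 100 }'
}

# cpu_above_threshold: exit 0 when the busy percentage exceeds 80.
cpu_above_threshold() {
    awk -v idle="$1" 'BEGIN { exit !((100 - idle * 100) > 80) }'
}

# Example: an idle rate of 0.15 corresponds to 85.0% busy.
# cpu_busy_pct 0.15
```

In Prometheus the condition must additionally hold for the whole `for: 5m` window before the alert fires.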

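7.2 Reload the Configuration

Prometheus only loads the rules if alert_rules.yml is listed under rule_files in prometheus.yml and the file is visible inside the container.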

bash
curl -X POST http://localhost:9090/-/reload
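
After the reload, the rules API shows whether the new rules were picked up. A minimal sketch, assuming the compact JSON returned by `/api/v1/rules`; the `has_rule` helper is hypothetical and matches text rather than parsing JSON.

```bash
# has_rule: read the /api/v1/rules response on stdin and succeed if a rule
# (or rule group) with the given name is present.
has_rule() {
    grep -q "\"name\":\"$1\""
}

# Example:
# curl -s http://localhost:9090/api/v1/rules | has_rule HostDown && echo "rule loaded"
```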

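8. Backup Strategy

8.1 Backup Script

bash
# /opt/scripts is not created earlier in this guide
sudo mkdir -p /opt/scripts

cat << 'EOF' | sudo tee /opt/scripts/backup-monitoring.sh > /dev/null
#!/bin/bash
# Back up monitoring data

# Backups directory created in section 2
BACKUP_DIR="/opt/nfs-exports/backups/monitoring"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Back up Prometheus data
tar czf "$BACKUP_DIR/prometheus_data_$DATE.tar.gz" -C /opt/prometheus data/

# Back up Grafana data (includes the configuration database)
tar czf "$BACKUP_DIR/grafana_config_$DATE.tar.gz" -C /opt/grafana data/

# Delete backups older than 7 days
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +7 -delete

echo "Backup completed: $DATE"
EOF

sudo chmod +x /opt/scripts/backup-monitoring.sh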
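
A backup is only useful if it can be read back. The sketch below (a hypothetical `verify_archives` helper, not part of the script above) lists the contents of every archive in a directory and reports the ones that tar cannot read.

```bash
# verify_archives: test every .tar.gz in a directory by listing its contents;
# print the unreadable ones and return non-zero if any failed.
verify_archives() {
    status=0
    for f in "$1"/*.tar.gz; do
        [ -e "$f" ] || continue          # directory contains no archives
        if ! tar tzf "$f" >/dev/null 2>&1; then
            printf 'corrupt: %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Example: pass the backup directory used by the script.
# verify_archives "$BACKUP_DIR"
```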

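8.2 Schedule the Backup

bash
# Add a cron job to root's crontab (runs daily at 03:00); root can read data files owned by the container users
(sudo crontab -l 2>/dev/null; echo "0 3 * * * /opt/scripts/backup-monitoring.sh") | sudo crontab -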
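
Running the command above a second time appends the same entry again. One way to keep it idempotent is to filter the existing crontab through a helper that adds the line only when it is missing. A sketch, with `add_line_once` as a hypothetical name:

```bash
# add_line_once: copy stdin to stdout and append LINE only if an identical
# line is not already present.
add_line_once() {
    awk -v line="$1" '
        { print }
        $0 == line { found = 1 }
        END { if (!found) print line }
    '
}

# Example (prefix both crontab calls with sudo when editing root's crontab):
# crontab -l 2>/dev/null | add_line_once "0 3 * * * /opt/scripts/backup-monitoring.sh" | crontab -
```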

9. Troubleshooting

Common Diagnostic Commands

bash
# Prometheus
docker compose -f /opt/prometheus/docker-compose.yml ps
docker compose -f /opt/prometheus/docker-compose.yml logs prometheus

# Grafana
docker compose -f /opt/grafana/docker-compose.yml ps
docker compose -f /opt/grafana/docker-compose.yml logs grafana

# Health checks
curl http://localhost:9090/-/healthy
curl http://localhost:3000/api/health
curl http://localhost:9100/metrics

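Common Issues

Problem                                          Cause                      Solution
Prometheus cannot connect to its data sources    Network issue              Check the firewall and container network
Grafana dashboards show no data                  Data source misconfigured  Check the Prometheus data source configuration
Node Exporter returns no data                    Service not running        Check systemctl status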

10. Access Configuration

Access via EasyTier

Service        Address                    Notes
Prometheus     http://192.168.1.162:9090  prometheus.hoseahu.cn
Grafana        http://192.168.1.162:3000  monitor.hoseahu.cn
Node Exporter  http://192.168.1.162:9100  Local host metrics
