Monitoring Module Architecture

TOC
Overall Architecture Explanation
The monitoring system consists of the following core functional modules:
- Monitoring System
- Data Collection and Storage: Collecting and persisting monitoring metrics from multiple sources
- Data Query and Visualization: Providing flexible query and visualization capabilities for monitoring data
- Alerting System
- Alert Rule Management: Configuring and managing alert policies
- Alert Triggering and Notification: Evaluating alert rules and dispatching notifications
- Real-time Alert Status: Providing a real-time view of the current alert status of the system
- Notification System
- Notification Configuration: Managing notification templates, contact groups, and policies
- Notification Server: Managing the configuration of various notification channels
Monitoring System
Data Collection and Storage
- Prometheus/VictoriaMetrics Operator Responsibilities:
- Load and validate monitoring collection configurations
- Load and validate alert rule configurations
- Synchronize configurations to Prometheus/VictoriaMetrics instances
- Sources of Monitoring Data:
- Nevermore: Generates log-related metrics
- Warlock: Generates event-related metrics
- Prometheus/VictoriaMetrics: Discovers and collects various exporters' metrics via ServiceMonitor
Data Query and Visualization
-
Monitoring Data Query Process:
- The browser initiates a query request (Path:
/platform/monitoring.alauda.io/v1beta1
)
- ALB forwards the request to the Courier component
- Courier API processes the query:
- Built-in Metrics: Obtains PromQL through the indicators interface and queries
- Custom Metrics: Directly forwards PromQL to the monitoring component
- The monitoring dashboard retrieves data and displays it
-
Monitoring Dashboard Management Process:
- Users access the
global
cluster ALB (Path: /kubernetes/cluster_name/apis/ait.alauda.io/v1alpha2/MonitorDashboard
)
- ALB forwards the request to the Erebus component
- Erebus routes the request to the target monitoring cluster
- The Warlock component is responsible for:
- Validating the legality of the monitoring dashboard configuration
- Managing the MonitorDashboard CR resource
Alerting System
Alert Rule Management
The alert rule configuration process:
- Users access the
global
cluster ALB (Path: /kubernetes/cluster_name/apis/monitoring.coreos.com/v1/prometheusrules
)
- The request passes through ALB -> Erebus -> target cluster kube-apiserver
- Responsibilities of each component:
- Prometheus/VictoriaMetrics Operator:
- Validating the legality of alert rules
- Managing PrometheusRule CR
- Nevermore: Listening for and processing log alert metrics
- Warlock: Listening for and processing event alert metrics
Alert Processing Workflow
- Alert Evaluation:
- PrometheusRule/VMRule defines alert rules
- Prometheus/VictoriaMetrics evaluates rules periodically
- Alert Notification:
- Alerts are sent to Alertmanager once triggered
- Alertmanager -> ALB -> Courier API
- Courier API is responsible for dispatching notifications
- Alert Storage:
- Alert history is stored in ElasticSearch/ClickHouse
Real-time Alert Status
- Status Collection:
- The
global
cluster Courier generates metrics:
- cpaas_active_alerts: Current active alerts
- cpaas_active_silences: Current silence configurations
- Global Prometheus collects every 15 seconds
- Status Display:
- The front-end queries and displays real-time status via Courier API
Notification System
Notification Configuration Management
The management process for notification templates, notification contact groups, and notification policies is as follows:
- Users access the standard API of the
global
cluster via a browser
- Access path:
/apis/ait.alauda.io/v1beta1/namespaces/cpaas-system
- Managing related resources:
- Notification Template: apiVersion: "ait.alauda.io/v1beta1", kind: "NotificationTemplate"
- Notification Contact Group: apiVersion: "ait.alauda.io/v1beta1", kind: "NotificationGroup"
- Notification Policy: apiVersion: "ait.alauda.io/v1beta1", kind: "Notification"
- Courier is responsible for:
- Validating the legality of notification templates
- Validating the legality of notification contact groups
- Validating the legality of notification policies
Notification Server Management
- Users access the
global
cluster's ALB via a browser
- Access path:
/kubernetes/global/api/v1/namespaces/cpaas-system/secrets
- Managing and submitting notification server configurations
- Resource name: platform-email-server
- Courier is responsible for:
- Validating the legality of the notification server configuration