Concepts

Monitoring

Metrics

Metrics quantitatively describe the operating status of a system. Each metric sample consists of four basic elements:

  • Metric Name: Used to identify the monitored object, such as cpu_usage
  • Metric Value: Specific measurement value, such as 85.5
  • Timestamp: Records the time of measurement
  • Labels: Used for multidimensional data classification, such as {pod="nginx-1", namespace="default"}
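As an illustration, the four elements can be modelled as a small record and rendered in the Prometheus text exposition format. This is only a sketch: the `Sample` class and its field names are hypothetical, and the timestamp is an arbitrary example value in milliseconds.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One measurement: name, value, timestamp, and labels (hypothetical model)."""
    name: str
    value: float
    timestamp_ms: int
    labels: dict = field(default_factory=dict)

    def to_exposition(self) -> str:
        # Labels are sorted so the rendered line is deterministic.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        selector = f"{{{label_str}}}" if label_str else ""
        return f"{self.name}{selector} {self.value} {self.timestamp_ms}"


sample = Sample(
    name="cpu_usage",
    value=85.5,
    timestamp_ms=1700000000000,  # arbitrary example timestamp
    labels={"pod": "nginx-1", "namespace": "default"},
)
print(sample.to_exposition())
```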

PromQL

PromQL is the query language for Prometheus, used to query and aggregate metric data from the monitoring system.
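For example, a PromQL expression can be sent to the Prometheus HTTP API through the `query` parameter of `/api/v1/query`. The sketch below only builds the request URL. The server address is a hypothetical placeholder, and `cpu_usage` is the illustrative metric name used above, not necessarily a metric that exists on the platform.

```python
from urllib.parse import urlencode

# Hypothetical address; replace with the address of your Prometheus-compatible endpoint.
PROMETHEUS_URL = "http://prometheus.example:9090"


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Average the example metric per namespace, restricted to the "default" namespace.
query = 'avg by (namespace) (cpu_usage{namespace="default"})'
url = build_instant_query_url(PROMETHEUS_URL, query)
print(url)
```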

Built-in Metrics

The platform ships with a set of commonly used monitoring metrics, preset based on long-term operational experience. You can use these metrics directly when configuring alarm rules or creating monitoring dashboards, with no additional configuration.

Exporter

An Exporter is a component that collects monitoring data. Its primary responsibilities are:

  • Collecting raw monitoring data from the target system
  • Transforming data into a standard time-series metric format
  • Exposing the metric data for querying over an HTTP interface
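The three responsibilities can be sketched with the Python standard library alone. This is a minimal illustration, not how any particular exporter is implemented: the metric name `demo_process_cpu_seconds_total`, the `/metrics` path, and the port are assumptions chosen for the example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_raw() -> dict:
    """Step 1: collect raw data from the target (here, this process's CPU times)."""
    times = os.times()
    return {"user": times.user, "system": times.system}


def to_exposition(raw: dict) -> str:
    """Step 2: transform raw data into time-series metrics in text format."""
    lines = ["# TYPE demo_process_cpu_seconds_total counter"]
    for mode, seconds in sorted(raw.items()):
        lines.append(f'demo_process_cpu_seconds_total{{mode="{mode}"}} {seconds}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Step 3: serve the metrics over HTTP."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = to_exposition(collect_raw()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Blocks forever; call serve() to start the exporter on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the server; a collector can then fetch `http://<host>:8000/metrics`.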

ServiceMonitor

A ServiceMonitor declaratively manages monitoring configuration. It primarily defines:

  • The selection criteria for monitoring targets
  • Configuration of metric collection interfaces
  • Execution parameters for collection tasks (intervals, timeouts, etc.)
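The sketch below builds a ServiceMonitor manifest as a Python dictionary and serializes it to JSON, which Kubernetes accepts alongside YAML. It assumes the Prometheus Operator `monitoring.coreos.com/v1` API. The resource name, namespace, label, port name, and timing values are hypothetical examples.

```python
import json

service_monitor = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "metadata": {"name": "example-app", "namespace": "default"},  # hypothetical names
    "spec": {
        # Selection criteria: Services carrying this label become monitoring targets.
        "selector": {"matchLabels": {"app": "example-app"}},
        "endpoints": [
            {
                # Collection interface: the named Service port and HTTP path.
                "port": "metrics",
                "path": "/metrics",
                # Execution parameters: how often to collect and when to give up.
                "interval": "30s",
                "scrapeTimeout": "10s",
            }
        ],
    },
}

manifest_json = json.dumps(service_monitor, indent=2)
print(manifest_json)
```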

Alarms

Alarm Rules

Alarm rules define the specific conditions for triggering alarms:

  • Alarm Expression: Describes the conditions for triggering an alarm using PromQL statements
  • Alarm Threshold: The explicit boundary value that triggers the alarm
  • Duration: How long the condition must be continuously met before the alarm fires
  • Alarm Level: Distinguishes the severity of alarms (e.g., P0/P1/P2)
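How threshold and duration interact can be sketched as follows. This is an illustrative model rather than the platform's evaluation logic: the `AlarmRule` class is hypothetical, and the comparison is assumed to be "greater than".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmRule:
    expression: str   # PromQL statement that produces the evaluated value
    threshold: float  # boundary value
    duration_s: int   # how long the condition must hold, in seconds
    level: str        # severity, e.g. "P1"


def first_firing_time(rule: AlarmRule, samples: list) -> Optional[int]:
    """Return the timestamp at which the alarm fires, or None.

    samples is a time-ordered list of (timestamp_s, value) pairs.
    """
    breach_start = None
    for ts, value in samples:
        if value > rule.threshold:
            if breach_start is None:
                breach_start = ts  # condition has just become true
            if ts - breach_start >= rule.duration_s:
                return ts
        else:
            breach_start = None  # any dip below the threshold resets the timer
    return None


rule = AlarmRule(expression="cpu_usage", threshold=80.0, duration_s=120, level="P1")
samples = [(0, 85), (60, 90), (120, 70), (180, 88), (240, 91), (300, 95)]
print(first_firing_time(rule, samples))
```

In this example the first breach is interrupted by the dip at 120 seconds, so the duration is counted again from 180 seconds onward.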

Alarm Policies

Alarm policies organize multiple alarm rules together for unified configuration:

  • Alarm Targets: The scope of objects that the rules apply to
  • Notification Method: The channels for sending alarms
  • Sending Interval: The time interval for repeated alarm notifications
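The effect of the sending interval can be illustrated with a small simulation. It assumes the alarm keeps firing and is evaluated at fixed times; the function names are hypothetical.

```python
def should_notify(now_s, last_sent_s, interval_s):
    """Send if nothing was sent yet or the sending interval has elapsed."""
    return last_sent_s is None or now_s - last_sent_s >= interval_s


def notification_times(evaluation_times, interval_s):
    """Return the evaluation times at which a notification would be sent."""
    sent = []
    last = None
    for now in evaluation_times:
        if should_notify(now, last, interval_s):
            sent.append(now)
            last = now
    return sent


# Alarm evaluated every 60 s for 10 minutes, repeated every 300 s while firing.
times = notification_times(range(0, 601, 60), interval_s=300)
print(times)
```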

Notifications

Notification Policies

Notification policies manage the rules for sending alarm messages:

  • Recipients: Target users for alarm notifications
  • Notification Channels: Supported message sending methods
  • Notification Templates: Definition of message content format

Notification Templates

Notification templates customize the display format of alarm messages:

  • Title Template: Format of the alarm message title
  • Content Template: Organization of alarm details
  • Variable Replacement: Placeholders that are filled dynamically with alarm data
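Variable replacement can be sketched with Python's `string.Template`. The `$name` placeholder syntax and the variable names here are assumptions for illustration; the platform's actual template syntax may differ.

```python
from string import Template

# Hypothetical title and content templates.
TITLE_TEMPLATE = Template("[$level] $alarm_name on $pod")
CONTENT_TEMPLATE = Template(
    "Namespace: $namespace\nCurrent value: $value\nThreshold: $threshold"
)

alarm = {
    "level": "P1",
    "alarm_name": "HighCpuUsage",
    "pod": "nginx-1",
    "namespace": "default",
    "value": "85.5",
    "threshold": "80",
}

# safe_substitute leaves unknown placeholders untouched instead of raising.
title = TITLE_TEMPLATE.safe_substitute(alarm)
content = CONTENT_TEMPLATE.safe_substitute(alarm)
print(title)
print(content)
```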

Monitoring Dashboard

Dashboard

A dashboard is a collection of related panels that provides an overall view of the system status. It supports flexible layouts and can organize panels in rows or columns.

Panels

Panels are visual representations of monitoring data and support various display types.

Data Sources

A data source configuration specifies where monitoring data is read from. Currently, only the monitoring components of the current cluster are supported as data sources; custom data sources are not yet supported.

Variables

Variables serve as placeholders for values and can be used in metric queries. Through the variable selector at the top of the dashboard, you can dynamically adjust query conditions, and the charts update in real time.
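As a sketch of how a variable might be resolved, the query below contains two placeholders that are replaced with the values chosen in the selector. The `$name` syntax, the variable names, and the `cpu_usage` metric are assumptions for illustration.

```python
from string import Template

# Hypothetical panel query with two dashboard variables.
PANEL_QUERY = Template('avg(cpu_usage{namespace="$namespace", pod=~"$pod"})')


def render_query(selection: dict) -> str:
    """Fill the panel query with the values currently chosen in the selector."""
    return PANEL_QUERY.substitute(selection)


# Changing the selection produces a new query, and the chart is redrawn from it.
rendered = render_query({"namespace": "default", "pod": "nginx-.*"})
print(rendered)
```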