#Management of Monitoring Dashboards

#Function Overview

The platform provides dashboard management designed to replace traditional Grafana tooling. It aggregates monitoring data from across the platform into a unified view, which reduces the effort needed to configure and maintain dashboards.

#Main Features

  • Supports configuring custom monitoring dashboards for both business views and platform views.
  • Enables viewing publicly shared dashboards configured in platform views from business views, with data isolated based on the namespace to which the business belongs.
  • Supports managing panels within the dashboard: you can add, delete, modify, and resize panels, and move them by drag-and-drop.
  • Allows setting custom variables within the dashboard for filtering query data.
  • Supports configuring groups within the dashboard for managing the panels. Groups can be displayed repeatedly based on custom variables.
  • Supported panel types include: trend, step line chart, bar chart, horizontal bar chart, bar gauge chart, gauge chart, table, stat chart, XY chart, pie chart, and text.
  • One-click import feature for Grafana dashboards.

#Advantages

  • Supports user-customized monitoring scenarios without being constrained by predefined templates, truly achieving a personalized monitoring experience.
  • Provides a rich array of visualization options, including line charts, bar charts, pie charts, and flexible layout and styling options.
  • Integrates seamlessly with the platform's role permissions, allowing business views to define their own monitoring dashboards while ensuring data isolation.
  • Deep integration with various functionalities of the container platform, enabling instant access to monitoring data for containers, networks, storage, etc., providing users with comprehensive performance observation and fault diagnosis.
  • Fully compatible with Grafana dashboard JSON, allowing easy migration from Grafana for continued use.

#Use Cases

  • IT Operations Management: As part of the IT operations team, you can use the monitoring dashboards to unify the display and management of various performance metrics of the container platform, such as CPU, memory, network traffic, etc. By customizing monitoring reports and alert rules, you can promptly detect and pinpoint system issues, enhancing operational efficiency.
  • Application Performance Analysis: For application developers and testers, monitoring dashboards offer various rich visualization options to intuitively display application running states and resource consumption. You can customize dedicated monitoring views tailored to different application scenarios to deeply analyze application performance bottlenecks and provide a basis for optimization.
  • Multi-Cluster Management: For users managing multiple container clusters, monitoring dashboards can aggregate monitoring data from disparate clusters, allowing you to grasp the overall operational state of the system at a glance.
  • Fault Diagnosis: When a system issue occurs, monitoring dashboards provide you with comprehensive performance data and analytical tools to quickly pinpoint the root cause of the problem. You can swiftly view fluctuations in relevant monitoring metrics based on alert information for in-depth fault analysis.

#Prerequisites

Currently, monitoring dashboards only support viewing monitoring data collected by monitoring components installed in the platform. Therefore, you should prepare as follows before configuring a monitoring dashboard:

  • Ensure that the cluster for which you want to configure the monitoring dashboard has monitoring components installed, specifically the ACP Monitor with Prometheus or ACP Monitor with VictoriaMetrics plugin.
  • Ensure that the data you wish to display on the dashboard has been collected by the monitoring components.

#Relationship Between Monitoring Dashboards and Monitoring Components

  • Monitoring dashboard resources are stored in the Kubernetes cluster. You can switch views between different clusters using the Cluster tab at the top.
  • Monitoring dashboards depend on the monitoring components in the cluster for querying data sources. Therefore, before using monitoring dashboards, ensure that the current cluster has successfully installed monitoring components and that they are operating normally.
  • By default, the monitoring dashboard requests monitoring data from the corresponding cluster. If the VictoriaMetrics plugin is installed in proxy mode in the cluster, the platform automatically queries the storage cluster for this cluster's data; no special configuration is needed.

#Manage Dashboards

A dashboard is a collection composed of one or more panels, organized and arranged in one or more rows to provide a clear view of relevant information. These panels can query raw data from data sources and transform it into a series of visual effects supported by the platform.

#Create a Dashboard

  1. Click Create Dashboard and configure the parameters described below.

| Parameter | Description |
| --- | --- |
| Folder | The folder where the dashboard resides; you can enter a new folder name or select an existing folder. |
| Label | Label for the monitoring dashboard; when switching dashboards, you can filter by the labels at the top to find existing dashboards quickly. |
| Set as Main Dashboard | If enabled, the current dashboard becomes the main dashboard once it is created; when you re-enter the monitoring dashboard feature, the main dashboard data is displayed by default. |
| Variables | Variables added when creating the dashboard can be referenced as metric parameters in panels and can also be used as filters on the dashboard homepage. |

  2. After configuring the parameters, click Create to finish creating the dashboard. Next, add variables, panels, and groups to complete the overall layout design.

#Import Dashboard

The platform supports direct import of Grafana JSON to convert it into a monitoring dashboard for display.

  • Currently, only Grafana JSON of version V8 or later is supported; JSON from earlier versions cannot be imported.
  • If a panel in the imported dashboard is outside the platform's supported scope, it may be displayed as an unsupported panel type; you can modify the panel's settings so that it displays normally.
  • After importing, you can manage the dashboard as usual; it behaves the same as a dashboard created in the platform.

#Add Variables

  1. In the variable form area, click Add.

Query

Query variables let you filter data by the feature dimensions of time series. You specify a query expression, and the variable's option values are computed dynamically from the query results.

| Parameter | Description |
| --- | --- |
| Query Settings | In addition to using PromQL to query time series, the platform provides some common variables and functions for query settings. See Common Functions and Variables. |
| Regular Expression | Use a regular expression to filter the values returned by the variable query, so that the option names appear as intended. You can check whether the filtered values meet expectations in Variable Value Preview. |
| Selection Settings | Multiple Selection: allows several options to be selected at once in the top filters on the dashboard homepage. You need to reference this variable in the query expression of the panels to view the data corresponding to the variable value. All: if checked, an All option is added to the filter options to select all variable values. |
Constant

Constant Variables are static variables with fixed values that remain unchanged throughout the dashboard, commonly used for storing environment identifiers, fixed thresholds, or configuration parameters that need to be referenced across multiple panels without displaying as filter options.

| Parameter | Description |
| --- | --- |
| Constant Value | The value of the constant variable. |

Custom

Custom Variables allow users to define a predefined list of static options that appear as dropdown filters on the dashboard, commonly used for manual selection of specific services, teams, or categories without requiring dynamic data queries.

| Parameter | Description |
| --- | --- |
| Custom Settings | Enter option values separated by commas, using the format display_name : value for each option (e.g., Production : prod, Staging : stage, Development : dev), or simply list the values directly if the display name equals the value. |
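To make the Custom Settings format concrete, the following Python sketch parses such a string into display-name/value pairs. It is a minimal illustration under the assumption that each option is split at its first colon; the function name is hypothetical and the platform's own parser may behave differently (for example, for values that themselves contain a colon).

```python
def parse_custom_options(text):
    """Parse 'display_name : value, ...' into (display_name, value) pairs.

    Assumption: each option is split at its first ':'; an option without
    a ':' uses the same text as both display name and value.
    """
    options = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty entries such as a trailing comma
        if ":" in item:
            name, value = item.split(":", 1)
            options.append((name.strip(), value.strip()))
        else:
            options.append((item, item))
    return options


print(parse_custom_options("Production : prod, Staging : stage, Development : dev"))
print(parse_custom_options("prod, stage, dev"))
```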

Textbox

Textbox Variables let users enter text directly, and are commonly used for values or parameters that do not require dynamic data queries.

| Parameter | Description |
| --- | --- |
| Textbox Value | The default value of the textbox variable. |

  2. Click OK to add one or more variables.

#Add Panels

Add multiple panels to the currently created monitoring dashboard to display data information for different resources.

Tip: You can resize a panel by clicking and dragging its lower right corner, and rearrange panels by clicking anywhere on a panel and dragging it.

  1. Click Add Panel and configure the parameters described below.

  • Panel Preview: This area dynamically displays the data corresponding to the added metrics.

  • Add Metric: Configure the panel title and monitoring metrics in this area.

    • Adding Method: Supports built-in metrics and native (custom) metrics. Metrics added by both methods are combined and take effect together.

      • Built-in Metrics: Select commonly used metrics and legend parameters built into the platform to display data in the current panel.
        • Note: All metrics added to a panel must share the same unit; metrics with different units cannot be added to one panel.
      • Native: Customize the metric unit, metric expression, and legend parameters. The metric expression follows PromQL syntax; for details, refer to the PromQL Official Documentation.

    • Legend Parameters: Control the names of the curves in the panel. Text or templates can be used:

      • Rule: A template must be in the format {{.xxxx}}; for example, {{.hostname}} is replaced with the value of the hostname label returned by the expression.
      • Tip: If you enter an incorrectly formatted legend parameter, the curve names in the panel are displayed in their original format.

    • Instant Switch: When the Instant switch is on, the panel queries instant values through Prometheus's Query interface and sorts them, as used in stat charts and gauge charts. When it is off, the panel uses the query_range method, which queries a series of data over a specific time period.

  • Panel Settings: Supports selecting different panel types for visualizing metric data. Refer to Manage Panels.

  2. Click Save to complete adding the panel.

  3. You can add one or more panels within the dashboard.

  4. After adding the panels, you can use the following operations to make sure the display and size of the panels meet your expectations.

    • Click and drag the lower right corner of the panel to customize its size.
    • Click anywhere on the panel and drag it to rearrange the order of the panels.
    • Click the Edit button to modify the panel settings.
    • Click the Delete button to delete the panel.
    • Click the Copy button to copy the panel.
  5. After adjusting, click the Save button on the dashboard page to save your modifications.
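Two of the settings above, the Instant switch and the legend parameters, can be illustrated with a small Python sketch. It assumes the standard Prometheus HTTP API paths `/api/v1/query` and `/api/v1/query_range`; how the platform actually proxies these requests is not described here. The function names are hypothetical, and rendering a missing label as an empty string is an assumption.

```python
import re


def build_query_request(expr, instant, start=None, end=None, step=None):
    """Return (path, params) for a Prometheus HTTP API request.

    Instant on  -> single evaluation via /api/v1/query.
    Instant off -> series over a time window via /api/v1/query_range.
    """
    if instant:
        return "/api/v1/query", {"query": expr}
    return "/api/v1/query_range", {
        "query": expr,
        "start": start,
        "end": end,
        "step": step,
    }


# Matches legend templates of the form {{.label_name}}
_LEGEND_FIELD = re.compile(r"\{\{\.(\w+)\}\}")


def render_legend(template, labels):
    """Replace each {{.label}} with that label's value for one series.

    Assumption: a label missing from the series renders as an empty
    string. Text without a valid placeholder is returned unchanged.
    """
    return _LEGEND_FIELD.sub(lambda m: labels.get(m.group(1), ""), template)


labels = {"hostname": "node-1", "job": "node-exporter"}
print(render_legend("{{.hostname}}", labels))   # node-1
print(render_legend("{hostname}", labels))      # {hostname} (left as typed)
print(build_query_request("up", instant=True))
print(build_query_request("up", instant=False, start=0, end=3600, step=60))
```

The second `render_legend` call mirrors the tip above: a legend that does not follow the {{.xxxx}} format is shown as entered.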

#Add Groups

Groups are logical dividers within the dashboard that can group panels together.

  1. Click the Add Panel drop-down menu > Add Group, and configure the parameters described below.
  • Group: The name of the group.
  • Repeat: Supports disabling repeats or selecting a variable to repeat by.
    • Disable Repeat: Do not select a variable; the group is created once, as a default group.
    • Parameter Variables: Select a variable created in the current dashboard, and the monitoring dashboard generates a row of identical sub-groups, one for each value of the variable. Panels in sub-groups cannot be modified, deleted, or moved.
  2. After adding the group, you can perform the following operations on it to manage the panel display within the dashboard.
    • Groups can be collapsed or expanded to hide part of the content in the dashboard. Panels within collapsed groups do not send queries.
    • Move a panel into a group to have the group manage it. A group manages all panels between it and the next group.
    • When a group is collapsed, you can move all panels managed by that group together.
    • Collapsing or expanding a group also counts as a change to the dashboard. To keep this state the next time you open the dashboard, click the Save button.
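The Repeat behavior can be sketched in Python as follows. This is a conceptual model only: the dictionary layout, the title format, and the `editable` flag are hypothetical and do not reflect the platform's stored dashboard format.

```python
def expand_repeated_group(group_name, panels, variable=None, values=()):
    """Return the groups a dashboard would render for one group definition.

    With no variable (repeat disabled), a single editable group is returned.
    Otherwise one sub-group is produced per variable value; sub-groups are
    marked non-editable to model that their panels cannot be modified,
    deleted, or moved. All field names here are hypothetical.
    """
    if variable is None:
        return [{"title": group_name, "panels": list(panels), "editable": True}]
    return [
        {
            "title": f"{group_name} ({variable}={value})",
            "panels": list(panels),
            "bindings": {variable: value},  # value used when querying panels
            "editable": False,
        }
        for value in values
    ]


for group in expand_repeated_group("Node", ["CPU", "Memory"], "node", ["node-1", "node-2"]):
    print(group["title"], group["panels"])
```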

#Switch Dashboards

Set the created custom monitoring dashboard as the main dashboard. When re-entering the monitoring dashboard feature, the main dashboard data will be displayed by default.

  1. In the left navigation bar, click Operations Center > Monitoring > Monitoring Dashboards.

  2. The main monitoring dashboard opens by default. Click Switch Dashboard.

  3. You can find dashboards by filtering through labels or searching by name, and switch main dashboards via the Main Dashboard switch.

#Other Operations

You can click the operation button on the right side of the dashboard page to perform actions on the dashboard as needed.

| Operation | Description |
| --- | --- |
| YAML | Opens the actual CR resource code of the dashboard stored in the Kubernetes cluster. You can modify all content in the dashboard by editing parameters in the YAML. |
| Export Expression | Exports the metrics and corresponding query expressions used in the current dashboard in CSV format. |
| Copy | Copies the current dashboard; you can edit the panels as needed and save it as a new dashboard. |
| Settings | Modifies the basic information of the current dashboard, such as changing labels and adding more variables. |
| Delete | Deletes the current monitoring dashboard. |

#Manage Panels

The platform provides various visualization methods to support different use cases. This section describes the panel types, their configuration options, and how to use them.

#Panel Description

| No. | Panel Name | Description | Suggested Use Cases |
| --- | --- | --- | --- |
| 1 | Trend Chart | Displays the trend of data over time via one or more line segments. | Shows trends over time, such as changes in CPU utilization, memory usage, etc. |
| 2 | Step Line Chart | Builds on the line chart by connecting data points with horizontal and vertical segments to form a step-like structure. | Suitable for displaying values that change at discrete points in time, such as the number of alerts. |
| 3 | Bar Chart | Uses vertical rectangular bars to represent the magnitude of data, where the height of the bars represents value. | Bar charts make value differences easy to compare and help reveal patterns and anomalies; suitable for scenarios focusing on value changes, such as the number of pods, number of nodes, etc. |
| 4 | Horizontal Bar Chart | Similar to the bar chart but uses horizontal rectangular bars to represent data. | When there are many data dimensions, horizontal bar charts make better use of the layout space and improve readability. |
| 5 | Gauge Chart | Uses semicircular or ring shapes to represent the current value of an indicator and its proportion of the total. | Intuitively reflects the current status of key monitoring indicators, such as system CPU utilization and memory usage. It is recommended to use alert thresholds with color changes to indicate abnormal conditions. |
| 6 | Gauge Bar Chart | Uses vertical rectangular bars to display the current value of indicators and their proportion. | Intuitively reflects the current status of key indicators, such as target completion progress and system load. When the same indicator has multiple categories, such as available disk space or utilization, the gauge bar chart is the recommended choice. |
| 7 | Pie Chart | Uses sectors to display the proportional relationship of parts to the whole. | Suitable for showing the composition of overall data across different dimensions, such as the proportions of 4XX, 3XX, and 2XX response codes over a period. |
| 8 | Table | Organizes data in a row-column format, making it easy to view and compare specific values. | Suitable for displaying structured multi-dimensional data, such as detailed information of nodes, detailed information of pods, etc. |
| 9 | Stat Chart | Displays the current value of a single key indicator, typically accompanied by a textual explanation. | Suitable for showing real-time values of important monitoring indicators, such as number of pods, number of nodes, current alert count, etc. |
| 10 | Scatter Plot | Uses Cartesian coordinates to plot a series of data points, reflecting the correlation between two variables. | Suitable for analyzing relationships between two indicators; the distribution of data points can reveal patterns such as linear correlation and clustering. |
| 11 | Text Card | Displays key textual information in a card format, usually containing a title and a brief description. | Suitable for presenting textual information, such as panel descriptions and troubleshooting explanations. |

#Panel Configuration Description

#General Parameters

| Parameter | Description |
| --- | --- |
| Basic Information | Select the appropriate panel type based on the selected metric data and add a title and description; you can add one or more links, which can be opened by selecting the corresponding link name next to the title. |
| Standard Settings | Units used for native metric data. Gauge charts and gauge bar charts also support a Total Value field; the chart then displays the percentage Current Value/Total Value. |
| Tooltips | Toggles whether real-time data is shown when hovering over the panel; a sort order for the tooltip entries can also be selected. |
| Threshold Parameters | Configure the threshold switch for the panel; when enabled, thresholds are shown in the selected colors in the panel, and you can set the threshold values. |
| Value | Set the calculation method for values, such as the most recent value or minimal value. This option applies only to stat charts and gauge charts. |
| Value Mapping | Redefine how specified values, ranges, regex matches, or special values are displayed, such as defining 100 as full load. This option applies only to stat charts, tables, and gauge charts. |
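As a rough illustration of the Value and Standard Settings rows, the sketch below reduces a series to a single value and then expresses it as a percentage of a Total Value. The function names and sample numbers are hypothetical; only the "most recent" and "minimal" calculation methods mentioned above are modeled.

```python
def reduce_value(samples, method="last"):
    """Reduce a list of samples to the single value shown in the panel."""
    reducers = {
        "last": lambda xs: xs[-1],  # most recent value
        "min": min,                 # minimal value
    }
    return reducers[method](samples)


def gauge_percent(current, total):
    """Percentage shown when Total Value is set: Current Value / Total Value."""
    if total <= 0:
        raise ValueError("Total Value must be positive")
    return current / total * 100.0


# Hypothetical memory usage samples in GiB on a 32 GiB node
samples = [12.0, 30.0, 24.0]
current = reduce_value(samples, "last")
print(current, gauge_percent(current, 32.0))  # 24.0 75.0
```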

#Special Parameters for Panels

| Panel Type | Parameter | Description |
| --- | --- | --- |
| Trend Chart | Graph Style | Choose a line chart or an area chart as the display style. Line charts emphasize how a metric trends over time, while area charts draw attention to changes in the total and in each part's share of it. |
| Gauge Chart | Gauge Chart Settings | Show Direction: when a single chart displays multiple metrics, sets whether they are arranged horizontally or vertically. Unit Redefinition: sets an independent unit for each metric; if not set, the platform uses the unit from Standard Settings. |
| Stat Chart | Stat Chart Settings | Show Direction: when a single chart displays multiple metrics, sets whether they are arranged horizontally or vertically. Graph Mode: adds a graph to the stat chart to show the metric's trend over time. |
| Pie Chart | Pie Chart Settings | Maximum Number of Slices: limits the number of slices so that numerous categories with small proportions do not clutter the chart; excess slices are merged and displayed as Others. Label Display Fields: sets the fields displayed in the pie chart labels. |
| Pie Chart | Graph Style | Choose pie or donut as the display style. |
| Table | Table Settings | Hide Columns: hides columns so that the table focuses on the primary information. Column Alignment: sets the alignment of data within a column. Display Name and Unit: changes the column names and the units used. |
| Text Card | Graph Style | Style: edit the text card content either in a rich-text editor or as HTML. |
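
The merging behavior of Maximum Number of Slices can be pictured with a small Python sketch. It assumes that the largest slices are kept and that Others is added on top of the configured maximum; the source does not state either detail, and `limit_slices` is a hypothetical helper rather than a platform function.

```python
from typing import Dict


def limit_slices(values: Dict[str, float], max_slices: int) -> Dict[str, float]:
    """Keep the largest slices and merge the remainder into 'Others'."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= max_slices:
        return dict(ranked)  # nothing to merge
    kept = dict(ranked[:max_slices])
    kept["Others"] = sum(v for _, v in ranked[max_slices:])
    return kept


# Hypothetical sample data: five categories reduced to three slices plus Others.
sample = {"a": 50, "b": 30, "c": 10, "d": 6, "e": 4}
merged = limit_slices(sample, 3)
```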

#Create Monitoring Dashboards via CLI

  1. Create a new YAML configuration file named example-dashboard.yaml.

  2. Add the MonitorDashboard resource to the YAML file and submit it. The following example creates a monitoring dashboard named demo-v2-dashboard1:

    kind: MonitorDashboard
    apiVersion: ait.alauda.io/v1alpha2
    metadata:
      annotations:
        cpaas.io/dashboard.version: '3'
        cpaas.io/description: '{"zh":"描述信息","en":""}' # Description field
        cpaas.io/operator: admin
      labels:
        cpaas.io/dashboard.folder: demo-v2-folder1 # Folder
        cpaas.io/dashboard.is.home.dashboard: 'False' # Is it the main dashboard?
      name: demo-v2-dashboard1 # Name
      namespace: cpaas-system # Namespace (all management view creations will occur in this ns)
    spec:
      body: # All information fields
        titleZh: 更新显示名称 # Built-in field for Chinese display name (this field is created under the Chinese language)
        title: english_display_name # Built-in field for English display name (this field is created under the English language) Built-in dashboards can set bilingual translations.
        templating: # Custom variables
          list:
            - hide: 0 # 0 means not hidden; 1 means only the label is hidden; 2 means both label and value are hidden
              label: 集群 # Built-in variable display name (label is set to the appropriate name based on the language, e.g., cluster in English)
              name: cluster # Built-in variable name (unique)
              options: # Define dropdown options; if a query retrieves data, it will use requested data; otherwise, it will use options. A default value can be set (generally only used for setting default values)
                - selected: false # Whether to default select
                  text: global
                  value: global
              type: custom # Variable type; only custom and query are currently supported. Imported Grafana dashboards may also use constant, custom, and interval variables, which are converted to custom variables on import (auto is not supported)
    
            - allValue: '' # Select all, passing options with the format xxx|xxx|xxx; can set allValue for conversion (Grafana retrieves all data for the current variable as xxx|xxx|xxx, adjustments will ensure consistency)
              current: null # Current value of the variable; if not set, defaults to the first in the list
              definition: query_result(kube_namespace_labels) # Query expression for data retrieval
              hide: 0 # 0 means not hidden; 1 means only the label is hidden; 2 means both label and value are hidden
              includeAll: true # Whether to select all
              label: ns # Built-in variable display name
              multi: true # Whether multiple selections are allowed
              name: ns # Variable name (unique)
              options: []
              query: ''
              regex: /.*namespace=\"(.*?)\".*/ # Regex expression for extracting variable values
              sort: 2 # Sorting: 1 - ascending alphabetical order; 2 - descending alphabetical order (only these two are currently supported); 3 - ascending numerical order; 4 - descending numerical order
              type: query # Custom variable type
        time: # Dashboard time
          from: now-30m # Start time
          to: now # End time
        repeat: '' # Row repeat configuration; chooses custom variable
        collapsed: 'false' # Row collapsed or expanded configuration
        description: '123' # Description (tooltip after title)
        targets: # Data sources
          - indicator: cluster.node.ready # Metric
            expr: 'sum (cpaas_pod_number{cluster=""}>0)' # PromQL expression (quoted so that YAML keeps the double quotes as-is)
            instant: false # Query mode; true retrieves data at a single point in time
            legendFormat: '' # Legend
            range: true # Default querying range when retrieving data
            refId: 指标1 # Unique identifier for display name of data source
        gridPos: # Information on the dashboard's positional layout
          h: 8 # Height
          w: 12 # Width (width corresponds to 24 grid units)
          x: 0 # Horizontal position
          y: 0 # Vertical position
        panels: # Panel data
          title: 图表标题tab # Panel name
          type: table # Panel type; currently supports timeseries, barchart, stat, gauge, table, bargauge, row, text, pie (step chart, scatter plot, bar chart, configurable through drawStyle attribute)
          id: a2239830-492f-4d27-98f3-cb7ecb77c56f # Unique identifier
          links: # Links
            - targetBlank: true # Open in a new tab
              title: '1' # Name
              url: '1' # URL address
          transformations: # Data transformations
            - id: 'organize' # Type organize; used for sorting, rearranging order, showing fields, whether to display
              options:
                excludeByName: # Hidden fields
                  cluster_cpu_utilization: true
                indexByName: # Sort
                  cluster_cpu_utilization: 0
                  Time: 1
                renameByName: # Rename
                  Time: ''
                  cluster_cpu_utilization: '222'
            - id: 'merge' # Merging data
              options:
          fieldConfig: # For defining panel properties and appearance
            defaults: # Default configuration
              custom: # Custom graphic attributes
                align: 'left' # Table alignment: left, center, right
                cellOptions: # Table threshold configuration
                  type: color-text # Only supports text for threshold color settings
                spanNulls: false # true connects null values; false does not connect; number == 0 connects null values according to 0
                drawStyle: line # Panel types: line, bars for bar charts, points for point charts
                fillOpacity: 20 # Exists when drawStyle is area (currently does not support configuration, area defaults to 20)
                thresholdsStyle: # Configures how to display thresholds (currently only supports line)
                  mode: line # Threshold display format (area not supported currently)
                lineInterpolation: 'stepBefore' # Step chart configuration; defaults to only supporting stepBefore (stepAfter will be supported later)
              decimals: 3 # Decimal points
              min: 0 # Minimum value (currently not supported for page configuration, only supports imports that have been adapted)
              max: 1 # Maximum value (page configuration only applies to stat gauge barGauge pie)
              unit: '%' # Unit
              mappings: # Value mapping configuration (currently only supports value and range types; special types supported on data)
                - options: # Value mapping rules
                    '1': # Corresponding value
                      index: 0
                      text: 'Running' # Displayed as Running when value is 1
                  type: value # Value mapping type
                - options: # Range mapping rules
                    from: 2 # Range start value
                    to: 3 # Range end value
                    result: # Mapping result
                      index: 1
                      text: 'Error' # Values from 2 to 3 will display as Error
                  type: range # Mapping type for range
                - type: special # Mapping type for special scenarios
                  options:
                    match: null # nan null null+nan empty true false
                    result:
                      text: xxx
                      index: 2
              thresholds: # Threshold configuration
                mode: absolute # Threshold mode; only absolute is currently supported (percentage mode is not supported yet)
                steps: # Threshold steps
                  - color: '#a7772f' # Threshold color
                    value: '2' # Threshold value
                  - color: '#007AF5' # Default value with no value is the Base
            overrides: # Override configuration
              - matcher:
                  id: byName # Match based on field name
                  options: node # Corresponding name
                properties: # Override configuration; id currently only supports displayName unit
                  - id: displayName # Display name override
                    value: '1' # Overridden display name
                  - id: unit # Unit override
                    value: GB/s # Unit value
                  - id: noValue # No value display
                    value: No value display
          options:
            orientation: horizontal # Control the layout direction of panels; applies to gauge and barGauge (stat will be supported later)
            legend: # Legend configuration
              calcs: # Calculating methods (only displays when the legend position is on the right)
                - latest # Currently only supports most recent value
              placement: right # Legend position (right or bottom; defaults to bottom)
              placementRightTop: '' # Configuration for the upper right
              showLegend: true # Whether to display the legend
            tooltip: # Tooltips
              mode: multi # Tooltip mode (only multi is supported); shows data for all series when the mouse hovers over the panel
              sort: asc # Sorting: asc or desc
            reduceOptions: # Value calculating method (used for aggregating data)
              calcs: # Calculating methods (latest, minimum, maximum, average, sum)
                - latest
              limit: 3 # Pie limits the number of slices
            textMode: 'value' # Stat configuration; defines style for displaying metric value; options are auto, value, value_and_name, name, none (currently not supported in the page configuration, but supported in imports)
            colorMode: 'value' # Stat configuration; defines color mode for displaying metric values; options are none, value, background (defaults to value; not supported in configuration but adapted in import)
            displayLabels: ['name', 'value', 'percent'] # Fields displayed in pie chart labels
            pieType: 'pie' # Pie chart type; options are pie and donut
            mode: 'html' # Text chart type mode; options are html and richText
            content: '<div>xxx</div>' # Content for text chart type
            footer:
              enablePagination: true # Table pagination enabled
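
If dashboards are generated rather than written by hand, a manifest can be assembled programmatically. The following Python sketch builds a reduced MonitorDashboard manifest from the fields shown in the example and serializes it as JSON, which is also valid YAML. The `build_dashboard` helper is hypothetical, and whether the platform accepts a body this small is an assumption that should be checked against a real cluster.

```python
import json


def build_dashboard(name: str, folder: str, title: str) -> dict:
    """Assemble a reduced MonitorDashboard manifest (hypothetical helper)."""
    return {
        "kind": "MonitorDashboard",
        "apiVersion": "ait.alauda.io/v1alpha2",
        "metadata": {
            "name": name,
            "namespace": "cpaas-system",
            "annotations": {"cpaas.io/dashboard.version": "3"},
            "labels": {
                "cpaas.io/dashboard.folder": folder,
                "cpaas.io/dashboard.is.home.dashboard": "False",
            },
        },
        "spec": {
            "body": {
                "title": title,
                "time": {"from": "now-30m", "to": "now"},
                "templating": {
                    "list": [
                        {
                            "hide": 0,
                            "label": "cluster",
                            "name": "cluster",
                            "options": [
                                {"selected": False, "text": "global", "value": "global"}
                            ],
                            "type": "custom",
                        }
                    ]
                },
            }
        },
    }


manifest = build_dashboard("demo-v2-dashboard1", "demo-v2-folder1", "english_display_name")
text = json.dumps(manifest, indent=2, ensure_ascii=False)
print(text)
```

The printed text can be saved to example-dashboard.yaml and submitted in the same way as a hand-written file.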

#Common Functions and Variables

#Common Functions

In addition to plain PromQL, the platform provides the following common functions for use in query settings.

| Function | Purpose |
| --- | --- |
| label_names() | Returns all labels in Prometheus, e.g., label_names(). |
| label_values(label) | Returns all selectable values for the label name in all monitored metrics in Prometheus, e.g., label_values(job). |
| label_values(metric, label) | Returns all selectable values for the label name in the specified metric in Prometheus, e.g., label_values(up, job). |
| metrics(metric) | Returns all metric names that satisfy the defined regex pattern in the metric field, e.g., metrics(cpaas_active). |
| query_result(query) | Returns the query result for the specified Prometheus query, e.g., query_result(up). |
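
To make the semantics of these functions concrete, the Python sketch below re-implements three of them over a small in-memory list of series. The sample series and the implementation are assumptions for illustration only; on the platform, the functions are evaluated against Prometheus.

```python
import re
from typing import Dict, List

# Hypothetical sample series; "__name__" holds the metric name, as in Prometheus.
SERIES: List[Dict[str, str]] = [
    {"__name__": "up", "job": "kube-state", "instance": "10.0.0.1:8080"},
    {"__name__": "up", "job": "node-exporter", "instance": "10.0.0.2:9100"},
    {"__name__": "cpaas_active_alerts", "job": "alertmanager"},
]


def label_names(series=SERIES) -> List[str]:
    """All label names that appear on any series."""
    return sorted({key for s in series for key in s})


def label_values(*args: str, series=SERIES) -> List[str]:
    """label_values(label) or label_values(metric, label)."""
    metric, label = (None, args[0]) if len(args) == 1 else args
    return sorted({
        s[label] for s in series
        if label in s and (metric is None or s["__name__"] == metric)
    })


def metrics(pattern: str, series=SERIES) -> List[str]:
    """Metric names that match the regex pattern."""
    return sorted({s["__name__"] for s in series if re.search(pattern, s["__name__"])})
```

With this sample data, `label_values("job")` returns the jobs of all three series, while `label_values("up", "job")` returns only the jobs seen on the `up` metric.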

#Common Variables

You can combine the common functions with regular expressions to quickly define custom variables. The following common variable definitions are provided for reference:

| Variable Name | Query Function | Regular Expression |
| --- | --- | --- |
| cluster | label_values(cpaas_cluster_info,cluster) | - |
| node | label_values(node_load1, instance) | /(.*?):.*/ |
| namespace | query_result(kube_namespace_labels) | /.*namespace=\"(.*?)\".*/ |
| deployment | label_values(kube_deployment_spec_replicas{namespace="$namespace"}, deployment) | - |
| daemonset | label_values(kube_daemonset_status_number_ready{namespace="$namespace"}, daemonset) | - |
| statefulset | label_values(kube_statefulset_replicas{namespace="$namespace"}, statefulset) | - |
| pod | label_values(kube_pod_info{namespace=~"$namespace"}, pod) | - |
| vmcluster | label_values(up, vmcluster) | - |

#Variable Use Case One

Use the query_result(query) function to query node_load5 and extract the node IP from the result.

  1. In Query Settings, fill in query_result(node_load5).

  2. In the Variable Value Preview area, the preview example is node_load5{container="node-exporter",endpoint="metrics",host_ip="192.168.178.182",instance="192.168.178.182:9100"}.

  3. In Regular Expression, fill in /.*instance="(.*?):.*/ to filter the value.

  4. In the Variable Value Preview area, the preview now shows only the extracted IP, for example 192.168.178.182 for the series above.
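
The extraction in steps 3 and 4 can be reproduced offline with Python's `re` module. This is only a sketch of the idea: `extract_variable_value` is a hypothetical helper, and it assumes that the platform applies the expression between the slashes to the whole series text and keeps the first capture group.

```python
import re
from typing import Optional


def extract_variable_value(series: str, pattern: str) -> Optional[str]:
    """Apply a /regex/ style pattern and return the first capture group."""
    if pattern.startswith("/") and pattern.endswith("/"):
        pattern = pattern[1:-1]  # drop the surrounding slashes
    match = re.match(pattern, series)
    return match.group(1) if match else None


series = (
    'node_load5{container="node-exporter",endpoint="metrics",'
    'host_ip="192.168.178.182",instance="192.168.178.182:9100"}'
)
ip = extract_variable_value(series, '/.*instance="(.*?):.*/')
print(ip)
```

The lazy group `(.*?)` stops at the first colon after `instance="`, which is why the port is dropped from the value.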

#Variable Use Case Two

  1. Add the first variable: namespace, using the query_result(query) function to query the value: kube_namespace_labels, and extract the namespace.

    • Query Settings: query_result(kube_namespace_labels).

    • Variable Value Preview: kube_namespace_labels{container="exporter-kube-state", endpoint="kube-state-metrics", instance="12.3.188.121:8080", job="kube-state", label_cpaas_io_project="cpaas-system", namespace="cert-manager", pod="kube-prometheus-exporter-kube-state-55bb6bc67f-lpgtx", project="cpaas-system", service="kube-prometheus-exporter-kube-state"}.

    • Regular Expression: /.+namespace=\"(.*?)\".*/.

    • In the Variable Value Preview area, the preview example includes multiple namespaces such as argocd, cpaas-system, and more.

  2. Add the second variable: deployment, and reference the variable created earlier:

    • Query Settings: kube_deployment_spec_replicas{namespace=~"$namespace"}.

    • Regular Expression: /.+deployment="(.*?)",.*/.

  3. Add a panel to the current dashboard and reference the previously added variables, for example:

    • Metric Name: pod Memory Usage under Compute Components.

    • Key-Value Pair: kind: Deployment, name: $deployment, namespace: $namespace.

  4. Once you have added the panels and saved them, you can view the corresponding panel information on the dashboard homepage.
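
The chaining of the two variables can be sketched in Python as follows. The sketch assumes that a multi-value selection is substituted into the query as values joined with `|` (the xxx|xxx|xxx format mentioned for allValue); the helper names and the deployment series are hypothetical.

```python
import re
from typing import Dict, List, Optional


def extract(series: str, pattern: str) -> Optional[str]:
    """Apply a /regex/ style pattern and return the first capture group."""
    match = re.match(pattern.strip("/"), series)
    return match.group(1) if match else None


def interpolate(query: str, variables: Dict[str, List[str]]) -> str:
    """Replace $name with the selected values joined by '|'."""
    # Longest names first, so that $namespace is never clobbered by a shorter $name.
    for name in sorted(variables, key=len, reverse=True):
        query = query.replace("$" + name, "|".join(variables[name]))
    return query


# Step 1: extract the namespace from a kube_namespace_labels series.
ns_series = 'kube_namespace_labels{job="kube-state", namespace="cert-manager", project="cpaas-system"}'
namespace = extract(ns_series, r'/.+namespace=\"(.*?)\".*/')

# Step 2: reference the first variable in the second variable's query.
query = interpolate(
    'kube_deployment_spec_replicas{namespace=~"$namespace"}',
    {"namespace": ["argocd", "cpaas-system"]},
)

# Hypothetical result series for the second query.
dep_series = 'kube_deployment_spec_replicas{deployment="argocd-server",namespace="argocd"}'
deployment = extract(dep_series, '/.+deployment="(.*?)",.*/')
```

Because the second query uses the regex matcher `=~`, a joined value such as `argocd|cpaas-system` selects deployments from either namespace.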

#Notes When Using Built-in Metrics

WARNING

The following metrics use the custom variables namespace, name, and kind, which do not support multiple selections or selecting all.

  • namespace only supports selecting a specific namespace;
  • name only supports three types of computing components: deployment, daemonset, statefulset;
  • kind only supports specifying one of the types: Deployment, DaemonSet, StatefulSet.

Affected metrics:

  • workload.cpu.utilization
  • workload.memory.utilization
  • workload.network.receive.bytes.rate
  • workload.network.transmit.bytes.rate
  • workload.gpu.utilization
  • workload.gpu.memory.utilization
  • workload.vgpu.utilization
  • workload.vgpu.memory.utilization