Management of Metrics

The platform's monitoring system is based on the metrics collected by Prometheus / VictoriaMetrics. This document describes how to view and manage these metrics.

Viewing Metrics Exposed by Platform Components

The platform monitors its cluster components by scraping the metrics they expose, using ServiceMonitor resources. Each component exposes its metrics at the /metrics endpoint. To view the metrics exposed by a specific component, run a command such as:

curl -s http://<Component IP>:<Component metrics port>/metrics | grep 'TYPE\|HELP'

Sample Output:

# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
# HELP controller_runtime_max_concurrent_reconciles Maximum number of concurrent reconciles per controller
# TYPE controller_runtime_max_concurrent_reconciles gauge
# HELP controller_runtime_reconcile_errors_total Total number of reconciliation errors per controller
# TYPE controller_runtime_reconcile_errors_total counter
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller
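
When a component exposes many metrics, a compact list of names and types can be easier to scan than the raw HELP/TYPE lines. The following is a minimal sketch, assuming awk is available; the function name list_metric_types is hypothetical and not part of the platform.

```shell
#!/usr/bin/env bash
# Print "<metric name> <type>" for every "# TYPE" line read from stdin.
# list_metric_types is a hypothetical helper name used for illustration.
list_metric_types() {
  awk '$1 == "#" && $2 == "TYPE" { print $3, $4 }'
}

# Example usage (same placeholders as the command above):
#   curl -s "http://<Component IP>:<Component metrics port>/metrics" | list_metric_types
```

For the sample output above, this would print one line per metric, such as controller_runtime_active_workers gauge.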

Viewing All Metrics Stored by Prometheus / VictoriaMetrics

You can list all metrics available in the cluster, which helps when writing PromQL queries against them.

Prerequisites

You have obtained your user Token
You have obtained the platform address

Procedures

Run the following curl command to get the list of metrics:

curl -k -X 'GET' -H 'Authorization: Bearer <Your token>' 'https://<Your platform access address>/v2/metrics/<Your cluster name>/prometheus/label/__name__/values'

Sample Output:

{
  "status": "success",
  "data": [
    "ALERTS",
    "ALERTS_FOR_STATE",
    "advanced_search_cached_resources_count",
    "alb_error",
    "alertmanager_alerts",
    "alertmanager_alerts_invalid_total",
    "alertmanager_alerts_received_total",
    "alertmanager_cluster_enabled"
  ]
}
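
The full list is usually long, so it can help to narrow it down by prefix before writing a query. The sketch below assumes the response has the shape shown in the sample output (a "data" array of metric names) and uses only tr, sed, and grep. The function name metric_names is hypothetical. Where jq is installed, jq -r '.data[]' is a more robust way to extract the names.

```shell
#!/usr/bin/env bash
# Read the JSON response on stdin and print one metric name per line.
# Assumes a single "data" array of plain strings, as in the sample output.
# metric_names is a hypothetical helper name used for illustration.
metric_names() {
  tr -d ' \t\r\n' \
    | sed -e 's/.*"data":\[//' -e 's/\].*//' \
    | tr ',' '\n' \
    | tr -d '"'
}

# Example usage (same placeholders as the command above):
#   curl -k -s -H 'Authorization: Bearer <Your token>' \
#     'https://<Your platform access address>/v2/metrics/<Your cluster name>/prometheus/label/__name__/values' \
#     | metric_names | grep '^alertmanager_'
```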

Viewing All Built-in Metrics Defined by the Platform

To make common tasks easier, the platform ships with a large number of built-in metrics. You can use these metrics directly when configuring alerts or monitoring dashboards, without defining them yourself. This section describes how to view them.

Prerequisites

You have obtained your user Token
You have obtained the platform address

Procedures

Run the following curl command to get the list of built-in metrics:

curl -k -X 'GET' -H 'Authorization: Bearer <Your token>' 'https://<Your platform access address>/v2/metrics/<Your cluster name>/indicators'

Sample Output:

[
  {
    "alertEnabled": true,
    "annotations": {
      "cn": "CPU utilization of containers in the compute component",
      "descriptionEN": "Cpu utilization for pods in workload",
      "descriptionZH": "CPU utilization of containers in the compute component",
      "displayNameEN": "CPU utilization of the pods",
      "displayNameZH": "CPU utilization of containers in the compute component",
      "en": "Cpu utilization for pods in workload",
      "features": "SupportDashboard",
      "summaryEN": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of Pod ({{.labels.pod}})",
      "summaryZH": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of pod ({{.labels.pod}})"
    },
    "displayName": "CPU utilization of containers in the compute component",
    "kind": "workload",
    "multipleEnabled": true,
    "name": "workload.pod.cpu.utilization",
    "query": "avg by (kind,name,namespace,pod) (avg by (kind,name,namespace,pod,container)(cpaas_advanced_container_cpu_usage_seconds_total_irate5m{kind=~\"{{.kind}}\",name=~\"{{.name}}\",namespace=~\"{{.namespace}}\",container!=\"\",container!=\"POD\"}) / avg by (kind,name,namespace,pod,container)(cpaas_advanced_kube_pod_container_resource_limits{kind=~\"{{.kind}}\",name=~\"{{.name}}\",namespace=~\"{{.namespace}}\",resource=\"cpu\"}))",
    "summary": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of pod ({{.labels.pod}})",
    "type": "metric",
    "unit": "%",
    "legend": "{{.namespace}}/{{.pod}}",
    "variables": [
      "namespace",
      "name",
      "kind"
    ]
  }
]

alertEnabled: Whether this metric can be used when configuring alerts.
features: Whether this metric can be used in monitoring dashboards (SupportDashboard).
multipleEnabled: Whether this metric can be used when configuring alerts for multiple resources.
query: The PromQL statement defined for the metric.
variables: The variables that can be used in the PromQL statement of the metric.
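
To try a built-in query by hand, the {{.kind}}, {{.name}}, and {{.namespace}} placeholders in the query field need concrete values. The following is a minimal sketch of that substitution using sed. The function name render_query is hypothetical, and the sketch assumes the values contain no |, &, or backslash characters. It illustrates the placeholder syntax only and is not a description of how the platform itself renders the query.

```shell
#!/usr/bin/env bash
# Substitute the three template variables in a query string.
# $1 = query template, $2 = kind, $3 = name, $4 = namespace.
# render_query is a hypothetical helper name used for illustration.
render_query() {
  printf '%s\n' "$1" \
    | sed -e "s|{{\.kind}}|$2|g" \
          -e "s|{{\.name}}|$3|g" \
          -e "s|{{\.namespace}}|$4|g"
}

# Example usage: because the query uses regex matchers (=~), passing '.*'
# for a variable matches every value of that label.
#   render_query "$query" 'Deployment' '.*' 'default'
```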

Integrating External Metrics

In addition to the platform's built-in metrics, you can integrate metrics exposed by your own or third-party applications via ServiceMonitor or PodMonitor. This section uses an Elasticsearch Exporter running as a pod in the same cluster as an example.

Prerequisites

You have installed your application and it exposes metrics through a known endpoint. This document assumes that the application is installed in the cpaas-system namespace and exposes the http://<elasticsearch-exporter-ip>:9200/_prometheus/metrics endpoint.

Procedures

Create a Service/Endpoint for the Exporter to expose its metrics:

apiVersion: v1
kind: Service
metadata:
  labels:
    chart: elasticsearch
    service_name: cpaas-elasticsearch
  name: cpaas-elasticsearch
  namespace: cpaas-system
spec:
  clusterIP: 10.105.125.99
  ports:
  - name: cpaas-elasticsearch
    port: 9200
    protocol: TCP
    targetPort: 9200
  selector:
    service_name: cpaas-elasticsearch
  sessionAffinity: None
  type: ClusterIP

Create a ServiceMonitor object to describe the metrics exposed by your application:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cpaas-monitor
    chart: cpaas-monitor
    heritage: Helm
    prometheus: kube-prometheus
    release: cpaas-monitor
  name: cpaas-elasticsearch-exporter
  namespace: cpaas-system
spec:
  jobLabel: service_name
  namespaceSelector:
    any: true
  selector:
    matchExpressions:
    - key: service_name
      operator: Exists
  endpoints:
  - port: cpaas-elasticsearch
    path: /_prometheus/metrics
    interval: 60s
    honorLabels: true
    basicAuth:
      password:
        key: ES_PASSWORD
        name: acp-config-secret
      username:
        key: ES_USER
        name: acp-config-secret

metadata.labels: Determines which Prometheus instance the ServiceMonitor is synchronized to. The operator watches ServiceMonitor resources according to the serviceMonitorSelector configuration of the Prometheus CR. If the ServiceMonitor's labels do not match that selector, the operator ignores the ServiceMonitor.
metadata.namespace: The operator watches ServiceMonitors only in the namespaces matched by the serviceMonitorNamespaceSelector configuration of the Prometheus CR. A ServiceMonitor in any other namespace is ignored by the operator.
spec.jobLabel: Prometheus adds a job label to the collected metrics. Its value is the value of the Service label named by jobLabel.
spec.namespaceSelector: The namespaces in which the ServiceMonitor looks for matching Services.
spec.selector: The label selector the ServiceMonitor uses to match Services.
spec.endpoints.port: The name of the Service port that the ServiceMonitor scrapes.
spec.endpoints.path: The access path of the Exporter. The default is /metrics.
spec.endpoints.interval: The interval at which Prometheus scrapes the Exporter metrics.
spec.endpoints.basicAuth: If the Exporter path requires authentication, add the authentication information here. Bearer token, TLS authentication, and other methods are also supported.
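
If the ServiceMonitor does not seem to be picked up, comparing its labels with the selectors on the Prometheus CR is a reasonable first check. The sketch below assumes kubectl access to the cluster and that the Prometheus Operator CRDs are installed. The function names are hypothetical helpers, not platform commands.

```shell
#!/usr/bin/env bash
# Sketch: compare the labels on a ServiceMonitor with the selectors of the
# Prometheus CRs in the cluster. Function names are hypothetical.

# Print namespace/name, serviceMonitorSelector, and
# serviceMonitorNamespaceSelector for every Prometheus CR.
show_prometheus_selectors() {
  kubectl get prometheus --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.serviceMonitorSelector}{"\t"}{.spec.serviceMonitorNamespaceSelector}{"\n"}{end}'
}

# Print the labels of one ServiceMonitor. $1 = namespace, $2 = name.
show_servicemonitor_labels() {
  kubectl get servicemonitor "$2" --namespace "$1" --show-labels
}

# Example usage:
#   show_prometheus_selectors
#   show_servicemonitor_labels cpaas-system <ServiceMonitor name>
```

The labels printed by the second command should satisfy the serviceMonitorSelector printed by the first, and the ServiceMonitor's namespace should be covered by serviceMonitorNamespaceSelector.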

Check whether the ServiceMonitor is being monitored by Prometheus

Open the UI of the monitoring component and check whether the job cpaas-elasticsearch-exporter exists.

Prometheus UI address: https://<Your platform access address>/clusters/<Cluster name>/prometheus-0/targets
VictoriaMetrics UI address: https://<Your platform access address>/clusters/<Cluster name>/vmselect/vmui/?#/metrics
