Management of Metrics

The platform's monitoring system is built on metrics collected by Prometheus or VictoriaMetrics. This document describes how to view these metrics and how to integrate external ones.

Viewing Metrics Exposed by Platform Components

The platform monitors its cluster components by scraping the metrics they expose, with the scrape targets declared through ServiceMonitor resources. Each component exposes its metrics at the /metrics endpoint. To view the metrics exposed by a specific component, run a command like the following:

curl -s http://<Component IP>:<Component metrics port>/metrics | grep 'TYPE\|HELP'

Sample Output:

# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
# HELP controller_runtime_max_concurrent_reconciles Maximum number of concurrent reconciles per controller
# TYPE controller_runtime_max_concurrent_reconciles gauge
# HELP controller_runtime_reconcile_errors_total Total number of reconciliation errors per controller
# TYPE controller_runtime_reconcile_errors_total counter
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller
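
To get a compact overview instead of raw HELP and TYPE lines, the output can be reduced to one line per metric. The following is a minimal sketch: the function name metric_types is hypothetical, and it assumes the endpoint returns the standard Prometheus text exposition format shown in the sample output.

```shell
# metric_types: read Prometheus text exposition format on stdin and print
# "<metric name> <type>" for every "# TYPE" line.
# The function name is illustrative and not part of the platform.
metric_types() {
  awk '$1 == "#" && $2 == "TYPE" { print $3, $4 }'
}

# Example (placeholders as in the curl command above):
#   curl -s http://<Component IP>:<Component metrics port>/metrics | metric_types
```

Because the function only reads standard input, it can also be run against a saved copy of the /metrics output.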

Viewing All Metrics Stored by Prometheus / VictoriaMetrics

You can list all metrics available in the cluster. The list is a useful starting point when writing PromQL queries.

Prerequisites

- You have obtained your user token.
- You have obtained the platform address.

Procedures

Run the following curl command to get the list of metrics:

curl -k -X 'GET' -H 'Authorization: Bearer <Your token>' 'https://<Your platform access address>/v2/metrics/<Your cluster name>/prometheus/label/__name__/values'

Sample Output:

{
  "status": "success",
  "data": [
    "ALERTS",
    "ALERTS_FOR_STATE",
    "advanced_search_cached_resources_count",
    "alb_error",
    "alertmanager_alerts",
    "alertmanager_alerts_invalid_total",
    "alertmanager_alerts_received_total",
    "alertmanager_cluster_enabled"
  ]
}
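
The full list is usually long, so it helps to narrow it to one family of metrics. The sketch below filters the JSON response by a name prefix using only tr, sed, and grep. The function name filter_metric_names is hypothetical, and the parsing assumes the response has the shape shown in the sample output (a single "data" array of plain metric names). If jq is installed, `jq -r '.data[]'` extracts the names more robustly.

```shell
# filter_metric_names PREFIX: read the JSON response on stdin and print the
# metric names that start with PREFIX, one per line.
# Illustrative helper; assumes a single "data" array of plain metric names.
filter_metric_names() {
  prefix="$1"
  tr -d '\n' \
    | sed -e 's/.*"data"[[:space:]]*:[[:space:]]*\[//' -e 's/].*//' \
    | tr ',' '\n' \
    | tr -d '" \t' \
    | grep "^${prefix}"
}

# Example (placeholders as in the curl command above):
#   curl -k -s -H 'Authorization: Bearer <Your token>' \
#     'https://<Your platform access address>/v2/metrics/<Your cluster name>/prometheus/label/__name__/values' \
#     | filter_metric_names alertmanager_
```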

Viewing All Built-in Metrics Defined by the Platform

For convenience, the platform ships with a large number of commonly used built-in metrics. You can use them directly when configuring alerts or monitoring dashboards, without defining them yourself. This section describes how to view these metrics.

Prerequisites

- You have obtained your user token.
- You have obtained the platform address.

Procedures

Run the following curl command to get the list of built-in metrics:

curl -k -X 'GET' -H 'Authorization: Bearer <Your token>' 'https://<Your platform access address>/v2/metrics/<Your cluster name>/indicators'

Sample Output:

[
  {
    "alertEnabled": true,
    "annotations": {
      "cn": "CPU utilization of containers in the compute component",
      "descriptionEN": "Cpu utilization for pods in workload",
      "descriptionZH": "CPU utilization of containers in the compute component",
      "displayNameEN": "CPU utilization of the pods",
      "displayNameZH": "CPU utilization of containers in the compute component",
      "en": "Cpu utilization for pods in workload",
      "features": "SupportDashboard",
      "summaryEN": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of Pod ({{.labels.pod}})",
      "summaryZH": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of pod ({{.labels.pod}})"
    },
    "displayName": "CPU utilization of containers in the compute component",
    "kind": "workload",
    "multipleEnabled": true,
    "name": "workload.pod.cpu.utilization",
    "query": "avg by (kind,name,namespace,pod) (avg by (kind,name,namespace,pod,container)(cpaas_advanced_container_cpu_usage_seconds_total_irate5m{kind=~\"{{.kind}}\",name=~\"{{.name}}\",namespace=~\"{{.namespace}}\",container!=\"\",container!=\"POD\"}) / avg by (kind,name,namespace,pod,container)(cpaas_advanced_kube_pod_container_resource_limits{kind=~\"{{.kind}}\",name=~\"{{.name}}\",namespace=~\"{{.namespace}}\",resource=\"cpu\"}))",
    "summary": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of pod ({{.labels.pod}})",
    "type": "metric",
    "unit": "%",
    "legend": "{{.namespace}}/{{.pod}}",
    "variables": [
      "namespace",
      "name",
      "kind"
    ]
  }
]

The key fields in the output are:

- alertEnabled: whether the metric can be used when configuring alerts.
- features: whether the metric can be used in monitoring dashboards (SupportDashboard).
- multipleEnabled: whether the metric can be used when configuring alerts for multiple resources.
- query: the PromQL statement defined for the metric.
- variables: the variables that can be used in the PromQL statement of the metric.
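
To see how the variables relate to the PromQL statement, it can help to fill in the placeholders by hand. The sketch below approximates this with plain text substitution: the function name render_query is hypothetical, and the source does not describe how the platform itself renders the template, so treat this only as an illustration of where each variable ends up.

```shell
# render_query TEMPLATE KIND NAME NAMESPACE: replace the {{.kind}}, {{.name}}
# and {{.namespace}} placeholders in TEMPLATE and print the result.
# Illustrative helper; values must not contain "/", "&" or "\" because they
# are passed to sed as replacement text.
render_query() {
  printf '%s\n' "$1" \
    | sed -e "s/{{\.kind}}/$2/g" \
          -e "s/{{\.name}}/$3/g" \
          -e "s/{{\.namespace}}/$4/g"
}

# Example with a shortened template:
#   render_query 'up{kind=~"{{.kind}}",name=~"{{.name}}",namespace=~"{{.namespace}}"}' Deployment web default
```

Passing a regular expression such as .* as a value matches every resource, since the placeholders sit inside =~ matchers.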

Integrating External Metrics

In addition to the platform's built-in metrics, you can integrate metrics exposed by your own or third-party applications via ServiceMonitor or PodMonitor. This section uses an Elasticsearch Exporter running as a pod in the same cluster as an example.

Prerequisites

You have installed your application, and it exposes metrics at a known endpoint. This document assumes that the application is installed in the cpaas-system namespace and exposes its metrics at http://<elasticsearch-exporter-ip>:9200/_prometheus/metrics.

Procedures

Create a Service/Endpoint for the Exporter to expose its metrics:

apiVersion: v1
kind: Service
metadata:
  labels:
    chart: elasticsearch
    service_name: cpaas-elasticsearch
  name: cpaas-elasticsearch
  namespace: cpaas-system
spec:
  clusterIP: 10.105.125.99
  ports:
  - name: cpaas-elasticsearch
    port: 9200
    protocol: TCP
    targetPort: 9200
  selector:
    service_name: cpaas-elasticsearch
  sessionAffinity: None
  type: ClusterIP

Create a ServiceMonitor object to describe the metrics exposed by your application:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cpaas-monitor
    chart: cpaas-monitor
    heritage: Helm
    prometheus: kube-prometheus # 1
    release: cpaas-monitor
  name: cpaas-elasticsearch-exporter
  namespace: cpaas-system # 2
spec:
  jobLabel: service_name # 3
  namespaceSelector: # 4
    any: true
  selector: # 5
    matchExpressions:
    - key: service_name
      operator: Exists
  endpoints:
  - port: cpaas-elasticsearch # 6
    path: /_prometheus/metrics # 7
    interval: 60s # 8
    honorLabels: true
    basicAuth: # 9
      password:
        key: ES_PASSWORD
        name: acp-config-secret
      username:
        key: ES_USER
        name: acp-config-secret

1. metadata.labels: determines which Prometheus the ServiceMonitor is synchronized to. The operator watches ServiceMonitor resources according to the serviceMonitorSelector configuration of the Prometheus CR. If the ServiceMonitor's labels do not match that selector, the operator ignores this ServiceMonitor.
2. metadata.namespace: the operator watches only the namespaces selected by the serviceMonitorNamespaceSelector configuration of the Prometheus CR. If the ServiceMonitor is not in one of those namespaces, the operator ignores it.
3. jobLabel: Prometheus adds a job label to the collected metrics. Its value is the value of the Service label named by jobLabel.
4. namespaceSelector: the namespaces in which the ServiceMonitor looks for matching Services.
5. selector: the label selector that the ServiceMonitor uses to match Services.
6. port: the name of the Service port to scrape.
7. path: the access path of the Exporter. The default is /metrics.
8. interval: how often Prometheus scrapes the Exporter metrics.
9. basicAuth: credentials to use when the Exporter path requires authentication. Other methods, such as bearer token and TLS authentication, are also supported.
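
Items 1, 4, and 5 are the usual reasons a ServiceMonitor is silently ignored, so it can be worth checking them from the command line. The sketch below wraps two standard kubectl queries in hypothetical helper functions. It assumes kubectl is configured for the cluster and that the Prometheus Operator CRDs are installed. The KUBECTL variable is an illustrative hook for substituting a wrapper command.

```shell
# Command used to talk to the cluster; override it to use a wrapper.
KUBECTL="${KUBECTL:-kubectl}"

# servicemonitor_labels NAMESPACE NAME: print the labels of a ServiceMonitor,
# to compare against serviceMonitorSelector of the Prometheus CR (item 1).
servicemonitor_labels() {
  $KUBECTL get servicemonitor "$2" -n "$1" -o "jsonpath={.metadata.labels}"
}

# services_with_label KEY: list Services in all namespaces that carry the
# label KEY, which is what "operator: Exists" combined with "any: true"
# selects (items 4 and 5).
services_with_label() {
  $KUBECTL get service --all-namespaces -l "$1"
}

# Example:
#   servicemonitor_labels cpaas-system cpaas-elasticsearch-exporter
#   services_with_label service_name
```

Note that with namespaceSelector set to any: true and an Exists selector, every Service carrying the service_name label is matched, not only the Elasticsearch one. The second helper makes that visible.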

Check whether the ServiceMonitor is being monitored by Prometheus:

Access the UI of the monitoring component to check if the job cpaas-elasticsearch-exporter exists.
- Prometheus UI address: https://<Your platform access address>/clusters/<Cluster name>/prometheus-0/targets
- VictoriaMetrics UI address: https://<Your platform access address>/clusters/<Cluster name>/vmselect/vmui/?#/metrics
