Management of Metrics

The platform's monitoring system is based on the metrics collected by Prometheus / VictoriaMetrics. This document will guide you on how to manage these metrics.

TOC

Viewing Metrics Exposed by Platform Components

The monitoring method for the cluster components within the platform is to extract metrics exposed via ServiceMonitor. Metrics in the platform are publicly available through the /metrics endpoint. You can view the exposed metrics of a specific component in the platform using the following example command:

curl -s http://<Component IP>:<Component metrics port>/metrics | grep 'TYPE\|HELP'

Sample Output:

# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
# HELP controller_runtime_max_concurrent_reconciles Maximum number of concurrent reconciles per controller
# TYPE controller_runtime_max_concurrent_reconciles gauge
# HELP controller_runtime_reconcile_errors_total Total number of reconciliation errors per controller
# TYPE controller_runtime_reconcile_errors_total counter
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller

Viewing All Metrics Stored by Prometheus / VictoriaMetrics

You can view the list of available metrics in the cluster to help you write the PromQL you need based on these metrics.

Prerequisites

  1. You have obtained your user Token

  2. You have obtained the platform address

Procedures

Run the following command to get the list of metrics using the curl command:

curl -k -X 'GET' -H 'Authorization: Bearer <Your token>' 'https://<Your platform access address>/v2/metrics/<Your cluster name>/prometheus/label/__name__/values'

Sample Output:

{
  "status": "success",
  "data": [
    "ALERTS",
  "ALERTS_FOR_STATE",
  "advanced_search_cached_resources_count",
  "alb_error",
  "alertmanager_alerts",
  "alertmanager_alerts_invalid_total",
  "alertmanager_alerts_received_total",
  "alertmanager_cluster_enabled"]
}

Viewing All Built-in Metrics Defined by the Platform

To simplify user usage, the platform has built in a large number of commonly used metrics. You can directly use these metrics when configuring alerts or monitoring dashboards without needing to define them yourself. The following will introduce you to how to view these metrics.

Prerequisites

  1. You have obtained your user Token

  2. You have obtained the platform address

Procedures

Run the following command to get the list of metrics using the curl command:

curl -k -X 'GET' -H 'Authorization: Bearer <Your token>' 'https://<Your platform access address>/v2/metrics/<Your cluster name>/indicators'

Sample Output:

[
  {
  "alertEnabled": true, 
  "annotations": {
   "cn": "CPU utilization of containers in the compute component",
   "descriptionEN": "Cpu utilization for pods in workload",
   "descriptionZH": "CPU utilization of containers in the compute component",
   "displayNameEN": "CPU utilization of the pods",
   "displayNameZH": "CPU utilization of containers in the compute component",
   "en": "Cpu utilization for pods in workload",
   "features": "SupportDashboard", 
   "summaryEN": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of Pod ({{.labels.pod}})",
   "summaryZH": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of pod ({{.labels.pod}})"
  },
  "displayName": "CPU utilization of containers in the compute component",
  "kind": "workload",
  "multipleEnabled": true,  
  "name": "workload.pod.cpu.utilization",
  "query": "avg by (kind,name,namespace,pod) (avg by (kind,name,namespace,pod,container)(cpaas_advanced_container_cpu_usage_seconds_total_irate5m{kind=~\"{{.kind}}\",name=~\"{{.name}}\",namespace=~\"{{.namespace}}\",container!=\"\",container!=\"POD\"}) / avg by (kind,name,namespace,pod,container)(cpaas_advanced_kube_pod_container_resource_limits{kind=~\"{{.kind}}\",name=~\"{{.name}}\",namespace=~\"{{.namespace}}\",resource=\"cpu\"}))", 
  "summary": "CPU usage rate {{.externalLabels.comparison}}{{.externalLabels.threshold}} of pod ({{.labels.pod}})",
  "type": "metric",
  "unit": "%",
  "legend": "{{.namespace}}/{{.pod}}",
  "variables": [ 
   "namespace",
   "name",
   "kind"
  ]
 }
]
  1. Whether this metric supports being used for configuring alerts
  2. Whether this metric supports being used in monitoring dashboards
  3. Whether this metric supports being used when configuring alerts for multiple resources
  4. The PromQL statement defined for the metric
  5. The variables that can be used in the PromQL statement of the metric

Integrating External Metrics

In addition to the built-in metrics of the platform, you can also integrate metrics exposed by your applications or third-party applications via ServiceMonitor or PodMonitor. This section uses the Elasticsearch Exporter installed in pod form in the same cluster as an example for explanation.

Prerequisites

You have installed your application and exposed metrics through specified interfaces. In this document, we assume your application is installed in the cpaas-system namespace and has exposed the http://<elasticsearch-exporter-ip>:9200/_prometheus/metrics endpoint.

Procedures

  1. Create a Service/Endpoint for the Exporter to expose metrics

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        chart: elasticsearch
        service_name: cpaas-elasticsearch
      name: cpaas-elasticsearch
      namespace: cpaas-system
    spec:
      clusterIP: 10.105.125.99
      ports:
      - name: cpaas-elasticsearch
        port: 9200
        protocol: TCP
        targetPort: 9200
      selector:
        service_name: cpaas-elasticsearch
      sessionAffinity: None
      type: ClusterIP
  2. Create a ServiceMonitor object to describe the metrics exposed by your application:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: cpaas-monitor
        chart: cpaas-monitor
        heritage: Helm
        prometheus: kube-prometheus
        release: cpaas-monitor
      name: cpaas-elasticsearch-Exporter
      namespace: cpaas-system
    spec:
      jobLabel: service_name
      namespaceSelector:
        any: true
      selector:
        matchExpressions:
          - key: service_name
            operator: Exists
      endpoints:
        - port: cpaas-elasticsearch
          path: /_prometheus/metrics
          interval: 60s
          honorLabels: true
          basicAuth:
            password:
              key: ES_PASSWORD
              name: acp-config-secret
            username:
              key: ES_USER
              name: acp-config-secret
    1. To which Prometheus should the ServiceMonitor be synchronized; the operator will listen to the corresponding ServiceMonitor resource based on the serviceMonitorSelector configuration of the Prometheus CR. If the ServiceMonitor's labels do not match the serviceMonitorSelector configuration of the Prometheus CR, this ServiceMonitor will not be monitored by the operator.
    2. The operator will listen to which namespaces of ServiceMonitor based on the serviceMonitorNamespaceSelector configuration of the Prometheus CR; if the ServiceMonitor is not in the serviceMonitorNamespaceSelector of the Prometheus CR, this ServiceMonitor will not be monitored by the operator.
    3. Metrics collected by Prometheus will add a job label, with the value being the service label value corresponding to jobLabel.
    4. The ServiceMonitor matches the corresponding Service based on the namespaceSelector configuration.
    5. The ServiceMonitor matches the Service based on the selector configuration.
    6. The ServiceMonitor matches the Service's port based on port configuration.
    7. The access path to the Exporter, default is /metrics.
    8. The interval at which Prometheus scrapes the Exporter metrics.
    9. If authentication is required to access the Exporter path, authentication information needs to be added; it also supports bearer token, tls authentication, and other methods.
  3. Check if the ServiceMonitor is being monitored by Prometheus

    Access the UI of the monitoring component to check if the job cpaas-elasticsearch-exporter exists.

    • Prometheus UI address: https://<Your platform access address>/clusters/<Cluster name>/prometheus-0/targets
    • VictoriaMetrics UI address: https://<Your platform access address>/clusters/<Cluster name>/vmselect/vmui/?#/metrics