Fleet Monitoring

Overview

Fleet Monitoring provides a global multi-cluster monitoring view for platform administrators. It helps you understand which clusters are connected, whether monitoring data is fresh, and whether the fleet has resource capacity or quota risks.

Fleet Monitoring does not replace single-cluster monitoring. Use Fleet Monitoring for fleet-level governance, capacity analysis, and long-term trends. Use single-cluster monitoring when you need detailed troubleshooting for a specific cluster, namespace, workload, or metric.

Prerequisites

Before using Fleet Monitoring, make sure the following requirements are met:

  • Alauda Container Platform Fleet Monitoring Central Service is installed on the Global cluster.
  • A FleetMonitoringHub resource exists on the Global cluster.
  • Alauda Container Platform Fleet Monitoring Cluster Service is installed on every cluster that you want to include in Fleet Monitoring.
  • A FleetMonitoringAgent resource exists on every cluster that you want to include.
  • The connected clusters are managed by the Global cluster and have the Monitoring feature enabled.
  • You have permission to view the Fleet Monitoring page and monitoring data.

For enablement steps, see Configure Fleet Monitoring.

Access Fleet Monitoring

To open Fleet Monitoring, go to Platform > Observe > Fleet Monitoring.

The page opens on the default Fleet Monitoring dashboard. Use the Switch control to move between built-in and custom Fleet Monitoring dashboards.

In the first version, the Fleet Monitoring page does not provide Create, Import, or panel editing actions. To add a custom Fleet Monitoring dashboard, use the existing Monitoring Dashboard resource workflow. For more information, see Add Custom Fleet Monitoring Dashboards.

Built-in Dashboards

Fleet Monitoring includes preset dashboards for fleet-level monitoring.

Fleet Monitoring Overview

The Fleet Monitoring Overview dashboard is the default preset dashboard. It helps you answer the following questions:

  • Which clusters are connected to Fleet Monitoring?
  • How many nodes, pods, projects, and CPU cores, and how much memory, does the fleet have?
  • What are the current CPU and memory usage and request levels across the fleet?
  • Which nodes have higher CPU or memory usage?
  • What are the fleet-level CPU and memory utilization trends?

Use this dashboard for daily fleet health checks, capacity review, and quick identification of clusters that need attention.

The dashboard includes the following information:

Area | Description
Fleet inventory | Cluster count, node count, pod count, project count, total CPU, total memory, stale collection count, and active alert count.
Fleet utilization | CPU usage ratio, CPU request ratio, memory usage ratio, and memory request ratio.
Node ranking | Top nodes by CPU usage and memory usage across connected clusters.
Utilization trends | CPU utilization trend and memory utilization trend.
Cluster details | Cluster-level resource and utilization details.

Fleet Monitoring Project Quota

The Fleet Monitoring Project Quota dashboard is a preset dashboard that helps you understand project quota allocation and usage across connected clusters.

Use this dashboard to review project quota distribution and identify projects with quota allocation or usage risks.

The dashboard includes the following information:

Area | Description
Definitions and thresholds | Definitions for quota, allocated, used, and usage ratio, and threshold meanings for normal, high, and near-limit usage.
Quota usage overview | Project count, quota object count, high-usage object count, CPU and memory usage ratios, quota totals, allocated amounts, and used amounts.
Quota distribution | CPU and memory quota distribution across unallocated, allocated-unused, and occupied resources.
Quota usage ranking | Top projects by CPU quota usage and memory quota usage.
Project quota details | Project-level quota allocation, usage, and risk details.
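The three segments of the quota distribution can be read as a split of the total quota. The following Python sketch shows one plausible reading and is not the dashboard's actual query logic. It assumes that unallocated equals quota minus allocated, that allocated-unused equals allocated minus used, and that occupied equals used. The function name quota_distribution is hypothetical. Check the Definitions and thresholds area of the dashboard for the authoritative definitions.

```python
def quota_distribution(quota: float, allocated: float, used: float) -> dict[str, float]:
    """Split a quota into three segments that sum to the quota.

    Assumption (not confirmed by the product documentation):
    unallocated = quota - allocated
    allocated_unused = allocated - used
    occupied = used
    """
    if not 0 <= used <= allocated <= quota:
        raise ValueError("expected 0 <= used <= allocated <= quota")
    return {
        "unallocated": quota - allocated,
        "allocated_unused": allocated - used,
        "occupied": used,
    }


# Hypothetical numbers: 100 CPU cores of quota, 60 allocated, 45 in use.
print(quota_distribution(100, 60, 45))
```

Under this reading, the three segments always add up to the total quota, which is why they can be shown as one distribution.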

Filter Data

Fleet Monitoring dashboards provide variables that help you narrow the view to a specific cluster set or project set.

Variable | Description
Cluster Label Key | Selects the cluster label key used for filtering.
Cluster Label Value | Selects a value for the selected cluster label key.
Cluster | Selects one or more clusters. This list can be narrowed by the cluster label variables.
Project | Selects one or more projects. This variable is available on the project quota dashboard.
Quota Resource Type | Switches between limits and requests. This variable is available on the project quota dashboard.
Time range | Controls the dashboard query time range.

The Cluster variable can list all known clusters, whereas the Connected Clusters metric counts only clusters that are actually writing Fleet Monitoring data. As a result, the cluster list and the connected cluster count can differ.

Add Custom Fleet Monitoring Dashboards

You can add custom multi-cluster dashboards to Fleet Monitoring by using the existing Monitoring Dashboard resource workflow.

Use one of the following methods:

  • On the existing Dashboard page, create a dashboard on the Global cluster and add the fleet-monitoring tag by using the page actions.
  • Submit a MonitorDashboard YAML resource to the namespace where Alauda Container Platform Fleet Monitoring Central Service is installed on the Global cluster. Make sure the resource has the cpaas.io/dashboard.tag.fleet-monitoring: "true" label. This label adds the fleet-monitoring tag to the dashboard.

Example:

apiVersion: ait.alauda.io/v1alpha2
kind: MonitorDashboard
metadata:
  name: my-fleet-dashboard
  namespace: <fleet-monitoring-namespace>
  labels:
    cpaas.io/dashboard.folder: fleet
    cpaas.io/dashboard.tag.fleet-monitoring: "true"
    cpaas.io/dashboard.tag.multi-cluster: "true"
    cpaas.io/published: "true"
spec:
  body: {}

Replace <fleet-monitoring-namespace> with the namespace where Alauda Container Platform Fleet Monitoring Central Service is installed.
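Before you submit a manifest, you can check that it carries the required label. The following Python sketch assumes that the manifest is already loaded into a dictionary, for example by a YAML parser. The helper name has_fleet_monitoring_tag is hypothetical and is not part of the platform.

```python
FLEET_TAG_LABEL = "cpaas.io/dashboard.tag.fleet-monitoring"


def has_fleet_monitoring_tag(manifest: dict) -> bool:
    """Return True if the manifest is a MonitorDashboard with the fleet-monitoring label."""
    if manifest.get("kind") != "MonitorDashboard":
        return False
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    # Label values must be strings, so compare against "true", not the boolean True.
    return labels.get(FLEET_TAG_LABEL) == "true"


# The example manifest from this section, written as a dictionary.
dashboard = {
    "apiVersion": "ait.alauda.io/v1alpha2",
    "kind": "MonitorDashboard",
    "metadata": {
        "name": "my-fleet-dashboard",
        "labels": {FLEET_TAG_LABEL: "true"},
    },
    "spec": {"body": {}},
}
print(has_fleet_monitoring_tag(dashboard))
```

The comparison against the string "true" is deliberate. In YAML, an unquoted true is parsed as a boolean, which is not a valid Kubernetes label value, so keep the quotation marks in the manifest.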

The system does not validate whether a custom dashboard uses Fleet Monitoring metrics. The dashboard author must make sure that the dashboard uses data sources, metrics, and variables that work in the Fleet Monitoring context.

For multi-cluster dashboards, use the cluster label to identify the source cluster. If an original metric already has a cluster label, Fleet Monitoring preserves the original value as exported_cluster.
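The following Python sketch illustrates the resulting label set only. It does not describe how Fleet Monitoring implements relabeling. The function name tag_with_source_cluster and the sample label values are hypothetical.

```python
def tag_with_source_cluster(labels: dict[str, str], source_cluster: str) -> dict[str, str]:
    """Return a copy of labels in which `cluster` names the source cluster.

    If the original metric already had a `cluster` label, its value is
    kept under `exported_cluster`, as described above.
    """
    tagged = dict(labels)
    if "cluster" in tagged:
        tagged["exported_cluster"] = tagged.pop("cluster")
    tagged["cluster"] = source_cluster
    return tagged


# A metric that already carries its own cluster label (hypothetical values).
original = {"__name__": "up", "cluster": "etcd-a", "job": "etcd"}
print(tag_with_source_cluster(original, "g1-c1"))
```

In a custom dashboard, this means that a filter on cluster selects the connected cluster, and a filter on exported_cluster selects the value that the metric carried before it reached Fleet Monitoring.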

In the current platform query path, dashboard queries that read Fleet Monitoring metrics directly should explicitly include the vmcluster=~".*" selector. Without this selector, the monitoring query proxy can narrow the query to the Global monitoring backend and return no Fleet Monitoring metric data for connected clusters.

Examples:

avg_over_time(fleet:node:node_load15:avg{vmcluster=~".*",cluster="g1-c1"}[1h])
last_over_time(fleet:node:node_load15:avg:avg_over_time_1h{vmcluster=~".*",cluster="g1-c1"}[2h])
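If you generate dashboard queries with a script, a small helper can add the selector consistently. The following Python sketch is one way to do this. The helper name with_vmcluster is hypothetical, and the parsing is deliberately simple: it handles a bare metric name or a single metric{...} selector, and it does not handle label values that contain text resembling a vmcluster matcher.

```python
import re

VMCLUSTER_MATCHER = 'vmcluster=~".*"'


def with_vmcluster(selector: str) -> str:
    """Add vmcluster=~".*" to a metric selector unless a vmcluster matcher exists."""
    selector = selector.strip()
    metric, brace, rest = selector.partition("{")
    if not brace:
        # Bare metric name with no label matchers.
        return f"{metric}{{{VMCLUSTER_MATCHER}}}"
    if not rest.endswith("}"):
        raise ValueError(f"unbalanced selector: {selector!r}")
    matchers = rest[:-1].strip()
    # Leave the selector unchanged if it already constrains vmcluster.
    if re.search(r"(^|,)\s*vmcluster\s*(=~|!~|!=|=)", matchers):
        return selector
    if not matchers:
        return f"{metric}{{{VMCLUSTER_MATCHER}}}"
    return f"{metric}{{{VMCLUSTER_MATCHER},{matchers}}}"


selector = with_vmcluster('fleet:node:node_load15:avg{cluster="g1-c1"}')
print(f"avg_over_time({selector}[1h])")
```

The helper leaves an existing vmcluster matcher untouched, so a query that intentionally targets one backend is not widened.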

Limits

The first version of Fleet Monitoring has the following limits:

  • Fleet Monitoring does not provide a dedicated multi-cluster alerting page.
  • Fleet Monitoring data can be used by the existing alerting mechanism, but alert rules are still managed through existing alerting workflows.
  • Fleet Monitoring does not provide a self-service page for ordinary users to configure collected metrics or recording rules.
  • Fleet Monitoring does not backfill historical data. Data is collected only after Fleet Monitoring is enabled.
  • Fleet Monitoring is not intended for troubleshooting that requires second-level granularity. Use single-cluster monitoring for detailed troubleshooting.