Configure Fleet Monitoring
Overview
Fleet Monitoring uses two services:
- Alauda Container Platform Fleet Monitoring Central Service runs on the Global cluster and enables the Global side of Fleet Monitoring.
- Alauda Container Platform Fleet Monitoring Cluster Service runs on every cluster that you want to include in Fleet Monitoring.
After Fleet Monitoring is enabled, connected clusters write fleet-level metrics to the Global VictoriaMetrics storage. The built-in Fleet Monitoring dashboards use these metrics to display fleet health, resource usage, data freshness, and project quota usage.
Before You Begin
Before configuring Fleet Monitoring, make sure the following requirements are met:
- You have platform administrator permissions or the required permissions to install Operators and create the Fleet Monitoring resources.
- The Global cluster has an available remote write endpoint for Fleet Monitoring data. VictoriaMetrics with a write endpoint is supported. Prometheus or Thanos-based Monitoring can also work if the endpoint exposed by the Monitoring feature accepts remote write traffic.
- Each cluster that you want to connect is managed by the Global cluster.
- Each cluster that you want to connect has the Monitoring feature enabled.
- The Fleet Monitoring Operator packages have been pushed to the required clusters or are already available in OperatorHub. Fleet Monitoring is delivered as Agnostic Operators and is not included by default with the platform installation.
- The New Web Console plugin deployment capability is available. Fleet Monitoring Operators are deployed through the New Web Console OperatorHub workflow. For more information, see Install the New Web Console.
Push Fleet Monitoring Operator Packages
Before installing Fleet Monitoring from OperatorHub, push the Fleet Monitoring Operator packages with violet.
Push Alauda Container Platform Fleet Monitoring Central Service to the Global cluster:
Push Alauda Container Platform Fleet Monitoring Cluster Service to every cluster where the Cluster Service will be installed. If the Global cluster also needs to be included in Fleet Monitoring data, include global in the cluster list:
After the package is pushed, the corresponding Operator appears in Marketplace > OperatorHub on the selected cluster. For more information about violet push, see Upload Packages.
Enable Fleet Monitoring on the Global Cluster
- Go to Administrator.
- In the left navigation bar, click Marketplace > OperatorHub.
- At the top of the page, select the global cluster.
- Search for Alauda Container Platform Fleet Monitoring Central Service.
- If the Operator status is not Installed, click Install and keep the default installation configuration unless your environment requires a different channel, namespace, or upgrade strategy.
- Verify that Alauda Container Platform Fleet Monitoring Central Service is Installed in OperatorHub.
  If you select the Manual upgrade strategy and OperatorHub shows a pending install plan, approve the install plan to complete the installation.
- Verify that the Global cluster has an available VictoriaMetrics write endpoint.
  Fleet Monitoring uses the Global VictoriaMetrics write endpoint to receive data from connected clusters. If the Global cluster has only Prometheus or Thanos Query available, the connected clusters cannot write Fleet Monitoring data to the Global cluster.
- Create the FleetMonitoringHub resource on the Global cluster.
  FleetMonitoringHub is a cluster-scoped resource. Do not set metadata.namespace.
- Verify the FleetMonitoringHub status.
  Check that the following conditions are ready:
  You can also check the phase:
  The expected phase is Ready.
Connect a Cluster to Fleet Monitoring
Repeat the following steps on every cluster that you want to include in Fleet Monitoring.
- Go to Administrator.
- In the left navigation bar, click Marketplace > OperatorHub.
- At the top of the page, select the target cluster.
- Search for Alauda Container Platform Fleet Monitoring Cluster Service.
- If the Operator status is not Installed, click Install and keep the default installation configuration unless your environment requires a different channel, namespace, or upgrade strategy.
- Verify that Alauda Container Platform Fleet Monitoring Cluster Service is Installed in OperatorHub.
  If you select the Manual upgrade strategy and OperatorHub shows a pending install plan, approve the install plan to complete the installation.
- Create the FleetMonitoringAgent resource on the target cluster.
  FleetMonitoringAgent is a cluster-scoped resource. Do not set metadata.namespace.
  To include the Global cluster itself in Fleet Monitoring data, also install Alauda Container Platform Fleet Monitoring Cluster Service on the Global cluster and create a FleetMonitoringAgent resource there.
- Verify the FleetMonitoringAgent status.
  Check the following conditions:
  You can also check the phase and the detected cluster name:
  The expected phase is Ready. Common Ready reasons are:
  - WorkloadReady: The cluster deploys a VMAgent and is ready to write Fleet Monitoring data.
  - SkippedForGlobal: The cluster is the Global cluster, so the Cluster Service skips deploying a VMAgent back to the same Global storage.
  - SkippedBackendReuse: The cluster reuses the Global VictoriaMetrics backend, so the Cluster Service skips deploying a Fleet Monitoring VMAgent to avoid a write loop.
  If the target cluster writes data to the Global storage, verify that the database information is available:
  Replace <fleet-monitoring-namespace> with the namespace where Alauda Container Platform Fleet Monitoring Cluster Service is installed.
- Open Platform > Observe > Fleet Monitoring and verify that the cluster appears in the dashboard data.
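The status check in the steps above can be scripted once the FleetMonitoringAgent status has been exported as JSON. The following Python sketch is illustrative only. It assumes the status carries a phase field and a standard Kubernetes conditions list with a condition of type Ready; that field layout and the helper names are assumptions, not a documented schema. Only the three reason strings come from this guide.

```python
from typing import Optional

# Ready reasons listed in this guide and what each one implies.
READY_REASONS = {
    "WorkloadReady": "a VMAgent is deployed and the cluster can write Fleet Monitoring data",
    "SkippedForGlobal": "the cluster is the Global cluster, so no VMAgent is deployed",
    "SkippedBackendReuse": "the cluster reuses the Global VictoriaMetrics backend, so no VMAgent is deployed",
}


def ready_reason(status: dict) -> Optional[str]:
    """Return the reason of the Ready condition if it is True, else None.

    Assumes status["conditions"] is a list of {"type", "status", "reason"} items.
    """
    for condition in status.get("conditions", []):
        if condition.get("type") == "Ready" and condition.get("status") == "True":
            return condition.get("reason")
    return None


def expects_vmagent(status: dict) -> bool:
    """True only when the agent is Ready because a VMAgent workload was deployed."""
    return status.get("phase") == "Ready" and ready_reason(status) == "WorkloadReady"
```

A Ready agent whose reason is SkippedForGlobal or SkippedBackendReuse is healthy even though no Fleet Monitoring VMAgent exists on that cluster, so a script should not treat a missing VMAgent as an error in those two cases.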
Configure the Collection Interval
The collection interval is configured on the FleetMonitoringAgent resource of each connected cluster.
Supported values:
- 5m
- 10m
- 15m
- 30m
Example:
After you update spec.interval, the Fleet Monitoring Cluster Service reconciles the local collection configuration.
On clusters that deploy a Fleet Monitoring VMAgent, the VMAgent collection interval follows spec.interval, while Fleet Monitoring recording rules continue to evaluate at the system-managed interval used for federation. On the Global cluster and on clusters that reuse the Global VictoriaMetrics backend, where no Fleet Monitoring VMAgent is deployed, local Fleet Monitoring rules follow spec.interval.
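Because spec.interval accepts only four values, it can help to validate the value before applying a change. The following Python sketch shows one way to do that. The function names are hypothetical helpers for illustration and are not part of Fleet Monitoring; only the list of supported values comes from this guide.

```python
# Values that this guide lists as supported for spec.interval.
SUPPORTED_INTERVALS = ("5m", "10m", "15m", "30m")


def validate_interval(value: str) -> str:
    """Return the value unchanged if it is supported, otherwise raise ValueError."""
    if value not in SUPPORTED_INTERVALS:
        supported = ", ".join(SUPPORTED_INTERVALS)
        raise ValueError(f"unsupported spec.interval {value!r}; expected one of: {supported}")
    return value


def interval_seconds(value: str) -> int:
    """Convert a supported interval such as '15m' to seconds."""
    minutes = int(validate_interval(value)[:-1])  # strip the trailing 'm'
    return minutes * 60
```

For example, interval_seconds("15m") returns 900, while validate_interval("1m") raises ValueError because 1m is not in the supported list.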
Configure Custom Metrics and Recording Rules
Fleet Monitoring includes built-in metrics and recording rules. Cluster administrators can append custom metrics and recording rules on each connected cluster.
Use the following workflow when you want to report a user-defined metric into Fleet Monitoring:
- On the connected workload cluster, define a local Fleet recording rule that converts the source metric into a Fleet metric name.
- Add that recorded metric name to the Fleet allowlist so the Fleet Monitoring VMAgent federates and remote-writes it to the Global cluster.
- If you need a Fleet-level rollup such as a 1-hour aggregate, add a separate custom Hub-side PrometheusRule on the Global cluster.
- Verify the recorded metric and rollup metric by using Fleet Monitoring queries or dashboards.
Decide Whether You Need Agent-side Rules Only or Both Agent-side and Hub-side Rules
Choose one of the following patterns:
- Use only the connected-cluster ConfigMap when you need the raw Fleet metric on the Global cluster and can query it directly.
- Use both the connected-cluster ConfigMap and a Global-cluster custom PrometheusRule when you also need Fleet-level rollups such as 1-hour aggregates for dashboards or long-range views.
Example target:
- Source metric on the connected cluster: node_load15
- Fleet metric recorded on the connected cluster: fleet:node:node_load15:avg
- Optional 1-hour rollup on the Global cluster: fleet:node:node_load15:avg:avg_over_time_1h
Configure the Connected Cluster
Create or update the fleet-monitoring-custom-metrics ConfigMap in the namespace where Alauda Container Platform Fleet Monitoring Cluster Service is installed on the connected cluster.
This ConfigMap has two roles:
- metrics.yaml adds metric names to the Fleet Monitoring VMAgent federate allowlist.
- recording-rules-prometheus.yaml or recording-rules-victoriametrics.yaml defines the local recording rule that produces the Fleet metric.
Example:
Replace <fleet-monitoring-namespace> with the namespace where Alauda Container Platform Fleet Monitoring Cluster Service is installed.
metrics.yaml appends metric names to the built-in allowlist.
For recording rules, the Cluster Service loads only the key that matches the local Monitoring stack:
- recording-rules-prometheus.yaml on Prometheus-based clusters
- recording-rules-victoriametrics.yaml on VictoriaMetrics-based clusters
Custom configuration can append metrics and rules. It does not remove or override built-in defaults.
If the ConfigMap has an invalid format or contains invalid rules, Fleet Monitoring keeps the built-in defaults and reports the error in the FleetMonitoringAgent status.
In a connected workload cluster that deploys a Fleet Monitoring VMAgent, the Agent reconciler normalizes the rendered Fleet Monitoring recording-rule group interval to 1m. Do not rely on a custom interval value in the ConfigMap to control the final rendered local Fleet Monitoring rule interval.
For custom Fleet metrics, use the Fleet naming convention for the recorded metric, for example fleet:node:node_load15:avg. This keeps the metric compatible with Fleet Monitoring dashboards, rollups, and query patterns.
Fleet Monitoring queries and dashboards require the recorded time series to carry the cluster label. The Agent reconciler automatically adds cluster=<local-cluster-name> to rendered Fleet Monitoring recording rules when the rule does not already define that label.
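The two rendering behaviors described above (the group interval is normalized to 1m on clusters that deploy a Fleet Monitoring VMAgent, and a cluster label is added only when the rule does not already define one) can be modeled with a short Python sketch. This is an illustration of the documented outcome, not the Agent reconciler's actual code. The function name and the dictionary layout of the rule group are assumptions.

```python
import copy


def render_group(group: dict, cluster_name: str) -> dict:
    """Model the documented rendering of one custom recording-rule group.

    - The group interval is forced to "1m", whatever the ConfigMap requested.
    - Each rule receives labels.cluster=<cluster_name> unless it already sets one.
    The input is not modified; a rendered copy is returned.
    """
    rendered = copy.deepcopy(group)
    rendered["interval"] = "1m"
    for rule in rendered.get("rules", []):
        labels = rule.setdefault("labels", {})
        labels.setdefault("cluster", cluster_name)  # keep an explicit label if present
    return rendered
```

Comparing the output of such a model with the rendered rule on the cluster is a quick way to see why a custom interval in the ConfigMap does not survive rendering.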
Verify the Connected-cluster Rendering
After you update the ConfigMap, the Fleet Monitoring Agent automatically reconciles the local rule and VMAgent configuration. No restart is required.
Check the rendered local rule:
Confirm that:
- the custom group appears in spec.groups
- the custom recorded metric appears in the rule list
- the rendered rule carries labels.cluster=<connected-cluster-name>
Check the rendered VMAgent federate allowlist:
Confirm that the custom recorded metric appears in data.prometheus.yml under params.match[].
Add a Custom Hub Rollup Rule
If you want a custom Fleet metric to have a Fleet-level rollup such as a 1-hour aggregate, create a separate PrometheusRule on the Global cluster in the namespace where Alauda Container Platform Fleet Monitoring Central Service is installed.
Example:
Replace <fleet-monitoring-namespace> with the namespace where Alauda Container Platform Fleet Monitoring Central Service is installed.
This custom PrometheusRule is additive. It does not need to copy the built-in Hub rules and should not be created with Fleet Monitoring operator ownership metadata.
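As a rough illustration of what such an additive rule could contain, the following Python sketch builds a PrometheusRule manifest as a dictionary and prints it as JSON, which kubectl can also apply. The apiVersion and kind are those of the Prometheus Operator PrometheusRule resource. The resource name, the group name, and the use of avg_over_time for the 1-hour rollup are assumptions inferred from the example metric name in this guide. Any labels that the Global Monitoring stack may require to select the rule are not shown.

```python
import json


def rollup_rule(namespace: str, source_metric: str, window: str = "1h") -> dict:
    """Build a PrometheusRule that records an avg_over_time rollup of a Fleet metric."""
    record = f"{source_metric}:avg_over_time_{window}"
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        # Hypothetical name; no Fleet Monitoring ownership metadata is added.
        "metadata": {"name": "fleet-custom-rollup", "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": "fleet-custom-rollup",
                    "rules": [
                        {
                            "record": record,
                            "expr": f"avg_over_time({source_metric}[{window}])",
                        }
                    ],
                }
            ]
        },
    }


manifest = rollup_rule("<fleet-monitoring-namespace>", "fleet:node:node_load15:avg")
print(json.dumps(manifest, indent=2))
```

The recorded name produced here is fleet:node:node_load15:avg:avg_over_time_1h, which matches the optional rollup name used as the example target earlier in this section.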
Verify the Hub-side Rollup
Check the custom Hub-side rule:
Confirm that:
- the rule exists on the Global cluster
- the rule uses the expected Fleet metric as input
- the rollup output metric name matches the dashboard or query expression you plan to use
Query the Custom Fleet Metric
When querying Fleet metrics through the platform Monitoring API or Fleet Monitoring dashboards, explicitly include vmcluster=~".*" in the selector. In the current platform query path, omitting this selector can cause the query proxy to narrow the query to the Global monitoring backend and return no Fleet data for connected clusters.
Example queries:
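- Raw custom Fleet metric: fleet:node:node_load15:avg{vmcluster=~".*"}
- 1-hour rollup metric: fleet:node:node_load15:avg:avg_over_time_1h{vmcluster=~".*"}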
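When queries are generated by scripts, the vmcluster matcher is easy to forget. A small helper can add it to every selector. The following Python sketch is illustrative; the function name is hypothetical, and it handles only simple equality matchers in addition to the vmcluster matcher required by this guide.

```python
def fleet_selector(metric: str, **matchers: str) -> str:
    """Build a selector that always includes vmcluster=~".*".

    Extra keyword arguments become equality matchers, sorted by label name.
    """
    parts = ['vmcluster=~".*"']
    parts += [f'{name}="{value}"' for name, value in sorted(matchers.items())]
    return metric + "{" + ", ".join(parts) + "}"


print(fleet_selector("fleet:node:node_load15:avg"))
print(fleet_selector("fleet:node:node_load15:avg:avg_over_time_1h", cluster="workload-1"))
```

The cluster value workload-1 in the second call is a placeholder for a connected cluster name.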
Common Mistakes
Watch for the following issues:
- Creating fleet-monitoring-custom-metrics in cpaas-system when Fleet Monitoring is installed in another namespace such as fleet-monitoring
- Adding the source metric name to metrics.yaml instead of the recorded Fleet metric name
- Defining the local recording rule but not adding the recorded Fleet metric name to metrics.yaml
- Expecting a custom interval in the connected-cluster ConfigMap to remain effective after rendering
- Querying Fleet metrics without vmcluster=~".*" in the selector
- Expecting a Global Fleet rollup metric before creating the corresponding custom Hub-side rule
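Two of the mistakes above concern the relationship between metrics.yaml and the recording rules, and they can be caught before the ConfigMap is applied. The following Python sketch assumes the allowlist and the rules have already been parsed into plain Python lists; the real file layout is not modeled here. The function name is hypothetical, and the fleet: prefix check is a heuristic based on the naming example in this guide.

```python
from typing import Dict, List


def lint_custom_metrics(allowlist: List[str], rules: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems found in a custom metrics configuration.

    allowlist: metric names intended for metrics.yaml.
    rules: recording rules, each with at least a "record" key.
    """
    problems = []
    recorded = {rule["record"] for rule in rules}

    # A recorded Fleet metric that is not allowlisted is never federated.
    for name in sorted(recorded - set(allowlist)):
        problems.append(f"{name}: recorded by a rule but missing from metrics.yaml")

    # An allowlisted name without the fleet: prefix is probably a source metric.
    for name in allowlist:
        if not name.startswith("fleet:"):
            problems.append(f"{name}: does not follow the fleet: naming convention")

    return problems
```

For the example in this guide, an allowlist containing only node_load15 produces two findings, and replacing it with fleet:node:node_load15:avg produces none.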
Verify Data Freshness
After clusters are connected, open Platform > Observe > Fleet Monitoring and check the following information on the overview dashboard:
- Connected Clusters
- Stale Clusters
- Last Write Ago
- Data Freshness Exceptions
If a cluster appears in the Cluster variable but is not counted as connected, the cluster is probably known to the platform but is not writing Fleet Monitoring data. Check whether the cluster has Alauda Container Platform Fleet Monitoring Cluster Service installed from OperatorHub and has a ready FleetMonitoringAgent.
Troubleshooting
No Fleet Monitoring data is displayed
Check the following items:
- Alauda Container Platform Fleet Monitoring Central Service is installed on the Global cluster.
- The FleetMonitoringHub resource exists and has ready conditions.
- The Global cluster has available VictoriaMetrics storage and a write endpoint. Prometheus-only Monitoring cannot receive Fleet Monitoring remote write data.
- Built-in dashboards and Global rules are applied.
A cluster is missing from Connected Clusters
Check the following items on the target cluster:
- Alauda Container Platform Fleet Monitoring Cluster Service is installed.
- The FleetMonitoringAgent resource exists.
- The cluster is managed by the Global cluster.
- The Monitoring feature is enabled on the cluster.
- The FleetMonitoringAgent status does not report missing database information or invalid Monitoring feature information.
- The fleet-monitoring-database Secret contains a remoteWriteURL that points to the Global VictoriaMetrics write endpoint.
- The fleet-monitoring-vmagent logs do not report remote write errors. If the logs show 405 Method Not Allowed, the remoteWriteURL may point to a Prometheus or Thanos Query endpoint instead of the VictoriaMetrics write endpoint.
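The last check in the list above can be automated against saved log output. The following Python sketch scans log lines for the 405 response described in this guide and returns a hint. The function name is hypothetical and the exact VMAgent log format is not assumed; the check looks only for the two substrings "405" and "Method Not Allowed" on the same line.

```python
from typing import Iterable, Optional

HINT_405 = (
    "remoteWriteURL may point to a Prometheus or Thanos Query endpoint "
    "instead of the VictoriaMetrics write endpoint"
)


def remote_write_hint(log_lines: Iterable[str]) -> Optional[str]:
    """Return a troubleshooting hint if any log line shows a 405 response."""
    for line in log_lines:
        if "405" in line and "Method Not Allowed" in line:
            return HINT_405
    return None
```

The helper returns None when no matching line is found, which does not by itself prove that remote write is healthy; other remote write errors still need to be read from the logs.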
A custom dashboard does not appear in Fleet Monitoring
Check whether the dashboard was created on the Global cluster and the dashboard resource in the cpaas-system namespace has the following label:
Dashboards without this label do not have the fleet-monitoring tag and are not listed in the Fleet Monitoring Switch list.
Data freshness is abnormal
Check the following items:
- The connected cluster is running.
- The Fleet Monitoring Cluster Service pods are healthy.
- The local Monitoring component on the connected cluster is healthy.
- The connected cluster can write data to the Global VictoriaMetrics storage.
- The FleetMonitoringAgent status does not report configuration or resource application errors.