Evaluating Resources for Global Cluster

Overview

This topic provides recommended practices and resource evaluation guidelines for the Global Cluster in a Multi-Cluster deployment.

Proper node sizing ensures the Global Cluster can efficiently manage all registered clusters, handle synchronization traffic, and process user API and Web Console requests without performance degradation.

Node Sizing

The Global Cluster is responsible for:

  • Maintaining cluster registration and metadata.
  • Handling inbound API requests from the Web Console and CLI.
  • Coordinating synchronization and heartbeat messages with managed clusters.
  • Managing internal controllers and resource reconciliation loops.

Because the Global Cluster must handle both management operations and data aggregation from all connected clusters, resource allocation should be planned according to the expected scale and workload intensity.

Baseline Production Sizing

Production-scale sizing depends primarily on:

  • Number of managed clusters
  • Frequency of synchronization cycles
  • Concurrent API request rate (from users or automation)
  • Volume of streaming requests
  • Number of plugins installed

The following table provides reference configurations validated through internal performance testing.

Scale Tier  | Managed Clusters | Number of Nodes | CPU per Node | Memory per Node | Notes
------------|------------------|-----------------|--------------|-----------------|------------------------------------------------------------
Small       | ≤ 10             | 3               | 8 cores      | 16 GB           | Suitable for small-scale environments
Medium      | ≤ 50             | 3               | 16 cores     | 32 GB           | Default production setup
Large       | ≤ 100            | 3               | 24 cores     | 48 GB           | Supports heavy Web Console usage and frequent sync cycles
Extra Large | ≤ 500            | 6               | 32 cores     | 64 GB           | Requires horizontal scaling and dedicated infra nodes
WARNING

These recommendations are general guidelines. Actual requirements depend on your cluster topology, user concurrency, and plugins installed.
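
As an illustration only, the reference table can be expressed as data and used to pick a starting tier. The following Python sketch copies the values from the table above; the helper name and data layout are illustrative, not part of any product API.

    # Reference tiers copied from the sizing table above.
    SIZING_TIERS = [
        # (tier, max_managed_clusters, nodes, cpu_cores_per_node, memory_gb_per_node)
        ("Small", 10, 3, 8, 16),
        ("Medium", 50, 3, 16, 32),
        ("Large", 100, 3, 24, 48),
        ("Extra Large", 500, 6, 32, 64),
    ]

    def recommended_tier(managed_clusters: int) -> tuple:
        """Return the smallest reference tier that covers the expected cluster count."""
        for tier in SIZING_TIERS:
            if managed_clusters <= tier[1]:
                return tier
        raise ValueError("More than 500 managed clusters: plan a custom topology")

    # Example: 75 managed clusters falls into the Large tier (3 nodes, 24 cores, 48 GB).
    print(recommended_tier(75))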

Vertical Scaling Guidelines

When increasing load per node (for example, 2× more clusters or higher user concurrency), follow these adjustments:

Parameter | Scaling Recommendation
----------|-----------------------------------------------
CPU       | +50% for every 50 additional managed clusters
Memory    | +50% for every 50 additional managed clusters
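
As a rough illustration of this rule, the Python sketch below scales the Medium baseline (16 cores and 32 GB per node for up to 50 clusters) by 50% for each additional block of 50 managed clusters. Rounding partial blocks up is an assumption, not a documented rule.

    import math

    # Medium baseline from the reference table: 16 cores / 32 GB per node, up to 50 clusters.
    BASELINE_CLUSTERS = 50
    BASELINE_CPU_CORES = 16
    BASELINE_MEMORY_GB = 32

    def scaled_node_size(managed_clusters: int) -> tuple[int, int]:
        """Return (cpu_cores, memory_gb) per node using the +50% per 50 clusters rule."""
        extra_blocks = max(0, math.ceil((managed_clusters - BASELINE_CLUSTERS) / 50))
        factor = 1 + 0.5 * extra_blocks
        return math.ceil(BASELINE_CPU_CORES * factor), math.ceil(BASELINE_MEMORY_GB * factor)

    # Example: 100 managed clusters -> one extra block -> 24 cores / 48 GB per node,
    # which matches the Large tier in the reference table.
    print(scaled_node_size(100))

Beyond roughly 100 managed clusters, prefer the horizontal scaling guidance below instead of continuing to grow individual nodes.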

Horizontal Scaling Guidelines

When exceeding 100 managed clusters or observing persistent API request latency above 500 ms, add nodes to distribute request handling and controller workloads.
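
One way to derive a starting node count is to interpolate between the Large tier (100 clusters, 3 nodes) and the Extra Large tier (500 clusters, 6 nodes) from the reference table. The linear interpolation step in this Python sketch is an assumption, not a documented rule.

    import math

    def estimate_node_count(managed_clusters: int) -> int:
        """Interpolate a starting node count between the Large and Extra Large tiers."""
        if managed_clusters <= 100:
            return 3  # Large tier and below use 3 nodes
        if managed_clusters > 500:
            raise ValueError("Beyond 500 clusters, plan a dedicated architecture review")
        # Linear interpolation between (100 clusters, 3 nodes) and (500 clusters, 6 nodes).
        return 3 + math.ceil((managed_clusters - 100) * 3 / 400)

    # Example: 300 managed clusters -> roughly 5 nodes as a starting point.
    print(estimate_node_count(300))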

Resource Validation and Monitoring

After deployment, continuously monitor the following metrics to validate node sizing:

Metric                  | Recommended Range
------------------------|-------------------------
Node CPU utilization    | 60–75% under peak load
Node Memory utilization | ≤ 80% sustained
API request latency     | P90 < 500 ms
etcd commit latency     | P99 < 50 ms

For example, node CPU utilization can be measured with the following Prometheus query:

    100 * (1 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])))
NOTE

If sustained resource usage consistently exceeds recommended thresholds, scale vertically (add CPU/memory) or horizontally (add nodes) before user-facing performance degradation occurs.
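
An operator-side check of these thresholds can also be scripted against the Prometheus HTTP API. The Python sketch below runs the CPU utilization query shown above and flags nodes outside the peak-load guideline; the Prometheus URL is a placeholder assumption for your environment.

    import requests

    PROMETHEUS_URL = "http://prometheus.example.com"  # placeholder; point at your Prometheus
    CPU_QUERY = (
        '100 * (1 - avg by (instance)'
        '(rate(node_cpu_seconds_total{mode="idle"}[5m])))'
    )

    def check_cpu_utilization(threshold: float = 75.0) -> None:
        """Print per-node CPU utilization and flag values above the peak-load guideline."""
        resp = requests.get(
            f"{PROMETHEUS_URL}/api/v1/query", params={"query": CPU_QUERY}, timeout=10
        )
        resp.raise_for_status()
        for result in resp.json()["data"]["result"]:
            instance = result["metric"]["instance"]
            utilization = float(result["value"][1])
            status = "over threshold" if utilization > threshold else "ok"
            print(f"{instance}: {utilization:.1f}% CPU ({status})")

    check_cpu_utilization()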

Summary

When sizing the Global Cluster:

  1. Begin with 3 nodes (16 cores and 32 GB of memory each) for moderate-scale deployments (≤ 50 clusters).
  2. Scale vertically for higher request concurrency or heavy Web Console usage.
  3. Scale horizontally beyond 100 clusters to maintain API responsiveness.
  4. Re-evaluate sizing after every significant increase in managed cluster count or sync frequency.

Following these guidelines ensures predictable performance and operational stability as your Multi-Cluster environment grows.