Evaluating Resources for Global Cluster

Overview

This topic provides recommended practices and resource evaluation guidelines for the Global Cluster in a Multi-Cluster deployment.

Proper node sizing ensures the Global Cluster can efficiently manage all registered clusters, handle synchronization traffic, and process user API and Web Console requests without performance degradation.

Node Sizing

The Global Cluster is responsible for:

  • Maintaining cluster registration and metadata.
  • Handling inbound API requests from the Web Console and CLI.
  • Coordinating synchronization and heartbeat messages with managed clusters.
  • Managing internal controllers and resource reconciliation loops.

Because the Global Cluster must handle both management operations and data aggregation from all connected clusters, resource allocation should be planned according to the expected scale and workload intensity.

Baseline Production Sizing

Production-scale sizing depends primarily on:

  • Number of managed clusters
  • Frequency of synchronization cycles
  • Concurrent API request rate (from users or automation)
  • Volume of streaming requests
  • Number of plugins installed

The following table provides reference configurations validated through internal performance testing.

Scale Tier  | Managed Clusters | Number of Nodes | CPU per Node | Memory per Node | Notes
------------|------------------|-----------------|--------------|-----------------|------------------------------------------------------------
Small       | ≤ 10             | 3               | 8 cores      | 16 GB           | Suitable for small-scale environments
Medium      | ≤ 50             | 3               | 16 cores     | 32 GB           | Default production setup
Large       | ≤ 100            | 3               | 24 cores     | 48 GB           | Supports heavy Web Console usage and frequent sync cycles
Extra Large | ≤ 500            | 6               | 32 cores     | 64 GB           | Requires horizontal scaling and dedicated infra nodes
WARNING

These recommendations are general guidelines. Actual requirements depend on your cluster topology, user concurrency, and plugins installed.
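
As an illustration only, the reference table can be expressed as data and used to pick a starting tier. The following Python sketch copies the values from the table above; the helper name and data layout are illustrative, not part of any product API.

    # Reference tiers copied from the sizing table above.
    SIZING_TIERS = [
        # (tier, max_managed_clusters, nodes, cpu_cores_per_node, memory_gb_per_node)
        ("Small", 10, 3, 8, 16),
        ("Medium", 50, 3, 16, 32),
        ("Large", 100, 3, 24, 48),
        ("Extra Large", 500, 6, 32, 64),
    ]

    def recommended_tier(managed_clusters: int) -> tuple:
        """Return the smallest reference tier that covers the expected cluster count."""
        for tier in SIZING_TIERS:
            if managed_clusters <= tier[1]:
                return tier
        raise ValueError("More than 500 managed clusters: plan a custom topology")

    # Example: 75 managed clusters falls into the Large tier (3 nodes, 24 cores, 48 GB).
    print(recommended_tier(75))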

Vertical Scaling Guidelines

When increasing load per node (for example, 2× more clusters or higher user concurrency), follow these adjustments:

Parameter | Scaling Recommendation
----------|-----------------------------------------------
CPU       | +50% for every 50 additional managed clusters
Memory    | +50% for every 50 additional managed clusters
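
As a rough illustration of this rule, the Python sketch below scales the Medium baseline (16 cores and 32 GB per node for up to 50 clusters) by 50% for each additional block of 50 managed clusters. Rounding partial blocks up is an assumption, not a documented rule.

    import math

    # Medium baseline from the reference table: 16 cores / 32 GB per node, up to 50 clusters.
    BASELINE_CLUSTERS = 50
    BASELINE_CPU_CORES = 16
    BASELINE_MEMORY_GB = 32

    def scaled_node_size(managed_clusters: int) -> tuple[int, int]:
        """Return (cpu_cores, memory_gb) per node using the +50% per 50 clusters rule."""
        extra_blocks = max(0, math.ceil((managed_clusters - BASELINE_CLUSTERS) / 50))
        factor = 1 + 0.5 * extra_blocks
        return math.ceil(BASELINE_CPU_CORES * factor), math.ceil(BASELINE_MEMORY_GB * factor)

    # Example: 100 managed clusters -> one extra block -> 24 cores / 48 GB per node,
    # which matches the Large tier in the reference table.
    print(scaled_node_size(100))

Beyond roughly 100 managed clusters, prefer the horizontal scaling guidance below instead of continuing to grow individual nodes.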

Horizontal Scaling Guidelines

When exceeding 100 managed clusters or observing persistent API request latency above 500 ms, add nodes to distribute request handling and controller workloads.
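
One way to derive a starting node count is to interpolate between the Large tier (100 clusters, 3 nodes) and the Extra Large tier (500 clusters, 6 nodes) from the reference table. The linear interpolation step in this Python sketch is an assumption, not a documented rule.

    import math

    def estimate_node_count(managed_clusters: int) -> int:
        """Interpolate a starting node count between the Large and Extra Large tiers."""
        if managed_clusters <= 100:
            return 3  # Large tier and below use 3 nodes
        if managed_clusters > 500:
            raise ValueError("Beyond 500 clusters, plan a dedicated architecture review")
        # Linear interpolation between (100 clusters, 3 nodes) and (500 clusters, 6 nodes).
        return 3 + math.ceil((managed_clusters - 100) * 3 / 400)

    # Example: 300 managed clusters -> roughly 5 nodes as a starting point.
    print(estimate_node_count(300))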

Resource Validation and Monitoring

After deployment, continuously monitor the following metrics to validate node sizing:

Metric                  | Recommended Range
------------------------|-------------------------
Node CPU utilization    | 60–75% under peak load
Node Memory utilization | ≤ 80% sustained
API request latency     | P90 < 500 ms
etcd commit latency     | P99 < 50 ms

For example, node CPU utilization can be measured with the following Prometheus query:

    100 * (1 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])))
NOTE

If sustained resource usage consistently exceeds recommended thresholds, scale vertically (add CPU/memory) or horizontally (add nodes) before user-facing performance degradation occurs.
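
An operator-side check of these thresholds can also be scripted against the Prometheus HTTP API. The Python sketch below runs the CPU utilization query shown above and flags nodes outside the peak-load guideline; the Prometheus URL is a placeholder assumption for your environment.

    import requests

    PROMETHEUS_URL = "http://prometheus.example.com"  # placeholder; point at your Prometheus
    CPU_QUERY = (
        '100 * (1 - avg by (instance)'
        '(rate(node_cpu_seconds_total{mode="idle"}[5m])))'
    )

    def check_cpu_utilization(threshold: float = 75.0) -> None:
        """Print per-node CPU utilization and flag values above the peak-load guideline."""
        resp = requests.get(
            f"{PROMETHEUS_URL}/api/v1/query", params={"query": CPU_QUERY}, timeout=10
        )
        resp.raise_for_status()
        for result in resp.json()["data"]["result"]:
            instance = result["metric"]["instance"]
            utilization = float(result["value"][1])
            status = "over threshold" if utilization > threshold else "ok"
            print(f"{instance}: {utilization:.1f}% CPU ({status})")

    check_cpu_utilization()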

Summary

When sizing the Global Cluster:

  1. Begin with 3 nodes (16 cores and 32 GB of memory each) for moderate-scale deployments (≤ 50 clusters).
  2. Scale vertically for higher request concurrency or heavy Web Console usage.
  3. Scale horizontally beyond 100 clusters to maintain API responsiveness.
  4. Re-evaluate sizing after every significant increase in managed cluster count or sync frequency.

Following these guidelines ensures predictable performance and operational stability as your Multi-Cluster environment grows.