Monitor Component Capacity Planning
The monitor component stores metrics data collected from one or more clusters in the platform. Assess your monitoring scale in advance and plan the resources for the monitor component according to the guidelines in this document.
Assumptions and Methodology
- Data in this document comes from controlled lab performance reports and is intended as a sizing baseline for production planning.
- Retention for disk examples is 7 days; adjust proportionally for other retention targets.
- Storage baseline matches the warning above: SSD, ~6000 IOPS, ~250 MB/s read/write, mounted independently.
- Test workloads exercised typical monitoring pages such as "acp ns overview page" and "platform region detail page".
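The proportional retention adjustment mentioned above can be sketched as a simple linear scaling. This is a minimal illustration that assumes write volume grows linearly with retention time. The function name `scale_for_retention` is hypothetical and not part of any platform tooling.

```python
BASELINE_RETENTION_DAYS = 7  # retention used for the disk examples in this document


def scale_for_retention(value_at_7_days: float, retention_days: float) -> float:
    """Linearly scale a 7-day disk figure to another retention period.

    Assumption: disk usage grows in proportion to retention time.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return value_at_7_days * retention_days / BASELINE_RETENTION_DAYS


if __name__ == "__main__":
    # Large-scale Prometheus example from this document: ~69G written over 7 days.
    print(scale_for_retention(69, 14))
```

The same function can be applied to the recommended disk size rather than the measured write volume if you want to preserve the headroom of the 7-day recommendation.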
Prometheus
Below are sizing recommendations by scale for Prometheus and related components (Thanos Query, Thanos Sidecar, etc.).
Small Scale — 10 worker nodes, 500 double-container Pods
- Metric ingestion rate: ~2800 samples/second
Component | Container | Replicas | CPU Limit | Memory Limit | Disk (if applicable) | Notes |
--- | --- | --- | --- | --- | --- | --- |
courier-api | courier | 2 | 2C | 4Gi | - | - |
kube-prometheus-thanos-query | thanos-query | 1 | 1C | 1Gi | - | - |
prometheus-kube-prometheus-0 | prometheus | 1 | 2C | 8Gi | 20G | ~10G write over 7 days |
Medium Scale — 50 worker nodes, 2000 double-container Pods
- Metric ingestion rate: ~7294 samples/second
Component | Container | Replicas | CPU Limit | Memory Limit | Disk (if applicable) | Notes |
--- | --- | --- | --- | --- | --- | --- |
courier-api | courier | 2 | 4C | 4Gi | - | - |
kube-prometheus-thanos-query | thanos-query | 1 | 2.5C | 8Gi | - | - |
prometheus-kube-prometheus-0 | prometheus | 1 | 4C | 8Gi | 40G | ~30G write over 7 days |
Large Scale — 500 worker nodes, 10000 double-container Pods
- Metric ingestion rate: ~41575 samples/second
Component | Container | Replicas | CPU Limit | Memory Limit | Disk (if applicable) | Notes |
--- | --- | --- | --- | --- | --- | --- |
courier-api | courier | 2 | 6C | 4Gi | - | - |
kube-prometheus-thanos-query | thanos-query | 1 | 2C | 6Gi | - | In-field deployments may use 2 replicas |
prometheus-kube-prometheus-0 | prometheus | 1 | 8C | 20Gi | 100G | Peak memory ~15Gi; ~69G write over 7 days |
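As a rough aid for reading the three Prometheus tables, the following sketch picks the smallest tested scale that covers a given cluster. The `Tier` class and `pick_tier` function are hypothetical helpers. Treating each lab data point as an upper bound for its tier is a conservative assumption, not a rule stated by the performance reports. Only the `prometheus` container values are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_nodes: int      # worker nodes in the lab test
    max_pods: int       # double-container Pods in the lab test
    cpu_limit: str
    memory_limit: str
    disk: str


# Values copied from the Prometheus tables in this document.
PROMETHEUS_TIERS = [
    Tier("small", 10, 500, "2C", "8Gi", "20G"),
    Tier("medium", 50, 2000, "4C", "8Gi", "40G"),
    Tier("large", 500, 10000, "8C", "20Gi", "100G"),
]


def pick_tier(nodes: int, pods: int) -> Optional[Tier]:
    """Return the smallest tested tier covering both counts, or None if
    the cluster exceeds the largest scale covered by the lab data."""
    for tier in PROMETHEUS_TIERS:
        if nodes <= tier.max_nodes and pods <= tier.max_pods:
            return tier
    return None


if __name__ == "__main__":
    print(pick_tier(30, 1200))
```

A `None` result means the cluster is larger than anything tested here, so the tables should not be extrapolated without further measurement.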
VictoriaMetrics
Below are sizing recommendations by scale for VictoriaMetrics components.
Small Scale — 10 worker nodes, 500 double-container Pods
- Metric ingestion rate: ~3274 samples/second
Component | Container | Replicas | CPU Limit | Memory Limit | Disk (if applicable) | Notes |
--- | --- | --- | --- | --- | --- | --- |
courier-api | courier | 1 | 2C | 4Gi | - | - |
vmselect-cluster | proxy | 1 | 1C | 200Mi | - | - |
vmselect | vmselect | 1 | 500m | 1Gi | - | - |
vmstorage-cluster | vmstorage | 1 | 500m | 2Gi | 3G | ~1.5G write over 7 days |
Medium Scale — 50 worker nodes, 2000 double-container Pods
- Metric ingestion rate: ~6940 samples/second
Component | Container | Replicas | CPU Limit | Memory Limit | Disk (if applicable) | Notes |
--- | --- | --- | --- | --- | --- | --- |
courier-api | courier | 2 | 4C | 4Gi | - | - |
vmselect-cluster | proxy | 1 | 1C | 200Mi | - | - |
vmselect | vmselect | 1 | 2C | 2Gi | - | - |
vmstorage-cluster | vmstorage | 1 | 2C | 2Gi | 10G | ~2.6G write over 7 days |
Large Scale — 500 worker nodes, 10000 double-container Pods
- Metric ingestion rate: ~34300 samples/second
Component | Container | Replicas | CPU Limit | Memory Limit | Disk (if applicable) | Notes |
--- | --- | --- | --- | --- | --- | --- |
courier-api | courier | 2 | 6C | 4Gi | - | - |
vmselect-cluster | proxy | 1 | 2C | 200Mi | - | - |
vmselect | vmselect | 1 | 5C | 3Gi | - | - |
vmstorage-cluster | vmstorage | 1 | 2C | 6Gi | 30G | ~16.8G write over 7 days |
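When your ingestion rate falls between the tested scales, one possible approach is to derive an approximate on-disk cost per sample from a table row and apply it to your own rate. The sketch below assumes that "G" in the tables means 10^9 bytes and that write volume is proportional to ingestion rate. Both are assumptions, and the function names are hypothetical. The per-sample figure differs between tiers in the lab data, so treat the result as a rough estimate only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BASELINE_RETENTION_DAYS = 7  # retention used for the disk examples in this document


def bytes_per_sample(write_gb_7d: float, samples_per_second: float,
                     bytes_per_gb: int = 10**9) -> float:
    """Approximate on-disk bytes per sample from one table row.

    Assumption: 'G' in the tables is 10^9 bytes.
    """
    total_samples = samples_per_second * BASELINE_RETENTION_DAYS * SECONDS_PER_DAY
    return write_gb_7d * bytes_per_gb / total_samples


def estimate_write_gb(samples_per_second: float, retention_days: float,
                      per_sample_bytes: float, bytes_per_gb: int = 10**9) -> float:
    """Estimate the write volume for a given rate and retention period."""
    total_samples = samples_per_second * retention_days * SECONDS_PER_DAY
    return total_samples * per_sample_bytes / bytes_per_gb


if __name__ == "__main__":
    # Large-scale vmstorage row: ~16.8G written over 7 days at ~34300 samples/second.
    cost = bytes_per_sample(16.8, 34300)
    # Hypothetical cluster ingesting 20000 samples/second with 7-day retention.
    print(estimate_write_gb(20000, 7, cost))
```

The same arithmetic applies to the Prometheus tables by substituting the corresponding ingestion rate and write volume.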