Overcommitment Ratio
⚠️ This feature is still experimental. Please use it with caution.
Understanding the Overcommitment Ratio in Hami vGPU
Hami supports configuring a global overcommitment ratio for both vGPU compute cores and memory. The purpose of vGPU overcommitment is to improve GPU utilization, not to increase the resources available to any individual task. The overcommitment mechanism is purely logical and is applied only by the hami-scheduler.
Key Concepts
- NVIDIA Device Core Scaling: Overcommitment ratio applied to GPU compute cores.
- NVIDIA Device Memory Scaling: Overcommitment ratio applied to GPU memory.
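For illustration, the minimal sketch below shows how these two scaling factors turn a card's physical capacity into the logical capacity the scheduler is allowed to allocate. The function name and all numbers are illustrative assumptions, not part of Hami's code or defaults.

```python
# Minimal sketch (not Hami's actual code): how the core and memory scaling
# factors turn a card's physical capacity into the logical capacity that the
# scheduler can hand out. All names and numbers are illustrative.

def logical_capacity(physical_cores_pct: int, physical_mem_mib: int,
                     core_scaling: float, memory_scaling: float) -> dict:
    """Apply the overcommitment ratios to one physical GPU."""
    return {
        "cores_pct": int(physical_cores_pct * core_scaling),    # e.g. 100% -> 200%
        "memory_mib": int(physical_mem_mib * memory_scaling),   # e.g. 24576 -> 36864
    }

# One physical GPU with 100% compute and 24 GiB memory,
# core scaling = 2, memory scaling = 1.5:
print(logical_capacity(100, 24 * 1024, core_scaling=2.0, memory_scaling=1.5))
# -> {'cores_pct': 200, 'memory_mib': 36864}
```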
Core Capabilities
- Enable higher GPU utilization, allowing more workloads to share a single GPU card.
Configuring the Overcommitment Ratio
- Go to Administrator → Marketplace → Cluster Plugin.
- Switch to the target cluster.
- Update the parameters NVIDIA Device Core Scaling and NVIDIA Device Memory Scaling when deploying or upgrading the Alauda Build of Hami cluster plugin.
Notes
vGPU Core Overcommitment
- When the overcommitment ratio for GPU cores is greater than 1, multiple workloads may request more than 100% of the GPU compute capacity.
- If all workloads run at full load, they share the physical GPU compute equally (up to their requested share). As a result, each workload may run slower compared to using a dedicated GPU.
- If some workloads are idle, active workloads can utilize the freed capacity.
Example:
- Core overcommitment ratio = 2 → one GPU card provides a logical 200% of allocatable cores.
- Four pods request: Pod A = 80%, Pod B = 60%, Pod C = 40%, Pod D = 20%.
- Scenarios:
- If all pods are busy, Pod D receives its requested 20%, while Pods A–C compete for the remaining 80% (≈26.7% each); this split is worked through in the sketch after this list.
- If only Pod A is active, it can utilize up to 80% of the cores.
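The sketch below reproduces the numbers in this scenario under the assumption of a simple max-min fair split of the physical 100% among busy pods. It only illustrates the arithmetic; it is not Hami's scheduling or core-limiting code.

```python
# Share arithmetic for the example above, assuming a max-min fair split of the
# physical 100% among busy pods (matches the scenario's numbers; illustrative only).

def fair_shares(requests: dict, capacity: float = 100.0) -> dict:
    """Max-min fair split: no pod gets more than it requested, and leftover
    capacity is divided equally among the pods that are still unsatisfied."""
    shares = {pod: 0.0 for pod in requests}
    remaining = dict(requests)
    free = capacity
    while remaining and free > 1e-9:
        equal = free / len(remaining)
        # Pods whose whole request fits within an equal slice get exactly that.
        satisfied = {p: r for p, r in remaining.items() if r <= equal}
        if not satisfied:
            for p in remaining:        # everyone left is capped at the equal slice
                shares[p] += equal
            break
        for p, r in satisfied.items():
            shares[p] += r
            free -= r
            del remaining[p]
    return shares

print(fair_shares({"A": 80, "B": 60, "C": 40, "D": 20}))
# -> {'A': 26.67, 'B': 26.67, 'C': 26.67, 'D': 20.0} (approximately)
```

Pod D is fully satisfied because its request is below an equal share, while Pods A–C split the remaining 80% evenly.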
vGPU Memory Overcommitment
- When memory overcommitment is enabled, workloads may collectively request more than the physical GPU memory capacity.
- If total requests exceed available memory and all pods attempt to use their full allocation, some workloads may encounter `CUDA out of memory` errors.
- Use memory overcommitment with caution, as it can directly lead to application failures; a quick worst-case check is sketched below.
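As a rough illustration of the risk, the snippet below compares the combined memory requests on one card against its physical and logical capacity. The capacity, ratio, and request values are made-up examples, not product defaults.

```python
# Rough worst-case check (illustrative values only): with memory overcommitment
# the scheduler may place pods whose combined requests exceed physical memory.

PHYSICAL_MEM_MIB = 24 * 1024                      # e.g. a 24 GiB card
MEMORY_SCALING = 1.5                              # memory overcommitment ratio
requests_mib = [12 * 1024, 12 * 1024, 8 * 1024]   # vGPU memory requested by pods

logical_mib = PHYSICAL_MEM_MIB * MEMORY_SCALING
total_mib = sum(requests_mib)

print(f"logical capacity : {logical_mib:.0f} MiB")
print(f"total requested  : {total_mib} MiB")
if PHYSICAL_MEM_MIB < total_mib <= logical_mib:
    # Fits logically, but not physically: pods that actually use their full
    # request can hit 'CUDA out of memory' at runtime.
    print("Schedulable only because of overcommitment; OOM risk if pods use full requests.")
```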
Scope
- The overcommitment ratio described here applies only to NVIDIA GPUs.