Overcommitment Ratio
⚠️ This feature is still experimental. Please use it with caution.
Understanding the Overcommitment Ratio in Hami vGPU
Hami supports configuring a global overcommitment ratio for both vGPU compute cores and memory. The purpose of vGPU overcommitment is to improve GPU utilization, not to increase the resources available to any individual task. The overcommitment mechanism is purely logical and is applied only by the hami-scheduler.
Key Concepts
- NVIDIA Device Core Scaling: Overcommitment ratio applied to GPU compute cores.
- NVIDIA Device Memory Scaling: Overcommitment ratio applied to GPU memory.
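For illustration, the minimal sketch below shows how these two scaling factors turn a card's physical capacity into the logical capacity the scheduler is allowed to allocate. The function name and all numbers are illustrative assumptions, not part of Hami's code or defaults.

```python
# Minimal sketch (not Hami's actual code): how the core and memory scaling
# factors turn a card's physical capacity into the logical capacity that the
# scheduler can hand out. All names and numbers are illustrative.

def logical_capacity(physical_cores_pct: int, physical_mem_mib: int,
                     core_scaling: float, memory_scaling: float) -> dict:
    """Apply the overcommitment ratios to one physical GPU."""
    return {
        "cores_pct": int(physical_cores_pct * core_scaling),    # e.g. 100% -> 200%
        "memory_mib": int(physical_mem_mib * memory_scaling),   # e.g. 24576 -> 36864
    }

# One physical GPU with 100% compute and 24 GiB memory,
# core scaling = 2, memory scaling = 1.5:
print(logical_capacity(100, 24 * 1024, core_scaling=2.0, memory_scaling=1.5))
# -> {'cores_pct': 200, 'memory_mib': 36864}
```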
Core Capabilities
- Enable higher GPU utilization, allowing more workloads to share a single GPU card.
Configuring the Overcommitment Ratio
- Go to Administrator → Marketplace → Cluster Plugin.
- Switch to the target cluster.
- Update the parameters NVIDIA Device Core Scaling and NVIDIA Device Memory Scaling when deploying or upgrading the Alauda Build of Hami cluster plugin.
Notes
vGPU Core Overcommitment
- When the overcommitment ratio for GPU cores is greater than 1, multiple workloads may request more than 100% of the GPU compute capacity.
- If all workloads run at full load, they share the physical GPU compute equally (up to their requested share). As a result, each workload may run slower compared to using a dedicated GPU.
- If some workloads are idle, active workloads can utilize the freed capacity.
Example:
- Core overcommitment ratio = 2 → one GPU card provides a logical 200% of allocatable cores.
- Four pods request: Pod A = 80%, Pod B = 60%, Pod C = 40%, Pod D = 20%.
- Scenarios:
- If all pods are busy, Pod D receives its requested 20%, while Pods A–C compete for the remaining 80% (≈26.7% each); this split is worked through in the sketch after this list.
- If only Pod A is active, it can utilize up to 80% of the cores.
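The sketch below reproduces the numbers in this scenario under the assumption of a simple max-min fair split of the physical 100% among busy pods. It only illustrates the arithmetic; it is not Hami's scheduling or core-limiting code.

```python
# Share arithmetic for the example above, assuming a max-min fair split of the
# physical 100% among busy pods (matches the scenario's numbers; illustrative only).

def fair_shares(requests: dict, capacity: float = 100.0) -> dict:
    """Max-min fair split: no pod gets more than it requested, and leftover
    capacity is divided equally among the pods that are still unsatisfied."""
    shares = {pod: 0.0 for pod in requests}
    remaining = dict(requests)
    free = capacity
    while remaining and free > 1e-9:
        equal = free / len(remaining)
        # Pods whose whole request fits within an equal slice get exactly that.
        satisfied = {p: r for p, r in remaining.items() if r <= equal}
        if not satisfied:
            for p in remaining:        # everyone left is capped at the equal slice
                shares[p] += equal
            break
        for p, r in satisfied.items():
            shares[p] += r
            free -= r
            del remaining[p]
    return shares

print(fair_shares({"A": 80, "B": 60, "C": 40, "D": 20}))
# -> {'A': 26.67, 'B': 26.67, 'C': 26.67, 'D': 20.0} (approximately)
```

Pod D is fully satisfied because its request is below an equal share, while Pods A–C split the remaining 80% evenly.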
vGPU Memory Overcommitment
- When memory overcommitment is enabled, workloads may collectively request more than the physical GPU memory capacity.
- If total requests exceed available memory and all pods attempt to use their full allocation, some workloads may encounter `CUDA out of memory` errors.
- Use memory overcommitment with caution, as it can directly lead to application failures; a quick worst-case check is sketched below.
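As a rough illustration of the risk, the snippet below compares the combined memory requests on one card against its physical and logical capacity. The capacity, ratio, and request values are made-up examples, not product defaults.

```python
# Rough worst-case check (illustrative values only): with memory overcommitment
# the scheduler may place pods whose combined requests exceed physical memory.

PHYSICAL_MEM_MIB = 24 * 1024                      # e.g. a 24 GiB card
MEMORY_SCALING = 1.5                              # memory overcommitment ratio
requests_mib = [12 * 1024, 12 * 1024, 8 * 1024]   # vGPU memory requested by pods

logical_mib = PHYSICAL_MEM_MIB * MEMORY_SCALING
total_mib = sum(requests_mib)

print(f"logical capacity : {logical_mib:.0f} MiB")
print(f"total requested  : {total_mib} MiB")
if PHYSICAL_MEM_MIB < total_mib <= logical_mib:
    # Fits logically, but not physically: pods that actually use their full
    # request can hit 'CUDA out of memory' at runtime.
    print("Schedulable only because of overcommitment; OOM risk if pods use full requests.")
```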
Scope
- The overcommitment ratio described here applies only to NVIDIA GPUs.