## Features
### vGPU (Based on Open-Source GPU-Manager)

- **Fine-Grained Resource Slicing**: Splits a physical GPU's compute capacity into 1-100 core quotas, so several containers can share one card. Supports dynamic allocation for multi-tenant environments such as AI inference and virtual desktops.
- **Topology-Aware Scheduling**: Automatically prioritizes NVLink/C2C-connected GPUs to minimize cross-socket data-transfer latency, ensuring optimal GPU pairing for distributed training workloads.
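As an illustration, a Pod can request a GPU slice through extended resources. The resource names below follow the open-source GPU-Manager convention (`tencent.com/vcuda-core` in 1/100-GPU units, `tencent.com/vcuda-memory` in 256 MiB units); adjust them to match your deployment:

```yaml
# Hypothetical example: request 30% of one GPU's cores and ~4 GiB of its memory
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-inference
spec:
  containers:
    - name: inference
      image: my-inference-image:latest   # placeholder image
      resources:
        limits:
          tencent.com/vcuda-core: 30     # 30 of 100 core quotas
          tencent.com/vcuda-memory: 16   # 16 x 256 MiB = 4 GiB
```

Because the core quota is fractional, the scheduler can pack several such Pods onto one physical GPU.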
### pGPU (NVIDIA Device Plugin)

- **NUMA-Optimized Allocation**: Enforces 1:1 GPU-to-Pod mapping with NUMA node binding, reducing PCIe bus contention for high-performance computing (HPC) tasks such as LLM training.
- **Exclusive Hardware Access**: Provides full physical GPU isolation through PCIe passthrough, ideal for mission-critical applications that require deterministic performance (e.g., medical imaging processing).
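In contrast to the vGPU mode, a whole-GPU request uses the standard device-plugin resource `nvidia.com/gpu`, which always grants exclusive access to entire cards. A minimal sketch (image name is a placeholder):

```yaml
# Hypothetical example: request one dedicated physical GPU
apiVersion: v1
kind: Pod
metadata:
  name: pgpu-training
spec:
  containers:
    - name: trainer
      image: my-training-image:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1             # whole GPUs only; cannot be fractional
```

Since the resource is integer-valued, no other Pod can be co-scheduled onto the allocated GPU, which is what yields the deterministic performance described above.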
### MPS (NVIDIA Multi-Process Service Plugin)

- **Latency-Optimized Execution**: Allows CUDA kernels from multiple processes to execute concurrently on a single GPU, reducing inference latency by 30-50% for real-time applications such as video analytics.
- **Resource Sharing with Caps**: Allows concurrent GPU context execution while enforcing per-process compute (0-100%) and memory limits via environment variables.
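The per-process caps mentioned above map to standard CUDA MPS environment variables. A minimal sketch, assuming the MPS control daemon is already running on the node (the specific percentage and memory values are illustrative):

```shell
# Cap each MPS client to ~50% of the GPU's SMs
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50

# Cap device memory per client (CUDA 11.5+): limit device 0 to 8 GiB
export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=8G"
```

In a Kubernetes deployment these would typically be set in the container's `env` section rather than exported by hand, so each Pod sharing the GPU gets its own compute and memory ceiling.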