Features

vGPU (Based on the Open-Source GPU-Manager)

  • Fine-Grained Resource Slicing
    Slices a physical GPU's compute into per-container quotas from 1 to 100, where each unit corresponds to 1% of the card. Supports dynamic allocation in multi-tenant environments such as AI inference and virtual desktops (see the Pod sketch after this list).

  • Topology-Aware Scheduling
    Automatically prioritizes NVLink/C2C-connected GPUs to minimize cross-socket data-transfer latency, selecting well-connected GPU sets for distributed training workloads. A simplified interconnect-scoring sketch also follows this list.
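
  As a minimal sketch of how a workload consumes a slice, the Pod below requests half of one GPU's compute plus 4 GiB of device memory, assuming GPU-Manager's extended resource names tencent.com/vcuda-core (1% compute units) and tencent.com/vcuda-memory (256 MiB units); the Pod and image names are illustrative.

      package main

      import (
          "fmt"

          corev1 "k8s.io/api/core/v1"
          "k8s.io/apimachinery/pkg/api/resource"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      )

      // vgpuPod builds a Pod that asks GPU-Manager for half of one GPU's
      // compute (50 of 100 units) and 4 GiB of device memory (16 x 256 MiB).
      func vgpuPod() *corev1.Pod {
          return &corev1.Pod{
              ObjectMeta: metav1.ObjectMeta{Name: "vgpu-inference"}, // illustrative name
              Spec: corev1.PodSpec{
                  Containers: []corev1.Container{{
                      Name:  "inference",
                      Image: "inference-server:latest", // illustrative image
                      Resources: corev1.ResourceRequirements{
                          Limits: corev1.ResourceList{
                              "tencent.com/vcuda-core":   resource.MustParse("50"),
                              "tencent.com/vcuda-memory": resource.MustParse("16"),
                          },
                      },
                  }},
              },
          }
      }

      func main() {
          fmt.Println(vgpuPod().Name) // in a real cluster, create it with client-go
      }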
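  The scheduling preference itself can be pictured with a toy scoring routine. This is a simplified illustration of the idea (prefer NVLink, then same-socket PCIe, then cross-socket paths), not the actual GPU-Manager scheduler; all names are hypothetical.

      package main

      import "fmt"

      // LinkType ranks GPU interconnects from worst to best.
      type LinkType int

      const (
          CrossSocketPCIe LinkType = iota // traverses the inter-socket bus
          SameSocketPCIe                  // shares one PCIe root complex
          NVLink                          // direct high-bandwidth link
      )

      // bestPair returns the GPU pair with the strongest interconnect,
      // mirroring how a topology-aware scheduler favors NVLink pairs
      // for a 2-GPU request.
      func bestPair(pairs map[[2]int]LinkType) [2]int {
          best, bestScore := [2]int{}, LinkType(-1)
          for pair, link := range pairs {
              if link > bestScore {
                  best, bestScore = pair, link
              }
          }
          return best
      }

      func main() {
          topology := map[[2]int]LinkType{
              {0, 1}: NVLink,
              {0, 2}: SameSocketPCIe,
              {1, 3}: CrossSocketPCIe,
          }
          fmt.Println(bestPair(topology)) // [0 1]: the NVLink-connected pair wins
      }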

pGPU (NVIDIA Device Plugin)

  • NUMA-Optimized Allocation
    Enforces 1:1 GPU-to-Pod mapping with NUMA node binding, reducing PCIe bus contention for high-performance computing (HPC) tasks such as LLM training (a whole-GPU request sketch follows this list).

  • Exclusive Hardware Access
    Provides full physical GPU isolation through PCIe passthrough, ideal for mission-critical applications that require deterministic performance (e.g., medical image processing).
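
  For comparison with the vGPU sketch above, a whole-GPU request goes through the standard nvidia.com/gpu resource, which is only allocatable in whole units; NUMA alignment is enforced by the kubelet's Topology Manager (e.g., the single-numa-node policy) rather than by the Pod spec. Pod and image names are illustrative.

      package main

      import (
          "fmt"

          corev1 "k8s.io/api/core/v1"
          "k8s.io/apimachinery/pkg/api/resource"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      )

      // trainingPod requests one exclusive physical GPU: the NVIDIA device
      // plugin guarantees no other container touches the device while the
      // Pod holds it.
      func trainingPod() *corev1.Pod {
          return &corev1.Pod{
              ObjectMeta: metav1.ObjectMeta{Name: "llm-training"}, // illustrative name
              Spec: corev1.PodSpec{
                  Containers: []corev1.Container{{
                      Name:  "trainer",
                      Image: "llm-trainer:latest", // illustrative image
                      Resources: corev1.ResourceRequirements{
                          Limits: corev1.ResourceList{
                              "nvidia.com/gpu": resource.MustParse("1"), // whole units only
                          },
                      },
                  }},
              },
          }
      }

      func main() { fmt.Println(trainingPod().Name) }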

MPS (NVIDIA Multi-Process Service Plugin)

  • Latency-Optimized Execution
    Lets CUDA kernels from multiple processes execute concurrently on a shared GPU context, avoiding per-process context-switch overhead and reducing inference latency by 30-50% for real-time applications like video analytics.

  • Resource Sharing with Caps
    Allows multiple processes to share a GPU concurrently while enforcing per-process compute (0-100%) and device-memory caps via MPS environment variables (see the snippet after this list).
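
  The snippet below sketches those caps using the CUDA MPS control variables CUDA_MPS_ACTIVE_THREAD_PERCENTAGE (compute share) and CUDA_MPS_PINNED_DEVICE_MEM_LIMIT (per-device memory ceiling, available in recent CUDA releases); the 30% and 2G values are illustrative.

      package main

      import (
          "fmt"

          corev1 "k8s.io/api/core/v1"
      )

      // mpsEnv caps an MPS client at roughly 30% of the GPU's SMs and
      // 2 GB of device memory on device 0. Tune both values per workload.
      func mpsEnv() []corev1.EnvVar {
          return []corev1.EnvVar{
              {Name: "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE", Value: "30"},
              {Name: "CUDA_MPS_PINNED_DEVICE_MEM_LIMIT", Value: "0=2G"}, // device 0 -> 2 GB
          }
      }

      func main() {
          for _, e := range mpsEnv() {
              fmt.Printf("%s=%s\n", e.Name, e.Value) // set these on the container spec
          }
      }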