Introduction
The Kubernetes Hardware Accelerator Suite is an enterprise-grade solution for optimizing GPU resource allocation, isolation, and sharing in cloud-native environments. Built on Kubernetes device plugins and NVIDIA-native technologies, it provides three core modules:
- vGPU Module: Based on the open-source GPU-Manager project, it enables fine-grained GPU virtualization by splitting physical GPUs into shareable virtual units with memory and compute quotas. Ideal for multi-tenant environments requiring dynamic resource allocation.
- pGPU Module: Leveraging NVIDIA's official device plugin, it delivers full physical GPU isolation with NUMA-aware scheduling. Designed for high-performance computing (HPC) workloads that need dedicated GPU access.
- MPS Module: Implements NVIDIA's Multi-Process Service (MPS) to allow concurrent GPU execution from multiple processes with resource constraints. Optimizes latency-sensitive applications by multiplexing clients onto a shared CUDA context.
Product Advantages
vGPU Module
- Dynamic Slicing: Split a physical GPU into fractional units so multiple pods or processes can share one card
- QoS Enforcement: Guaranteed compute quotas (vcuda-core) and memory quotas (vcuda-memory); see the pod sketch below
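A minimal sketch of a pod requesting a fractional GPU follows. It assumes the upstream GPU-Manager resource-name convention (`tencent.com/vcuda-core`, where 100 units equal one full GPU, and `tencent.com/vcuda-memory`, where one unit is 256 MiB); verify the exact names and unit sizes against your deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-demo
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:11.8.0-base-ubuntu22.04   # illustrative image
    command: ["sleep", "infinity"]
    resources:
      limits:
        tencent.com/vcuda-core: "50"     # 50/100 = half of one physical GPU's compute
        tencent.com/vcuda-memory: "16"   # 16 x 256 MiB = 4 GiB of GPU memory
```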
pGPU Module
- Hardware-Level Isolation: Direct PCIe passthrough with IOMMU protection
- NUMA Optimization: Minimize cross-socket data transfer via automatic NUMA node binding (see the pod sketch below)
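Exclusive physical GPU access is requested through the standard `nvidia.com/gpu` resource exposed by NVIDIA's device plugin; a minimal sketch (image name illustrative). Note that NUMA alignment is typically enforced by the kubelet Topology Manager (e.g. the `single-numa-node` policy) rather than expressed in the pod spec.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pgpu-demo
spec:
  containers:
  - name: trainer
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # one dedicated physical GPU; no sharing with other pods
```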
MPS Module
- Low-Latency Execution: 30-50% lower latency in suitable workloads by multiplexing clients onto a shared CUDA context
- Resource Caps: Limit per-process GPU compute (0-100%) and memory usage
- Zero Code Changes: Works with unmodified CUDA applications (see the pod sketch below)
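The per-client caps correspond to standard MPS environment variables (`CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` for compute, `CUDA_MPS_PINNED_DEVICE_MEM_LIMIT` for memory). The sketch below assumes the MPS control daemon runs on the host with the default pipe directory; how the module wires this up is deployment-specific.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mps-demo
spec:
  hostIPC: true   # MPS clients must share the IPC namespace with the MPS control daemon
  containers:
  - name: inference
    image: nvidia/cuda:11.8.0-base-ubuntu22.04   # illustrative image
    command: ["sleep", "infinity"]
    env:
    - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE    # cap this client at 30% of the GPU's SMs
      value: "30"
    - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT     # cap device 0 memory at 4 GiB
      value: "0=4G"
    volumeMounts:
    - name: mps-pipe
      mountPath: /tmp/nvidia-mps                 # default MPS pipe directory
  volumes:
  - name: mps-pipe
    hostPath:
      path: /tmp/nvidia-mps
```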
Application Scenarios
vGPU Use Cases
- Multi-Tenant AI Platforms: Share A100/H100 GPUs across teams with guaranteed SLAs
- VDI Environments: Deliver GPU-accelerated virtual desktops for CAD/3D rendering
- Batch Inference: Parallelize model serving with fractional GPU allocations
pGPU Use Cases
- HPC Clusters: Run MPI jobs with exclusive GPU access for weather simulation
- ML Training: Full GPU utilization for large language model training
- Medical Imaging: Process high-resolution MRI data without resource contention
MPS Use Cases
- Real-Time Inference: Low-latency video analytics using concurrent CUDA streams
- Microservice Orchestration: Co-locate multiple GPU microservices on shared hardware
- High-Concurrency Serving: Up to 3x QPS improvement for recommendation systems
Technical Limitations
Privileged Mode Required
Hardware Device Access Requirements
Device File Permissions
Access to NVIDIA GPUs goes through protected character devices on the host:

```shell
# Device file ownership and permissions
$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Aug  1 10:00 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Aug  1 10:00 /dev/nvidiactl
crw-rw-rw- 1 root root 235,   0 Aug  1 10:00 /dev/nvidia-uvm
```
- Requirement: Containers must be granted access to these character devices (via device cgroups, explicit device mounts, or privileged mode); see the sketch below
- Consequence: Containers without such access receive permission-denied errors when initializing CUDA
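The bluntest way to satisfy the device-access requirement is to run the container privileged, which exposes all host devices; a minimal sketch (narrower alternatives such as explicit device-cgroup rules exist but depend on the container runtime):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-device-access
spec:
  containers:
  - name: app
    image: nvidia/cuda:11.8.0-base-ubuntu22.04   # illustrative image
    securityContext:
      privileged: true   # grants access to /dev/nvidia* and bypasses device cgroup filtering
```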
Kernel-Level Operations
Essential NVIDIA Driver Interactions
| Operation | Privilege Requirement | Purpose |
|---|---|---|
| Module Loading | CAP_SYS_MODULE | Load NVIDIA kernel modules |
| Memory Management | CAP_IPC_LOCK | GPU memory allocation |
| Interrupt Handling | CAP_SYS_RAWIO | Process GPU interrupts |
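When full privilege is undesirable, a subset of the capabilities in the table above can be granted explicitly; a sketch under that assumption (CAP_SYS_MODULE is usually unnecessary inside the container, since the NVIDIA kernel modules are loaded on the host):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-capabilities-demo
spec:
  containers:
  - name: app
    image: nvidia/cuda:11.8.0-base-ubuntu22.04   # illustrative image
    securityContext:
      capabilities:
        add:
        - IPC_LOCK    # allow pinning GPU-mapped memory
        - SYS_RAWIO   # allow raw device I/O paths
```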
K8s Device Plugin Architecture Requirements
- Socket Creation: Write access to /var/lib/kubelet/device-plugins (see the DaemonSet sketch below)
- Health Monitoring: Access to nvidia-smi and kernel logs
- Resource Allocation: Permission to modify device cgroups
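These requirements are conventionally met by deploying the plugin as a DaemonSet that hostPath-mounts the kubelet's device-plugin directory; a minimal sketch (the image name is a placeholder):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: gpu-device-plugin
  template:
    metadata:
      labels:
        app: gpu-device-plugin
    spec:
      containers:
      - name: plugin
        image: example.com/gpu-device-plugin:latest     # placeholder image
        volumeMounts:
        - name: device-plugins
          mountPath: /var/lib/kubelet/device-plugins    # plugin registers its gRPC socket here
      volumes:
      - name: device-plugins
        hostPath:
          path: /var/lib/kubelet/device-plugins
```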
vGPU Constraints
- Supports only CUDA versions below 12.4
- No MIG support while vGPU is enabled
pGPU Constraints
- No GPU sharing capability (1:1 pod-to-GPU mapping)
- Requires Kubernetes 1.25+ with SR-IOV enabled
- Limited to PCIe/NVSwitch-connected GPUs
MPS Constraints
- Potential fault propagation across clients sharing a fused context (a fatal error in one client can affect others)
- Requires CUDA 11.4+ for memory limits
- No support for MIG-sliced GPUs