Physical GPU Passthrough Environment Preparation
Physical GPU passthrough in virtual machines refers to the process of directly allocating the actual Graphics Processing Unit (GPU) to a virtual machine within a virtualization environment. This allows the virtual machine to access and utilize the physical GPU directly, achieving graphics performance equivalent to that of running directly on a physical machine. It avoids performance bottlenecks caused by virtual graphics adapters, thus enhancing overall performance.
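At the KubeVirt level, a virtual machine consumes a passthrough GPU through the gpus device list in its spec. The fragment below is an illustrative sketch only, reusing the example resource name that appears later in this document; on this platform the GPU is normally selected through the web console when creating the virtual machine.

# Fragment of a KubeVirt VirtualMachine template (illustrative sketch)
spec:
  template:
    spec:
      domain:
        devices:
          gpus:
          - name: gpu1                                # arbitrary device name inside the VM spec
            deviceName: nvidia.com/GK210GL_TESLA_K80  # a resourceName permitted in KubeVirt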
Constraints and Limitations
The physical GPU passthrough functionality depends on the kubevirt-gpu-device-plugin. Because no ARM64 image of the kubevirt-gpu-device-plugin is currently available, this functionality cannot be used on operating systems running on an ARM64 CPU architecture.
Prerequisites
Chart and Image Preparation
Obtain the following Chart and images and upload them to an image repository. This document uses build-harbor.example.cn as the example repository address. For the specific method of obtaining the Chart and images, please contact the relevant personnel.
Chart
- build-harbor.example.cn/example/chart-gpu-operator:v23.9.1
Images
- build-harbor.example.cn/3rdparty/nvidia/gpu-operator:v23.9.0
- build-harbor.example.cn/3rdparty/nvidia/cloud-native/gpu-operator-validator:v23.9.0
- build-harbor.example.cn/3rdparty/nvidia/cuda:12.3.1-base-ubi8
- build-harbor.example.cn/3rdparty/nvidia/kubevirt-gpu-device-plugin:v1.2.4
- build-harbor.example.cn/3rdparty/nvidia/nfd/node-feature-discovery:v0.14.2
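If the Chart and images need to be mirrored from a source registry into your own repository, a tool such as skopeo can copy them directly between registries. A minimal sketch for one of the images, assuming the source path and tag match the list above; adjust registry addresses and credentials to your environment:

# Mirror one required image into the private registry (sketch)
skopeo copy \
  docker://<source-registry>/3rdparty/nvidia/kubevirt-gpu-device-plugin:v1.2.4 \
  docker://build-harbor.example.cn/3rdparty/nvidia/kubevirt-gpu-device-plugin:v1.2.4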
Enabling IOMMU
The procedure for enabling IOMMU varies across different operating systems. Please refer to the documentation of the corresponding operating system. This document uses CentOS as an example, and all commands should be executed in the terminal.
1. Edit the /etc/default/grub file and add intel_iommu=on iommu=pt to the GRUB_CMDLINE_LINUX configuration option. (On servers with AMD CPUs, use amd_iommu=on instead.)
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet intel_iommu=on iommu=pt"
2. Execute the following command to generate the grub.cfg file.
grub2-mkconfig -o /boot/grub2/grub.cfg
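On servers that boot via UEFI, the active grub.cfg is typically located on the EFI system partition instead; on CentOS this is usually the following path (verify against your own layout):

# UEFI systems: regenerate the grub.cfg on the EFI system partition
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg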
3. Restart the server.
4. Run the following command to confirm whether IOMMU has been successfully enabled. If the output contains IOMMU enabled, it has been enabled successfully.
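A common check on CentOS is to search the kernel log (the exact command may vary by distribution):

dmesg | grep -e DMAR -e IOMMU
# A line such as "DMAR: IOMMU enabled" confirms that the setting took effect.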
Operating Steps
Note: All commands below should be executed in the CLI tool on the corresponding cluster Master node unless otherwise specified.
Create Namespace
Execute the following command to create a namespace named gpu-system. If the output displays namespace/gpu-system created, the creation was successful.
kubectl create ns gpu-system
Deploy gpu-operator
1. Execute the following command to deploy the gpu-operator.
export REGISTRY=<registry> # Replace <registry> with the repository address where the gpu-operator image is located, e.g.: export REGISTRY=build-harbor.example.cn
cat <<EOF | kubectl create -f -
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: gpu-operator
  namespace: gpu-system
spec:
  destination:
    cluster: ""
    namespace: "gpu-operator"
  source:
    charts:
    - name: <chartName> # Replace <chartName> with the actual chart path, e.g.: name: example/chart-gpu-operator
      releaseName: gpu-operator
      targetRevision: v23.9.1
      repoURL: $REGISTRY
    timeout: 120
  values:
    global:
      registry:
        address: $REGISTRY
    nfd:
      enabled: true
    sandboxWorkloads:
      enabled: true
      defaultWorkload: "vm-passthrough"
EOF
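While the AppRelease synchronizes, you can watch the workloads being created in the target namespace (gpu-operator, per spec.destination above):

kubectl -n gpu-operator get pods -w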
2. Execute the following command to check whether the gpu-operator has synchronized. If SYNC shows Synced, it has synchronized successfully.
kubectl -n gpu-system get apprelease gpu-operator
Output information:
NAME SYNC HEALTH MESSAGE UPDATE AGE
gpu-operator Synced Ready chart synced 28s 32s
3. Execute the following command to retrieve the names of all nodes and find the GPU node name.
kubectl get nodes -o wide
4. Execute the following command to check whether the GPU node has any passthrough-capable GPUs. If the output contains GPU information similar to nvidia.com/GK210GL_TESLA_K80, passthrough-capable GPUs are present.
kubectl get node <gpu-node-name> -o jsonpath='{.status.allocatable}' # Replace <gpu-node-name> with the GPU node name obtained from Step 3
Output information:
{"cpu":"39","devices.kubevirt.io/kvm":"1k","devices.kubevirt.io/tun":"1k","devices.kubevirt.io/vhost-net":"1k","ephemeral-storage":"426562784165","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"122915848Ki","nvidia.com/GK210GL_TESLA_K80":"8","pods":"256"}
At this point, the gpu-operator has been successfully deployed.
Configure Kubevirt
1. Execute the following command to enable the DisableMDEVConfiguration feature gate. If a message similar to hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched is returned, the feature was enabled successfully.
kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='[{"op": "add", "path": "/spec/featureGates/disableMDevConfiguration", "value": true }]'
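The feature gate can be read back to confirm that the patch was applied:

kubectl -n kubevirt get hco kubevirt-hyperconverged -o jsonpath='{.spec.featureGates.disableMDevConfiguration}'
# Expected output: true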
-
In the terminal of the GPU node, execute the following command to obtain the pciDeviceSelector. The 10de:102d
part in the output is the value of pciDeviceSelector. {#pciDeviceSelector}
lspci -nn | grep -i nvidia
Output information:
04:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
05:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
08:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
09:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
85:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
86:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
89:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
8a:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
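To see which kernel driver currently claims these devices, lspci can filter by the vendor:device pair (shown here with the example ID). For passthrough, the devices are expected to be bound to vfio-pci:

lspci -nnk -d 10de:102d
# "Kernel driver in use: vfio-pci" indicates the device is ready for passthrough.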
3. Execute the following command to retrieve the names of all nodes and find the GPU node name.
kubectl get nodes -o wide
4. Execute the following command to obtain the resourceName. The nvidia.com/GK210GL_TESLA_K80 part of the output is the resourceName value.
kubectl get node <gpu-node-name> -o jsonpath='{.status.allocatable}' # Replace <gpu-node-name> with the GPU node name obtained from Step 3
Output information:
{"cpu":"39","devices.kubevirt.io/kvm":"1k","devices.kubevirt.io/tun":"1k","devices.kubevirt.io/vhost-net":"1k","ephemeral-storage":"426562784165","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"122915848Ki","nvidia.com/GK210GL_TESLA_K80":"8","pods":"256"}
5. Execute the following command to add the passthrough GPU. Three variants are shown below; a read-back check follows them.
Note: When replacing the <pci-devices-id> part in the commands below with the pciDeviceSelector value obtained in Step 2, all letters in the pciDeviceSelector must be converted to uppercase. For example, if the obtained value is 10de:102d, replace it as export DEVICE=10DE:102D.
- Adding a single GPU card
export DEVICE=<pci-devices-id> # Replace <pci-devices-id> with the pciDeviceSelector obtained in Step 2, e.g.: export DEVICE=10DE:102D
export RESOURCE=<resource-name> # Replace <resource-name> with the resourceName obtained in Step 4, e.g.: export RESOURCE=nvidia.com/GK210GL_TESLA_K80
kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='
[
  {
    "op": "add",
    "path": "/spec/permittedHostDevices",
    "value": {
      "pciHostDevices": [
        {
          "externalResourceProvider": true,
          "pciDeviceSelector": "'"$DEVICE"'",
          "resourceName": "'"$RESOURCE"'"
        }
      ]
    }
  }
]'
- Adding multiple GPU cards
Note: When adding multiple GPU cards, each pciDeviceSelector value used to replace <pci-devices-id> must be unique.
export DEVICE1=<pci-devices-id1> # Replace <pci-devices-id1> with the pciDeviceSelector obtained in Step 2
export RESOURCE1=<resource-name1> # Replace <resource-name1> with the resourceName obtained in Step 4
export DEVICE2=<pci-devices-id2> # Replace <pci-devices-id2> with the pciDeviceSelector obtained in Step 2
export RESOURCE2=<resource-name2> # Replace <resource-name2> with the resourceName obtained in Step 4
kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='
[
  {
    "op": "add",
    "path": "/spec/permittedHostDevices",
    "value": {
      "pciHostDevices": [
        {
          "externalResourceProvider": true,
          "pciDeviceSelector": "'"$DEVICE1"'",
          "resourceName": "'"$RESOURCE1"'"
        },
        {
          "externalResourceProvider": true,
          "pciDeviceSelector": "'"$DEVICE2"'",
          "resourceName": "'"$RESOURCE2"'"
        }
      ]
    }
  }
]'
- Adding new GPU cards after GPU cards have already been added
export DEVICE=<pci-devices-id> # Replace <pci-devices-id> with the pciDeviceSelector obtained in Step 2
export RESOURCE=<resource-name> # Replace <resource-name> with the resourceName obtained in Step 4
export INDEX=<index> # Replace <index> with a zero-based array index. For example, if one GPU card has already been added and you now want to add another, the index is 1, i.e., export INDEX=1
kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='
[
  {
    "op": "add",
    "path": "/spec/permittedHostDevices/pciHostDevices/'"${INDEX}"'",
    "value": {
      "externalResourceProvider": true,
      "pciDeviceSelector": "'"$DEVICE"'",
      "resourceName": "'"$RESOURCE"'"
    }
  }
]'
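Whichever variant was applied, the resulting configuration can be read back to confirm that the patch landed:

kubectl -n kubevirt get hco kubevirt-hyperconverged -o jsonpath='{.spec.permittedHostDevices.pciHostDevices}'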
Result Verification
After completing the above configuration steps, if the corresponding physical GPU can be selected when creating a virtual machine, the physical GPU passthrough environment has been successfully prepared.
Note: The relevant platform features must be enabled in advance before physical GPU passthrough can be configured.
1. Go to Container Platform.
2. In the left navigation bar, click Virtualization > Virtual Machines.
3. Click Create Virtual Machine.
4. Configure the Physical GPU (Alpha) parameter for the virtual machine.

Parameter | Description
---|---
Physical GPU (Alpha) | Select the model of the configured physical GPU. Only one physical GPU can be assigned to each virtual machine.
At this point, the physical GPU passthrough environment has been successfully prepared.
Related Operations
Delete the Virtual Machine with Passthrough GPU
1. Go to Container Platform.
2. In the left navigation bar, click Virtualization > Virtual Machines.
3. On the list page, click ⋮ on the right side of the virtual machine to be deleted > Delete; alternatively, click the name of the virtual machine to enter its details page, then click Actions > Delete.
4. Enter the confirmation information to delete the virtual machine with the passthrough GPU.
Remove GPU-related Configuration from KubeVirt
1. On the Master node of the cluster where the GPU resides, use the CLI tool to execute the following command to remove the GPU-related configuration from KubeVirt.
kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='[{"op": "remove", "path": "/spec/permittedHostDevices"}]'
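This removes the entire permittedHostDevices block. To remove only a single entry while keeping the rest, a JSON-patch remove on the array index works the same way as the indexed add shown earlier, for example for the first entry:

kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='[{"op": "remove", "path": "/spec/permittedHostDevices/pciHostDevices/0"}]'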
2. After deletion, if the corresponding physical GPU model can no longer be selected when creating a virtual machine through Container Platform, the deletion was successful. For the specific steps to create a virtual machine, refer to Select Physical GPU Model.
Uninstall gpu-operator
1. On the Master node of the cluster where the GPU resides, use the CLI tool to execute the following command to uninstall the gpu-operator.
kubectl -n gpu-system delete apprelease gpu-operator
Output information:
apprelease.operator.alauda.io "gpu-operator" deleted
2. Execute the following command; if you receive a response similar to the one below, the gpu-operator has been successfully uninstalled.
kubectl -n gpu-system get apprelease gpu-operator
Output information:
Error from server (NotFound): appreleases.operator.alauda.io "gpu-operator" not found