Verification
This article describes how to verify that the installed Alauda Build of NVIDIA GPU Device Plugin and the related monitoring are working correctly.
Verify Alauda Build of NVIDIA GPU Device Plugin
- Check whether the GPU node reports allocatable GPU resources. From a control node of the business cluster, run the following command:
kubectl get node ${nodeName} -o=jsonpath='{.status.allocatable}'
# The output contains: "nvidia.com/gpu":"1" (the specific value depends on the number of GPU cards)
- Deploy a GPU demo instance (a sample manifest is sketched after the note below).
Check whether there is any GPU-related resource consumption. Run the following command on the GPU node of the business cluster:
nvidia-smi pmon -s u -d 1
If both the sm and mem columns contain data, the GPU is working and you can start developing GPU applications on the GPU node.
Note: When deploying GPU applications, be sure to configure the following mandatory parameters:
spec:
  containers:
    - image: your-image
      imagePullPolicy: IfNotPresent
      name: gpu
      resources:
        limits:
          cpu: '2'
          memory: 4Gi
          nvidia.com/gpu: 1 # Request 1 physical GPU (required)
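For reference, the snippet below is a minimal sketch of a complete demo Pod built around those mandatory parameters. The Pod name and the NVIDIA CUDA vectorAdd sample image are illustrative assumptions, not resources shipped with the plugin; substitute any GPU-enabled image available in your registry.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo # hypothetical name for this verification Pod
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu
      # Assumed sample image: NVIDIA's CUDA vectorAdd demo; replace with any GPU-enabled image you can pull
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      imagePullPolicy: IfNotPresent
      resources:
        limits:
          cpu: '2'
          memory: 4Gi
          nvidia.com/gpu: 1 # Request 1 physical GPU (required)

Apply it with kubectl apply -f gpu-demo.yaml and confirm the Pod is scheduled onto the GPU node; a longer-running GPU workload will also show up in the nvidia-smi pmon output above.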
Verify GPU Dashboards
After the HAMi vgpu service has been running for a while, navigate to Administrator -> Operations Center -> Monitor -> Dashboards and switch to the node and pod panels under GPU. You will see the relevant monitoring data.
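If the panels stay empty, you can check whether GPU metrics are reaching Prometheus before troubleshooting the dashboards themselves. The commands below are a sketch under assumptions: the Prometheus service name and namespace are placeholders for your environment, and DCGM_FI_DEV_GPU_UTIL assumes a dcgm-exporter style exporter, so the metric name may differ in a HAMi deployment.

# Port-forward the cluster Prometheus (service name and namespace are placeholders; adjust to your environment)
kubectl -n <monitoring-namespace> port-forward svc/<prometheus-service> 9090:9090 &
# Query a GPU utilization metric through the Prometheus HTTP API
curl -s 'http://127.0.0.1:9090/api/v1/query?query=DCGM_FI_DEV_GPU_UTIL'

A non-empty result list indicates the metrics pipeline is working and the dashboards should populate shortly.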