Verification

This article describes how to verify that the installed Alauda build of HAMi and its related monitoring components are working correctly.

TOC

Verify HAMi

  1. On a control-plane node of the business cluster, check whether the GPU node reports allocatable GPU resources. Run the following command:
    kubectl get node ${nodeName} -o=jsonpath='{.status.allocatable}'
    # The output should contain: "nvidia.com/gpualloc":"10" (the exact value depends on the number of GPU cards and the installation parameters)
  2. Deploy a GPU demo instance, then check whether it consumes GPU resources. Run the following command on the GPU node of the business cluster:
    nvidia-smi pmon -s u -d 1
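
The demo instance in step 2 can be any workload that requests a GPU. A minimal Pod sketch follows; the image name is an illustrative placeholder (use any image that runs a CUDA workload), and nvidia.com/gpualloc is the resource name used by this build:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
    - name: cuda-demo
      image: your-cuda-demo-image   # placeholder: substitute a real CUDA workload image
      resources:
        limits:
          nvidia.com/gpualloc: 1    # request 1 physical GPU
```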

If both the sm and mem columns contain data, the GPU is ready, and you can start developing GPU applications on the GPU node. Note: when deploying GPU applications, be sure to configure the resource limits shown below (nvidia.com/gpualloc is required; gpucores and gpumem are optional):

spec:
  containers:
    - image: your-image
      imagePullPolicy: IfNotPresent
      name: gpu
      resources:
        limits:
          cpu: '2'
          memory: 4Gi
          nvidia.com/gpualloc: 1     # Request 1 physical GPU (required)
          nvidia.com/gpucores: "50"  # Request 50% of the compute resources per GPU (optional)
          nvidia.com/gpumem: 8000    # Request 8000MB of video memory per GPU (optional)
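
Assuming the spec above is saved in a manifest named gpu-app.yaml with a Pod named gpu-demo (both names are hypothetical), deployment and verification can be sketched as:

```shell
# Deploy the GPU application
kubectl apply -f gpu-app.yaml

# Wait until the Pod is scheduled onto a GPU node and reaches Running
kubectl get pod gpu-demo -o wide

# Confirm the container sees the allocated GPU
# (nvidia-smi must be available inside the image for this check)
kubectl exec gpu-demo -- nvidia-smi
```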

Verify MonitorDashboard

After the HAMi vGPU service has been running for a while, navigate to the Administrator -> Operations Center -> Monitor -> Dashboards page and switch to the HAMi GPU Monitoring panel under HAMi. You should see the relevant chart data.

Verify HAMi-WebUI

After the HAMi-WebUI components have been running for a while, open http://{business cluster node IP}:{NodePort} in your browser, where {NodePort} is the node port exposed by the HAMi-WebUI service.
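
If you do not know the NodePort, it can be looked up from the Service object. A sketch follows; the service name and namespace here are assumptions, so adjust them to match your installation:

```shell
# Print the node port exposed by the HAMi-WebUI service
# (hami-webui / hami-system are assumed names; adjust as needed)
kubectl get svc hami-webui -n hami-system \
  -o jsonpath='{.spec.ports[0].nodePort}'
```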