How to Define a vGPU (HAMi) Cost Model


Prerequisites

In the GPU cluster:

  • Alauda Build of Hami installed
  • The Cost Management Agent installed

About Alauda Build of Hami

Heterogeneous AI Computing Virtualization Middleware (HAMi), formerly known as k8s-vGPU-scheduler, is an "all-in-one" chart designed to manage heterogeneous AI computing devices in a Kubernetes cluster. It provides the ability to share heterogeneous AI devices among tasks.

Note
Because Alauda Build of Hami releases on a different cadence from Alauda Container Platform, the Alauda Build of Hami documentation is available as a separate documentation set.

Procedure

Create a PrometheusRule to generate the needed metrics

Create a PrometheusRule in the Hami cluster.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: kube-prometheus
  name: hami-gpu-labels
  namespace: kube-system
spec:
  groups:
    - name: hami-gpu-labels.rules
      rules:
        - expr: |
            min by (podnamespace, deviceuuid, label_modelName, label_device) (
              vGPUCorePercentage
              * on (deviceuuid) group_left(label_modelName, label_device) (
                label_replace(
                  label_replace(
                    label_replace(
                      DCGM_FI_DEV_SM_CLOCK,
                      "deviceuuid", "$1", "UUID", "(.*)"
                    ),
                    "label_modelName", "$1", "modelName", "(.*)"
                  ),
                  "label_device", "$1", "device", "([a-zA-Z]+)[0-9]+$"
                )
              )
            )
          record: vgpu_core_labels
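
The rule above joins HAMi's `vGPUCorePercentage` series with the GPU model labels exposed by the DCGM exporter's `DCGM_FI_DEV_SM_CLOCK` metric, rewriting `UUID`, `modelName`, and `device` into join-friendly labels. After applying the rule with `kubectl apply -f <file>.yaml`, you can verify it in the Prometheus UI (assuming the DCGM exporter is running in the cluster):

```promql
# Expect one series per (podnamespace, deviceuuid) pair,
# carrying label_modelName and label_device from DCGM
vgpu_core_labels
```

If the query returns no data, check that both `vGPUCorePercentage` and `DCGM_FI_DEV_SM_CLOCK` exist and share matching GPU UUIDs.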

Add Collection Config (Cost Management Agent)

Create the following ConfigMaps in the HAMi cluster where the Cost Management Agent runs to declare what to collect.

Note: The project-quota ConfigMap (the second document below) is only supported in HAMi 2.7+.

apiVersion: v1
data:
  config: |
    - kind: vGPU
      category: vGPUCore
      item: vGPUCoreQuota
      period: Hourly
      labels:
        query: "vgpu_core_labels{}"
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
      usage:
        query: sum by (deviceuuid,podnamespace) (avg_over_time(vGPUCorePercentage{}[5m]))
        step: 5m
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
    - kind: vGPU
      category: vGPUMemory
      item: vGPURamBytesQuota
      period: Hourly
      labels:
        query: "vgpu_core_labels{}"
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
      usage:
        query: sum by (deviceuuid,podnamespace) (avg_over_time(vGPU_device_memory_limit_in_bytes{}[5m]))
        step: 5m
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
    - kind: vGPU
      category: vGPUCore
      item: vGPUCoreUsed
      period: Hourly
      labels:
        query: "vgpu_core_labels{}"
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
      usage:
        query: sum by (deviceuuid,podnamespace) (avg_over_time(Device_utilization_desc_of_container{}[5m]))
        step: 5m
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
    - kind: vGPU
      category: vGPUMemory
      item: vGPURamBytesUsed
      period: Hourly
      labels:
        query: "vgpu_core_labels{}"
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
      usage:
        query: sum by (deviceuuid,podnamespace) (avg_over_time(vGPU_device_memory_usage_in_bytes{}[5m]))
        step: 5m
        mappers:
          name: deviceuuid
          namespace: podnamespace
          cluster: ""
          project: ""
kind: ConfigMap
metadata:
  labels:
    cpaas.io/slark.collection.config: "true"
  name: slark-agent-vgpu-namespace-config
  namespace: cpaas-system
---
# Note: The following configmap is only supported in hami 2.7+
apiVersion: v1
data:
  config: |
    - kind: Project
      category: vGPUCore
      item: vGPUCoresProjectQuota
      period: Hourly
      usage:
        query: avg by (project, cluster) (avg_over_time(cpaas_project_resourcequota{resource="limits.nvidia.com/gpucores", type="project-hard"}[5m]))
        step: 5m
        mappers:
          name: project
          namespace: ""
          cluster: cluster
          project: project
    - kind: Project
      category: vGPUMemory
      item: vGPURamBytesProjectQuota
      period: Hourly
      usage:
        query: avg by (project, cluster) (avg_over_time(cpaas_project_resourcequota{resource="limits.nvidia.com/gpumem", type="project-hard"}[5m]))
        step: 5m
        mappers:
          name: project
          namespace: ""
          cluster: cluster
          project: project
kind: ConfigMap
metadata:
  labels:
    cpaas.io/slark.collection.config: "true"
  name: slark-agent-project-config-vgpu
  namespace: cpaas-system

After applying the YAML, restart the Agent Pod to reload the configuration:

kubectl delete pods -n cpaas-system -l service_name=slark-agent

Add Display/Storage Config (Cost Management Server)

Create a ConfigMap in the cluster where the Cost Management Server runs to declare billing items, methods, units, and display names. This tells the server what and how to bill.

Note: Request Usage is only meaningful when the GPU Overcommitment Ratio is enabled. If you bill by Request Usage, enable the GPU Overcommitment Ratio.

apiVersion: v1
data:
  config: |
    - name: vGPUCore
      displayname:
        zh: "HAMi NVIDIA vGPU Cores"
        en: "HAMi NVIDIA vGPU Cores"
      methods:
        - name: Request
          displayname:
            zh: "请求量"
            en: "Request Usage"
          item: vGPUCoreQuota
          divisor: 1
          unit:
            zh: "core-hours"
            en: "core-hours"
        - name: Usage
          displayname:
            zh: "使用量"
            en: "Used Usage"
          item: vGPUCoreUsed
          divisor: 1
          unit:
            zh: "core-hours"
            en: "core-hours"
        - name: ProjectQuota
          displayname:
            zh: "项目配额"
            en: "Project Quota"
          item: vGPUCoresProjectQuota
          unit:
            zh: "core-hours"
            en: "core-hours"
          divisor: 1
    - name: vGPUMemory
      displayname:
        zh: "HAMi NVIDIA vGPU Memory"
        en: "HAMi NVIDIA vGPU Memory"
      methods:
        - name: Request
          displayname:
            zh: "请求量"
            en: "Request Usage"
          item: vGPURamBytesQuota
          divisor: 1073741824
          unit:
            zh: "Gi-hours"
            en: "Gi-hours"
        - name: Used
          displayname:
            zh: "使用量"
            en: "Used Usage"
          item: vGPURamBytesUsed
          divisor: 1073741824
          unit:
            zh: "Gi-hours"
            en: "Gi-hours"
        - name: ProjectQuota
          displayname:
            zh: "项目配额"
            en: "Project Quota"
          item: vGPURamBytesProjectQuota
          unit:
            zh: "Gi-hours"
            en: "Gi-hours"
          divisor: 1024 # quota is reported in Mi; Mi/1024 = Gi
kind: ConfigMap
metadata:
  labels:
    cpaas.io/slark.display.config: "true"
  name: slark-display-config-for-vgpu
  namespace: kube-public

After applying the YAML, restart the Server Pod to reload the configuration:

kubectl delete pods -n cpaas-system -l service_name=slark
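
The `divisor` fields above convert raw metric values into the displayed billing units. A quick sanity check of the two memory conversions (a Python sketch; that the project quota metric is reported in Mi is an assumption based on the `# Mi/1024` comment):

```python
# vGPURamBytesQuota/vGPURamBytesUsed are reported in bytes;
# divisor 1073741824 (= 1024**3) converts bytes to Gi.
raw_bytes = 8 * 1024**3          # an 8 Gi memory limit reported in bytes
print(raw_bytes / 1073741824)    # → 8.0

# The project quota metric (limits.nvidia.com/gpumem) is assumed to be
# reported in Mi, so divisor 1024 converts Mi to Gi.
raw_mi = 16384                   # a 16 Gi project quota reported in Mi
print(raw_mi / 1024)             # → 16.0
```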

Add Price For a vGPU Cost Model

If the GPU cluster does not have a cost model yet, create one. Then add prices to the GPU cluster's cost model:

Billing Method Description

| Billing Item | Billing Method | Billing Rules | Description |
| --- | --- | --- | --- |
| vGPU | Usage (Core-hours) | Calculated hourly from the Pod's AVG(Usage) over the past hour, multiplied by the Pod's actual duration (counted as 5 minutes if less than 5 minutes). | Based on actual vGPU consumption |
| vGPU | Request (Core-hours) | Calculated hourly from the Pod's Request over the past hour, multiplied by the Pod's actual duration (counted as 5 minutes if less than 5 minutes). | Based on vGPU resource requests |
| vGPU | Project Quota (Core-hours) | Calculated hourly from the project's allocated vGPU core quota limit, multiplied by the time duration. Calculated in segments when the quota changes. | Based on project-level resource quotas |
| vGPUMemory | Usage (GiB-hours) | Calculated hourly from the Pod's AVG(Usage) over the past hour, multiplied by the Pod's actual duration (counted as 5 minutes if less than 5 minutes). | Based on actual vGPU memory consumption |
| vGPUMemory | Request (GiB-hours) | Calculated hourly from the Pod's Request over the past hour, multiplied by the Pod's actual duration (counted as 5 minutes if less than 5 minutes). | Based on vGPU memory resource requests |
| vGPUMemory | Project Quota (GiB-hours) | Calculated hourly from the project's allocated vGPU memory quota limit, multiplied by the time duration. Calculated in segments when the quota changes. | Based on project-level resource quotas |
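
The hourly rules above can be sketched as follows (a hypothetical helper for illustration, not the server's actual implementation; `unit_price` is whatever price you configure in the cost model):

```python
def hourly_cost(avg_usage: float, duration_minutes: float, unit_price: float) -> float:
    """Illustrative cost for one billing hour: AVG(usage) x billed hours x price.

    Pods running less than 5 minutes are billed as 5 minutes, per the rules above.
    """
    billed_hours = max(duration_minutes, 5) / 60
    return avg_usage * billed_hours * unit_price

# A Pod averaging 30 vGPU cores for 45 minutes at 0.02 per core-hour:
print(hourly_cost(30, 45, 0.02))  # 30 * 0.75 * 0.02 = 0.45
```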

Add Price For a Cost Model

  1. Select vGPU or vGPUMemory in Billing Items.

  2. Select Request Usage, Used Usage, or Project Quota in Method (units are core-hours for vGPU and Gi-hours for vGPUMemory).

  3. Set Default Price.

  4. Configure Price by Label (optional). Currently only two keys are supported: modelName and device.

    modelName: the GPU model, for example "Tesla P100-PCIE-16GB" or "Tesla T4" (obtain it by running nvidia-smi).

    device: the GPU manufacturer, for example "nvidia" or "ascend".

Cost Details and Cost Statistics

Finally, after one or more hours, you can see cost details in Cost Details, broken down by namespace and card UUID. You can also see total costs by cluster, project, and namespace in Cost Statistics.