Introduction

Resource Monitoring is a critical component of the Kubernetes Hardware Accelerator Suite, designed to provide comprehensive visibility into GPU resource utilization across your containerized workloads. This module delivers real-time metrics and historical data analysis for both compute utilization and GPU memory consumption at two fundamental levels:

  • Node-Level Monitoring: Track aggregate GPU resource usage across entire Kubernetes nodes
  • Pod-Level Monitoring: Analyze per-workload GPU consumption with pod granularity

Integrated with the platform's core accelerator modules (pGPU/vGPU (GPU-Manager)/MPS), this monitoring solution enables users to optimize GPU allocation, enforce resource quotas, and troubleshoot performance bottlenecks in workloads such as AI/ML training and real-time inference services.
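
To make the two levels concrete, here is a minimal Python sketch (using the official kubernetes client) that contrasts node-level and pod-level views by aggregating declared GPU requests. It assumes the NVIDIA device plugin's "nvidia.com/gpu" resource name, which may differ per accelerator module, and it stands in for the module's own metrics pipeline, which reports measured utilization rather than declared requests.

    # Minimal sketch: aggregate declared GPU requests per node and per pod.
    # Assumes the "nvidia.com/gpu" resource name; the monitoring module
    # itself reports measured utilization, not requests.
    from collections import defaultdict
    from kubernetes import client, config

    config.load_kube_config()  # use config.load_incluster_config() in-cluster
    v1 = client.CoreV1Api()

    node_gpus = defaultdict(int)  # node-level aggregate
    pod_gpus = {}                 # pod-level granularity

    for pod in v1.list_pod_for_all_namespaces().items:
        requested = sum(
            int(c.resources.requests.get("nvidia.com/gpu", 0))
            for c in pod.spec.containers
            if c.resources and c.resources.requests
        )
        if requested and pod.spec.node_name:
            pod_gpus[f"{pod.metadata.namespace}/{pod.metadata.name}"] = requested
            node_gpus[pod.spec.node_name] += requested

    print("Per-node GPU requests:", dict(node_gpus))
    print("Per-pod GPU requests:", pod_gpus)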

Advantages

The core advantages of Resource Monitoring are as follows:

  • Multi-Dimensional Observability

    Simultaneously monitor both compute units (CUDA cores) and memory utilization across physical/virtual GPUs, providing holistic insights into accelerator usage patterns.

  • Hierarchical Metrics Collection

    Capture data at both node and pod granularity, enabling correlation between cluster-wide resource trends and individual workload demands.

  • Native Integration

    Works seamlessly with all accelerator modules (pGPU/vGPU/MPS) without requiring additional agents, leveraging Kubernetes-native metrics pipelines.

  • Historical Analysis

    Store GPU metrics with configurable retention periods (default 7 days) for capacity planning and usage pattern analysis through integrated visualization tools.
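
As an illustration of what the retained history enables, the following self-contained Python sketch computes average and peak utilization over a retention window. The sample format, a list of (timestamp, utilization-percent) pairs, and the function name are assumptions for this example, not the module's actual export format.

    # Illustrative only: summarize retained (timestamp, utilization %) samples.
    from datetime import datetime, timedelta

    def window_stats(samples, days=7):
        """Average and peak utilization over the last `days` days."""
        cutoff = datetime.utcnow() - timedelta(days=days)
        values = [v for ts, v in samples if ts >= cutoff]
        if not values:
            return None
        return {"avg": sum(values) / len(values), "peak": max(values)}

    # Example: three hourly samples for one GPU
    now = datetime.utcnow()
    samples = [(now - timedelta(hours=h), u) for h, u in [(2, 35.0), (1, 80.0), (0, 55.0)]]
    print(window_stats(samples))  # {'avg': 56.66..., 'peak': 80.0}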

Application Scenarios

The main application scenarios for Resource Monitoring are as follows:

  • Performance Optimization

    Identify underutilized GPUs in training clusters and right-size resource requests for deep learning workloads. For example, detect pods consistently using <30% of allocated GPU memory to optimize memory allocations (see the first sketch after this list).

  • Multi-Tenant Governance

    Enforce GPU quota compliance in shared environments by monitoring vGPU consumption across teams. Track cumulative usage against allocated quotas in AI platform deployments (see the second sketch after this list).

  • Cost Attribution

    Generate per-namespace GPU utilization reports for chargeback/showback models in enterprise Kubernetes environments, correlating pod-level metrics with organizational units.

  • Fault Diagnosis

    Investigate OOM (Out-of-Memory) incidents in GPU-accelerated workloads by analyzing memory-usage trends preceding container crashes. Cross-reference with Kubernetes events for root-cause analysis (see the third sketch after this list).

  • Capacity Planning

    Analyze historical GPU utilization patterns (e.g., peak compute demand periods) to inform infrastructure scaling decisions and budget allocations for AI infrastructure.
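
First, for the performance-optimization scenario, a simple threshold filter over pod-level memory metrics might look like the sketch below. The pod names and numbers are placeholders; real values would come from this module's pod-level metrics.

    # Placeholder data: flag pods using less than 30% of allocated GPU memory.
    def underutilized(pods, threshold=0.30):
        """pods: mapping of "namespace/pod" -> (used_mib, allocated_mib)."""
        return [
            name for name, (used, allocated) in pods.items()
            if allocated and used / allocated < threshold
        ]

    print(underutilized({
        "team-a/train-0": (2048, 16384),  # 12.5% -> flagged
        "team-b/infer-1": (9000, 16384),  # ~55%  -> not flagged
    }))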
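
Second, for multi-tenant governance, the quotas being monitored are ordinary Kubernetes ResourceQuota objects. The sketch below creates one with the official Python client; "requests.nvidia.com/gpu" is the standard quota key for the NVIDIA extended resource, and the namespace, quota name, and limit are illustrative.

    # Hedged example: cap a team namespace at 4 requested GPUs.
    from kubernetes import client, config

    config.load_kube_config()
    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="team-a-gpu-quota"),
        spec=client.V1ResourceQuotaSpec(hard={"requests.nvidia.com/gpu": "4"}),
    )
    client.CoreV1Api().create_namespaced_resource_quota(namespace="team-a", body=quota)

Monitoring can then report cumulative GPU consumption against this hard limit.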
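
Third, for fault diagnosis, memory-usage trends from this module can be cross-referenced with Kubernetes events, as in the sketch below (again using the official Python client). The namespace and pod name are placeholders.

    # Placeholder names: pull recent events for a suspect pod.
    from kubernetes import client, config

    config.load_kube_config()
    events = client.CoreV1Api().list_namespaced_event(
        namespace="team-a",
        field_selector="involvedObject.name=train-0",
    )
    for e in events.items:
        print(e.last_timestamp, e.reason, e.message)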

Usage Limitations

When using Resource Monitoring, please note the following constraints:

  • Module Dependencies
    • Requires at least one accelerator module (pGPU/vGPU/MPS) to be deployed in the cluster