Release Notes

v2.8.1

Support Huawei Ascend NPU sharing
Support CDI (Container Device Interface) mode on NVIDIA devices
Sync with NVIDIA k8s-device-plugin v0.18.0
Add hami_build_info Prometheus metrics and version printing
Watch and hot reload TLS certificates without restarting pods
Support NVIDIA GPU Operator toolkit readiness check
Support GPUDirect RDMA copy (GDRCopy) and GPUDirect Storage (GDS) configuration
Support mock device plugin for testing environments
HAMi-WebUI upgraded to v1.10.0 for HAMi v2.8 compatibility; v1.10.0 is compatible with HAMi v2.7 and v2.8, while v1.5.0 is not compatible with HAMi v2.8

Fix: HAMi-core updated to fix vLLM-related issues
Fix: Quota calculation error
Fix: MIG instance allocation error where scheduler was allocating incorrect MIG instances
Fix: nvidia-mig-parted upgraded to v0.12.2 for security fixes
Fix: After removing the device plugin from a GPU node, it could still appear
Fix: Concurrent map read/write errors
Fix: Device-NUMA acquisition logic
Fix: ClusterRoleBinding error when changing release name or chart name

Fix: After the device plugin was removed from a GPU node, pods could still be scheduled onto that node

Optimize scheduler log
Support Enflame gcu-share
Support MetaX GPU and MetaX sGPU
Helm chart: add checksum annotation so that HAMi components restart after a ConfigMap modification
Support using RuntimeClass with NVIDIA devices
Add support for profiling via net/http/pprof package
Add NVIDIA GPU topology score registration to node
Feat: vGPUmonitor supports MigInfo metrics