
# Features

# Model Management

  • Git-based Model Repository
    A complete Git-managed storage solution supporting:

    • Repository Management: Create/delete repos with metadata (name/description/visibility) and dependency checks
    • File Operations: Web UI upload for small files + CLI/Git LFS for large files (e.g., *.h5, *.bin); see the CLI sketch after this list
    • Version Control: Full Git capabilities including:
      • Branching (e.g., main/experimental)
      • Tagging (e.g., v1.0)
      • Automatic metadata sync from README.md
  • MLOps Integration
    Seamless connections to downstream workflows:

    • One-click deployment to inference services
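
The CLI path for large files maps onto standard Git LFS commands. A minimal sketch, assuming a hypothetical clone URL and file names; substitute the values your model repository shows:

```bash
# Clone the model repository (hypothetical URL).
git clone https://<alauda-ai-host>/models/demo-model.git
cd demo-model

git lfs install                # one-time: enable Git LFS hooks locally
git lfs track "*.h5" "*.bin"   # route large weight files through LFS

git add .gitattributes model.bin README.md
git commit -m "Add model weights"
git tag v1.0                   # tag the snapshot for later deployment
git push origin main --tags
```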

# Inference Service

  • Direct Model Deployment for Inference Services

    • Users select a specific model version from the repository and an inference runtime image, and the platform deploys an online service rapidly: it automatically downloads, caches, and loads the model, then starts the inference service. This streamlines deployment and reduces operational complexity (a manifest sketch follows this list).
  • Custom Image Deployment for Inference Services

    • Users write a Dockerfile to package a model and its dependencies into a custom image, then deploy the inference service through a standard Kubernetes Deployment. This approach offers greater flexibility, letting users tailor the inference environment to their needs (see the Deployment sketch after this list).
  • Inference Service Experience

    • Batch operations on multiple inference services: start, stop, update, and delete them in bulk.
    • Creation, monitoring, and export of results for batch inference tasks.
    • Batch resource management: allocate and adjust the resources of multiple inference services at once.
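
The direct-deployment path builds on KServe's InferenceService API (see the API Reference). A minimal sketch, assuming hypothetical names throughout: the namespace, service name, runtime, and storageUri below are placeholders, and the exact storage URI for repository-hosted models depends on your installation:

```bash
# Hypothetical example: deploy a model version as a KServe InferenceService.
kubectl apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: demo-model        # hypothetical service name
  namespace: demo         # hypothetical namespace
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                   # format of the packaged model
      runtime: kserve-mlserver          # assumed runtime; list installed runtimes with
                                        # `kubectl get clusterservingruntimes`
      storageUri: s3://models/demo/v1.0 # placeholder; use the URI your model repository reports
EOF

# Watch the service come up; READY=True means the predictor is serving.
kubectl get inferenceservice demo-model -n demo -w
```

The custom-image path is plain Kubernetes. A minimal sketch with hypothetical names (image, labels, port):

```bash
# Hypothetical example: serve a custom inference image with a standard Deployment.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-infer      # hypothetical name
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-infer
  template:
    metadata:
      labels:
        app: custom-infer
    spec:
      containers:
        - name: server
          image: registry.example.com/demo/custom-infer:v1  # image built from your Dockerfile
          ports:
            - containerPort: 8080   # assumed serving port
EOF
```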