
# Features

# Model Management

  • Git-based Model Repository
    A complete Git-managed storage solution supporting:

    • Repository Management: Create/delete repos with metadata (name/description/visibility) and dependency checks
    • File Operations: Web UI upload for small files + CLI/Git LFS for large files (e.g., *.h5, *.bin); see the CLI sketch after this list
    • Version Control: Full Git capabilities including:
      • Branching (e.g., main/experimental)
      • Tagging (e.g., v1.0)
      • Automatic metadata sync from README.md
  • MLOps Integration
    Seamless connections to downstream workflows:

    • One-click deployment to inference services
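
The CLI path for large files maps onto standard Git LFS commands. A minimal sketch, assuming a hypothetical clone URL and file names; substitute the values your model repository shows:

```bash
# Clone the model repository (hypothetical URL).
git clone https://<alauda-ai-host>/models/demo-model.git
cd demo-model

git lfs install                # one-time: enable Git LFS hooks locally
git lfs track "*.h5" "*.bin"   # route large weight files through LFS

git add .gitattributes model.bin README.md
git commit -m "Add model weights"
git tag v1.0                   # tag the snapshot for later deployment
git push origin main --tags
```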

# Inference Service

  • Direct Model Deployment for Inference Services

    • Users select a specific model version from the repository and an inference runtime image, and the platform deploys an online service rapidly: it automatically downloads, caches, and loads the model, then starts the inference service. This streamlines deployment and reduces operational complexity (a manifest sketch follows this list).
  • Custom Image Deployment for Inference Services

    • Users write a Dockerfile to package a model and its dependencies into a custom image, then deploy the inference service through a standard Kubernetes Deployment. This approach offers greater flexibility, letting users tailor the inference environment to their needs (see the Deployment sketch after this list).
  • Inference Service Experience

    • Batch operations on multiple inference services: start, stop, update, and delete them in bulk.
    • Creation, monitoring, and export of results for batch inference tasks.
    • Batch resource management: allocate and adjust the resources of multiple inference services at once.
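
The direct-deployment path builds on KServe's InferenceService API (see the API Reference). A minimal sketch, assuming hypothetical names throughout: the namespace, service name, runtime, and storageUri below are placeholders, and the exact storage URI for repository-hosted models depends on your installation:

```bash
# Hypothetical example: deploy a model version as a KServe InferenceService.
kubectl apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: demo-model        # hypothetical service name
  namespace: demo         # hypothetical namespace
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                   # format of the packaged model
      runtime: kserve-mlserver          # assumed runtime; list installed runtimes with
                                        # `kubectl get clusterservingruntimes`
      storageUri: s3://models/demo/v1.0 # placeholder; use the URI your model repository reports
EOF

# Watch the service come up; READY=True means the predictor is serving.
kubectl get inferenceservice demo-model -n demo -w
```

The custom-image path is plain Kubernetes. A minimal sketch with hypothetical names (image, labels, port):

```bash
# Hypothetical example: serve a custom inference image with a standard Deployment.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-infer      # hypothetical name
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-infer
  template:
    metadata:
      labels:
        app: custom-infer
    spec:
      containers:
        - name: server
          image: registry.example.com/demo/custom-infer:v1  # image built from your Dockerfile
          ports:
            - containerPort: 8080   # assumed serving port
EOF
```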