
#Release Notes

#AI 1.4.0

#New and Optimized Features

#Inference Services Now Offer "Standard" and "Advanced" Modes

When creating an inference service, users can now select either "Standard Mode" or "Advanced Mode," with "Standard Mode" being the default option.

  • Standard Mode: Ready to use out of the box. After successful deployment of Alauda AI, users can directly create an inference service in "Standard Mode".
  • Advanced Mode: Requires manual deployment of the "Alauda AI Model Serving" plugin. Supports serverless mode and enables inference services that can scale down to zero (see the sketch after this list). This mode relies on Istio and provides additional monitoring metrics for inference services, such as "Traffic," "QPS," and "Response time."
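
As a minimal sketch only, not documented product configuration: assuming Advanced Mode uses the standard serving.kserve.io/v1beta1 InferenceService API listed under the API Reference, a scale-to-zero service might be declared as below. The namespace, model format, runtime name, and storage URI are hypothetical placeholders.

```python
# Hedged sketch: creating a KServe InferenceService that can scale to zero.
# Assumes the serverless (Advanced Mode) data plane is installed; the
# namespace, runtime, modelFormat, and storageUri values are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "demo-llm", "namespace": "demo-ns"},
    "spec": {
        "predictor": {
            # minReplicas: 0 allows the serverless data plane to scale the
            # service down to zero replicas when it receives no traffic.
            "minReplicas": 0,
            "maxReplicas": 3,
            "model": {
                "modelFormat": {"name": "huggingface"},   # hypothetical format
                "runtime": "vllm-runtime",                # hypothetical runtime name
                "storageUri": "pvc://models/demo",        # hypothetical model location
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="demo-ns",
    plural="inferenceservices",
    body=inference_service,
)
```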

#Customizable Monitoring Dashboards

A new "Dashboards" feature has been introduced, allowing users to customize and add dashboard charts according to their needs. For example, projects using GPUs from vendors other than NVIDIA can add customized dashboards provided by the manufacturers.

#Workbench Plugin

The new "Alauda AI Workbench" plugin is available for installation, providing users with IDE environments such as Jupyter Notebook and VS Code. This plugin replaces the "advanced" capabilities of the previous version and streamlines unnecessary components and some functions originally found in Kubeflow.

#Kubeflow Solution

A native Kubeflow solution has been launched to meet the needs of project clients who are accustomed to using the native capabilities of the Kubeflow community.

#Multi-Node Multi-GPU Solution

A multi-node multi-GPU solution has been introduced to cater to users' requirements for deploying models with large parameter counts.

#Notebook-Based Pre-training and Fine-tuning Solutions

Notebook-based solutions for model pre-training and fine-tuning have been launched to support users in optimizing their models.

#Inference Service Authentication Solution

An inference service authentication solution based on Envoy AI Gateway has been introduced, supporting the creation of API Keys for inference services to enhance permission control capabilities.
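
As an illustrative sketch only: assuming the gateway accepts the API Key as a bearer token and the inference service exposes an OpenAI-compatible chat endpoint (the URL, header format, model name, and payload below are assumptions, not documented values), a client call might look like this.

```python
# Hedged sketch: calling an inference service through an API-Key-protected
# gateway. The endpoint URL, header format, and request body are hypothetical.
import requests

API_KEY = "replace-with-your-api-key"
ENDPOINT = "https://ai-gateway.example.com/v1/chat/completions"  # hypothetical URL

response = requests.post(
    ENDPOINT,
    headers={
        "Authorization": f"Bearer {API_KEY}",  # assumed header format
        "Content-Type": "application/json",
    },
    json={
        "model": "demo-llm",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```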

#Enhanced Inference Service Logging

The logging functionality for inference services has been enhanced, including features such as automatic log updates, pause updates, and container switching, maintaining consistency with Alauda Container Platform capabilities.

#Deprecated Features

#Downgrading the MLServer Inference Runtime to a Solution

Due to limited use cases and its impact on the user experience of large-model inference services, the MLServer inference runtime has been downgraded to a solution. It is no longer included in the product by default, but a solution is provided to support specific scenarios, such as Small Language Model inference.

#Discontinuation of the Apps Feature

Both the Apps feature and Dify are positioned as AI Agent development capabilities, with Dify offering a simpler development approach through its low-code capabilities. In contrast, the pure customization and from-scratch development approach of the Apps feature is less convenient. Therefore, the Apps feature has been discontinued. Projects that require pure custom development of AI Agents can be accommodated through alternative solutions.

#Discontinuation of the Model Upload UI Feature

There are two ways to upload models: via the git push command line or through the UI. Command-line uploads offer better performance and faster speeds. Although UI upload is user-friendly, it tends to freeze with large model files, which are typically several hundred GB in size. Therefore, the UI upload feature has been discontinued. To facilitate user access, a documentation link has been added in place of the original feature, allowing users to quickly navigate to the user manual for the relevant commands.
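
As a rough illustration of the command-line path, assuming the model repository is an ordinary git remote that uses git-lfs for large weight files (the local path, file patterns, and branch name below are hypothetical; consult the user manual for the exact commands), an upload might look like the following sketch.

```python
# Hedged sketch: uploading model files to a model repository via git push.
# Large weight files are assumed to be tracked with git-lfs; all paths,
# patterns, and the branch name are hypothetical placeholders.
import subprocess

REPO_DIR = "./demo-model"  # local clone of the model repository (hypothetical)

def run(*args):
    """Run a command inside the model repository clone, failing loudly on error."""
    subprocess.run(args, cwd=REPO_DIR, check=True)

run("git", "lfs", "install")
run("git", "lfs", "track", "*.safetensors", "*.bin")  # track large weight files
run("git", "add", ".")
run("git", "commit", "-m", "Upload model weights")
run("git", "push", "origin", "main")
```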

#Fixed Issues

No fixed issues in this release.

#Known Issues

  • Modifying library_name in GitLab by directly editing the README file does not synchronize the model type change on the page.
    Temporary workaround: modify library_name through the UI instead of editing it directly in GitLab.