Release Notes
AI 1.3.0
New and Optimized Features
Shared Model Permission Restriction
The model repository currently supports two model types: Shared Models and Private Models. In the original design, users could perform management actions (such as editing and deletion) on Shared Models, which posed permission risks.
In this release, the functionality and permissions for Private Models remain unchanged, supporting full management operations. The permissions for Shared Models are restricted and optimized as follows:
- Permission Restriction: Shared Models are now read-only for all users; creating, editing, or deleting Shared Models is no longer supported.
- Creation Flow Adjustment: The visibility parameter is removed from the "Create Model" flow, and all newly created models are Private Models (see the sketch after this list).
- Feature Removal: The following features are removed for Shared Models:
- Edit Tags button
- Edit Description button
- Create Tag button
- Delete button
- File Management Tab
- Version Management Tab
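As an illustration of the creation flow adjustment above, the hypothetical request below sketches what creating a model might look like after this change. The endpoint, payload fields, and token shown are assumptions for illustration, not the platform's actual API.

```python
import requests

# Hypothetical endpoint and payload, for illustration only (not the platform's real API).
AML_API = "https://aml.example.com/api/v1"

payload = {
    "name": "qwen2-7b-finetuned",
    "description": "Fine-tuned checkpoint for an internal QA bot",
    # Note: no "visibility" field. The parameter is gone from the Create Model flow,
    # so every newly created model is a Private Model.
}

resp = requests.post(
    f"{AML_API}/namespaces/demo-ns/models",
    json=payload,
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the created model is private; it cannot be turned into a Shared Model
```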
New Template Publishing for Inference Services
Previously, creating inference services required manual configuration of numerous interdependent parameters. This complexity often led to errors, reducing the success rate and impacting user experience.
In this release, the Template Publishing capability is introduced, enabling users to encapsulate verified configurations as templates and rapidly publish inference services based on them (see the sketch after the benefits below).
Benefits include:
- Users can create custom templates, reusing verified best practices.
- Auto-population of parameter configurations reduces repetitive input and dependency errors.
- Lowers the barrier to publishing large model inference services, improving success rates and efficiency.
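A minimal sketch of the template-based publishing flow described above, assuming hypothetical data structures; the class names, fields, and the example runtime image are illustrative, not the platform's actual SDK.

```python
from dataclasses import dataclass, field

# Hypothetical data structures: they illustrate how a verified configuration
# can be captured once and reused, not the platform's real objects.
@dataclass
class InferenceTemplate:
    name: str
    runtime: str                 # e.g. "vllm"
    image: str
    gpu_count: int
    extra_args: dict = field(default_factory=dict)

@dataclass
class InferenceService:
    name: str
    model_id: str
    template: InferenceTemplate

def publish_from_template(service_name: str, model_id: str,
                          template: InferenceTemplate) -> InferenceService:
    """Create a service whose parameters are auto-populated from the template,
    so the user only supplies a name and the model to serve."""
    return InferenceService(name=service_name, model_id=model_id, template=template)

# A team member encapsulates a configuration that is known to work ...
verified = InferenceTemplate(name="vllm-a100-2gpu", runtime="vllm",
                             image="vllm/vllm-openai:v0.5.0", gpu_count=2,
                             extra_args={"max_model_len": 8192})

# ... and other users publish services from it without re-entering the parameters.
svc = publish_from_template("qa-bot-svc", "model-123", verified)
print(svc)
```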
Multi-GPU Support on a Single Node for Inference Runtime
Previously, inference services deployed on a single node only supported single-GPU mode due to resource scheduling limitations. This restricted large model inference scenarios and underutilized GPU resources.
With this upgrade, multi-GPU scheduling within a single node is now supported. A single inference service can automatically allocate multiple GPUs on the same machine, enabling larger model inference, better resource utilization, and enhanced service capability.
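For a concrete picture, the hypothetical spec below sketches a single-replica service that requests several GPUs on one node; the field names and values are assumptions for illustration, not the platform's actual configuration schema.

```python
# Hypothetical service spec: illustrates a single replica that is scheduled
# onto one node and is allocated multiple GPUs on that node.
service_spec = {
    "name": "llama3-70b-svc",
    "replicas": 1,
    "resources": {
        "cpu": "16",
        "memory": "128Gi",
        # Previously limited to 1; multiple GPUs on the same node are now allowed.
        "nvidia.com/gpu": 4,
    },
    "runtime": "vllm",
    # With 4 GPUs visible to the pod, the runtime can shard the model across them,
    # e.g. via tensor parallelism.
    "runtime_args": {"tensor_parallel_size": 4},
}
print(service_spec)
```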
"Business Monitoring" for Inference Services
Inference services previously only displayed basic information. To enhance observability and enable users to quickly detect issues, monitor service health in real-time, and optimize or adjust resources proactively, the following new feature is introduced:
Monitoring Dashboard
- Added as a new tab in inference services, covering three dimensions (see the sketch after this list):
- Resource Monitoring: CPU usage (cores), CPU utilization (%), Memory usage (GiB), Memory utilization (%)
- Compute Monitoring: GPU usage (cores), GPU utilization (%), GPU memory usage (GiB), GPU memory utilization (%)
- Other Metrics: Response Time, Traffic (inbound/outbound data volume), QPS (Queries Per Second), Total Calls, Token Throughput (/s)
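A hedged sketch of how these metrics might be read programmatically, assuming a hypothetical JSON metrics endpoint; the URL and field names are illustrative, and the dashboard itself is a UI tab, not a documented API.

```python
import requests

# Hypothetical monitoring endpoint; this URL and field layout are assumptions
# used only to illustrate the metric dimensions listed above.
METRICS_URL = "https://aml.example.com/api/v1/inference-services/qa-bot-svc/metrics"

resp = requests.get(METRICS_URL, headers={"Authorization": "Bearer <token>"}, timeout=10)
resp.raise_for_status()
m = resp.json()

print(f"CPU: {m['cpu_usage_cores']} cores ({m['cpu_utilization_pct']}%)")
print(f"GPU memory: {m['gpu_memory_usage_gib']} GiB ({m['gpu_memory_utilization_pct']}%)")
print(f"Response time: {m['response_time_ms']} ms, QPS: {m['qps']}")
print(f"Token throughput: {m['token_throughput_per_s']} tokens/s, total calls: {m['total_calls']}")
```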
Inference Runtime Expansion
To enhance AML inference runtime support, the following new runtimes are added in this version:
Dedicated "Platform Management View"
Previously, platform management features (including Namespace management and credential management) were mixed with business functions in a single view, causing confusion because features requiring different permission levels appeared together.
In this release:
- Platform management functions are separated into an independent view, visible and operable by Administrators only.
- Admins can freely switch between "Management View" and "Business View" via top navigation.
- Regular users can only access the Business View and have no access to platform management features (see the sketch after this list).
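The toy check below sketches the view separation described above: administrators may switch between both views, while regular users only ever get the Business View. The role names and functions are assumptions for illustration.

```python
from enum import Enum

class Role(Enum):
    ADMIN = "admin"
    USER = "user"

class View(Enum):
    MANAGEMENT = "management"   # Namespace management, credential management, ...
    BUSINESS = "business"       # Models, inference services, ...

def available_views(role: Role) -> list[View]:
    """Admins may switch between both views; regular users get the Business View only."""
    if role is Role.ADMIN:
        return [View.MANAGEMENT, View.BUSINESS]
    return [View.BUSINESS]

def switch_view(role: Role, target: View) -> View:
    if target not in available_views(role):
        raise PermissionError(f"{role.value} cannot access the {target.value} view")
    return target

print(available_views(Role.USER))                # Business View only
print(switch_view(Role.ADMIN, View.MANAGEMENT))  # admins can switch freely
```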
Auto-Configuration of GitLab Token for Namespace Onboarding
Previously, when onboarding a namespace, users had to manually configure a GitLab Token to authorize repository access.
This release optimizes the GitLab authorization process by configuring the GitLab Token automatically (see the sketch after this list):
- For each newly onboarded namespace, the platform automatically configures the GitLab Token.
- No manual operation or GitLab authorization management is required by users.
- Continuous access to GitLab is ensured for all managed namespaces.
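A minimal sketch of what automatic token configuration could look like, using GitLab's project access token endpoint; the project ID, scopes, token storage, and the assumption that the platform provisions tokens this way are all illustrative.

```python
import requests

GITLAB_URL = "https://gitlab.example.com"
ADMIN_TOKEN = "<platform-service-account-token>"   # assumption: a platform-held credential

def configure_gitlab_token(project_id: int, namespace: str) -> str:
    """Create a read-only project access token for a newly onboarded namespace.
    Uses GitLab's documented /projects/:id/access_tokens endpoint; how the platform
    actually provisions and stores the token is an assumption."""
    resp = requests.post(
        f"{GITLAB_URL}/api/v4/projects/{project_id}/access_tokens",
        headers={"PRIVATE-TOKEN": ADMIN_TOKEN},
        json={
            "name": f"aml-{namespace}",
            "scopes": ["read_repository"],
            "expires_at": "2026-01-01",
        },
        timeout=30,
    )
    resp.raise_for_status()
    token = resp.json()["token"]
    # The platform would store this token (e.g. as a namespace-scoped secret)
    # so users never have to configure it manually.
    return token

# Example (hypothetical project ID and namespace):
# token = configure_gitlab_token(project_id=42, namespace="demo-ns")
```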
Deprecated Features
Downgrade of α Features to S2 Stage
During AML platform iterations, some modules were released as α features for exploratory validation of designs and user needs.
However, due to rapid changes in large model development scenarios and evolving user requirements, some α features have design flaws or limited applicability. These features will be re-evaluated and downgraded to S2 stage for future planning.
The following features are downgraded:
- Dataset: Dataset Repository, Data Labeling
- Model Optimization: Task Templates, Model Fine-tuning, Pre-training
- Agents: Application Repository, Dify
- Advanced Features: Notebook, Storage Volumes, MLFlow, Tensorboard, Workflow, Workflow Tasks, Scheduled Tasks, AutoML
- Model: Build Inference API Image
Fixed Issues
- Models of type "Shared" were not accessible to users in other namespaces, causing the "Inference Service Experience" feature to fall back from chat-completion to text-completion.
- When the number of models exceeded 100, an API error caused only the first 100 items to be retrieved, resulting in an inaccurate model repository count on the "Overview" page.
- Inference service logs only displayed the logs of the first pod, so logs from the other pods could not be viewed.
- Fixed an issue where a hard-coded model name was used in the inference service call example.
- When creating an inference service with the gpu-manager + vLLM inference runtime on a node with a CUDA 12.4 driver, the error "enforce_eager=True or use '--enforce-eager' in the CLI" occurred.
The root cause: when an application requests the address of a CUDA function for a given CUDA version, gpu-manager always returns the latest version of that function. For example, cuMemAlloc has v1 (CUDA 10), v2 (CUDA 11), and v3 (CUDA 12); always returning the latest version can cause the inference service to fail.
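To illustrate the root cause, the toy lookup below contrasts "always return the latest version" with version-aware selection; the version table mirrors the example above and is not gpu-manager's actual implementation.

```python
# Toy illustration of versioned CUDA symbol lookup. The version table below is
# made up for the example and does not reflect the real CUDA/gpu-manager mapping.
SYMBOL_VERSIONS = {
    "cuMemAlloc": {10: "cuMemAlloc",     # v1, CUDA 10
                   11: "cuMemAlloc_v2",  # v2, CUDA 11
                   12: "cuMemAlloc_v3"}, # v3, CUDA 12
}

def buggy_lookup(symbol: str, requested_cuda: int) -> str:
    """Old behavior: ignore the requested version and return the newest symbol."""
    versions = SYMBOL_VERSIONS[symbol]
    return versions[max(versions)]

def fixed_lookup(symbol: str, requested_cuda: int) -> str:
    """Fixed behavior: return the newest symbol that does not exceed the requested version."""
    versions = SYMBOL_VERSIONS[symbol]
    best = max(v for v in versions if v <= requested_cuda)
    return versions[best]

print(buggy_lookup("cuMemAlloc", 11))  # cuMemAlloc_v3 -> version mismatch, service errors out
print(fixed_lookup("cuMemAlloc", 11))  # cuMemAlloc_v2 -> matches the caller's CUDA version
```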
Known Issues
No issues in this release.