Quick Start

This document helps new users quickly understand how to deploy inference services in Alauda AI. By deploying and trying out a simple "text generation" inference service, you can quickly grasp the platform's main features and how to use them.

Estimated Reading Time

Reading this document and completing its operations takes approximately 20 minutes.

Notes

This document only demonstrates the basic process. For detailed parameter configurations, please refer to the complete documentation.

Prerequisites

  • You already have a platform administrator account (used to create and manage namespaces).
  • You have prepared the model files to be deployed (you can download them in advance from sites such as Hugging Face or ModelScope; a download sketch follows this list).
  • If you need GPU inference, ensure that the GPU plugin is installed; if it is not, install it from the plugin center in platform management.
  • You understand the basic concepts of Kubernetes and machine learning models.
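
If you still need to fetch the model files, the following is a minimal sketch using the Hugging Face CLI. It assumes network access to Hugging Face and access to the meta-llama/Meta-Llama-3-8B-Instruct repository used later in this guide (the repository is gated, so you must accept its license and log in first):

```bash
# Install the Hugging Face CLI (shipped with the huggingface_hub package)
pip install -U "huggingface_hub[cli]"

# Authenticate; required for gated repositories such as Meta-Llama-3
huggingface-cli login

# Download the model files into a local directory for later upload
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir ./Meta-Llama-3-8B-Instruct
```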

Step Overview

| Step | Operation | Description | Notes |
|------|-----------|-------------|-------|
| 1 | Create Namespace | Create a namespace in the container platform and configure the relevant Alauda AI roles for the user | Skip this step if you already have a namespace and have assigned user permissions |
| 2 | Manage Namespace | Include the namespace in Alauda AI management | Skip this step if the namespace is already managed |
| 3 | Upload Model | Upload the model file to the model repository | Skip this step if you have already uploaded the model or are using a platform-shared model |
| 4 | Publish Inference Service | Publish the model as an online inference service | |
| 5 | Invoke Inference Service | Invoke the inference service via API or the "Experience" feature | |

Operation Steps

Step 1: Create Namespace and Assign Permissions to User

Note: Skip this step if you already have a namespace and have assigned user permissions.

Namespaces are the foundation for multi-tenant isolation in Alauda AI, and each project should use an independent namespace. (A rough kubectl equivalent of the namespace creation and role assignment is sketched after the steps below.)

  1. Log in to the container platform as an administrator.
  2. Go to Project Management, select or create a project.
  3. On the project details page, click Namespace.
  4. Click Create Namespace and enter a name (e.g., "text-classification-demo").
  5. Click Create to complete the namespace creation.
  6. Assign namespace permissions to the user:
    • Go to Administrator > Users > Users.
    • Create a user or select an existing user who needs to use this namespace.
    • Click Configure Roles > Add Role.
    • Add Alauda AI Roles and associate them with the created namespace and the project to which the namespace belongs.
      • aml-namespace-editor: Used by namespace developers, with permissions to create, delete, modify, and query models and inference services.
      • aml-namespace-owner: Used by namespace managers.
      • aml-namespace-viewer: Can only view models, inference services, and other resources.
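
For reference, a rough kubectl equivalent of the namespace creation and role assignment is sketched below. It is illustrative only: it assumes the Alauda AI roles listed above are available as ClusterRoles with the same names, and dev-user is a placeholder user name; on a real platform, prefer assigning roles through the UI as described above.

```bash
# Create the namespace used in this guide
kubectl create namespace text-classification-demo

# Grant a user developer permissions in that namespace
# (assumes aml-namespace-editor exists as a ClusterRole; dev-user is a placeholder)
kubectl create rolebinding aml-editor-dev-user \
  --clusterrole=aml-namespace-editor \
  --user=dev-user \
  -n text-classification-demo
```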

Step 2: Manage Namespace

Note: Skip this step if the namespace is already managed.

Include the created namespace in Alauda AI management (the underlying Kubernetes resource is noted after the steps):

  1. Enter Alauda AI, select Admin in the top navigation, and in the Clusters list to the right of Admin, select the cluster where the newly created namespace is located.
  2. Click Namespace Manage in the left navigation bar, then click the Management Namespace button.
  3. In the pop-up dialog box, select the newly created "text-classification-demo" namespace.
  4. Click Management to complete the operation.
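
The API Reference lists an AmlNamespace resource (manage.aml.dev/v1alpha1) under Manage APIs, which presumably represents a managed namespace. After the Management step, you can inspect it with kubectl; note that the amlnamespaces resource name and the object name below are assumptions (the conventional lowercase plural of the kind, and an object named after the managed namespace), and the object's spec fields are documented in the AmlNamespace API reference.

```bash
# List the AmlNamespace objects created by the Management action
# ("amlnamespaces" assumes the conventional lowercase-plural resource name)
kubectl get amlnamespaces.manage.aml.dev

# Inspect the object for the namespace managed above (object name is assumed)
kubectl get amlnamespaces.manage.aml.dev text-classification-demo -o yaml
```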

Step 3: Upload Model

Note: Skip this step if you have already uploaded the model or are using a platform-shared model.

Upload the prepared model to the model repository:

  1. Enter Alauda AI, select Business view in the top navigation, and select the managed namespace from the previous step.
  2. Click Model Repository in the left navigation bar, click Create Model Repository, and enter the prepared model name, such as "Meta-Llama-3-8B-Instruct".
  3. To upload the model files, refer to Create Model Repository.
  4. In the File Management tab, click Update metadata and select the correct "Task Type" and "Framework" according to the model's attributes.
    • Task Type: An attribute of the model itself; you can find it in the labels on the model's download details page. Typical values include "Text Generation", "Image Generation", etc.
    • Framework: Also an attribute of the model itself, shown in the same labels. Typical values include "Transformers", "MLflow", etc. Most popular open-source Large Language Models use "Transformers".

Step 4: Publish Inference Service

Publish the model as an online inference service (a YAML sketch of the equivalent InferenceService follows the steps):

  1. On the model details page, click Publish inference API > Custom publishing.
  2. Configure service parameters:
    • Name: meta-llama-3-8b-service
    • Model: Meta-Llama-3-8B-Instruct
    • Version: Branch-main
    • Inference Runtimes: Select based on the CUDA version installed on the GPU node. For example, if CUDA 12.6 or later is installed, select "vllm-cuda12.6-x86".
    • Resource Requests: 2 CPU / 20Gi Memory
    • Resource Limits: 2 CPU / 20Gi Memory
    • GPU Acceleration: HAMi NVIDIA
      • gpu number: 1
      • vgpu cores: 50
      • GPU vmemory: 23552
    • Storage: Mount existing PVC/Capacity 10Gi
    • Auto Scaling: Off
    • Number of instances: 1
  3. Click Publish and wait for the service to start.
  4. View the service status on the Inference Services page.
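
Under the hood, the published service corresponds to an InferenceService object (serving.kserve.io/v1beta1, see the API Reference). The sketch below shows roughly how the configuration above could look as YAML. It is illustrative only: the modelFormat value, the PVC name and path in storageUri, and the HAMi resource names (nvidia.com/gpucores, nvidia.com/gpumem) are assumptions based on common KServe and HAMi conventions; verify field names against the InferenceService API reference before applying anything directly.

```bash
# Hedged sketch of an equivalent InferenceService; publishing through the UI is the
# supported path, and all marked values below are assumptions or placeholders.
kubectl apply -n text-classification-demo -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: meta-llama-3-8b-service
spec:
  predictor:
    minReplicas: 1                  # Auto Scaling: Off, Number of instances: 1
    model:
      modelFormat:
        name: huggingface           # assumed; depends on the selected runtime
      runtime: vllm-cuda12.6-x86    # runtime matching the node's CUDA version
      storageUri: pvc://models-pvc/Meta-Llama-3-8B-Instruct   # placeholder PVC and path
      resources:
        requests:
          cpu: "2"
          memory: 20Gi
          nvidia.com/gpu: "1"         # vGPU count (HAMi resource names are assumed)
          nvidia.com/gpucores: "50"   # vGPU cores
          nvidia.com/gpumem: "23552"  # vGPU memory
        limits:
          cpu: "2"
          memory: 20Gi
          nvidia.com/gpu: "1"
          nvidia.com/gpucores: "50"
          nvidia.com/gpumem: "23552"
EOF
```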

Step 5: Invoke Inference Service

Test the published inference service (a direct API call example follows the steps):

  1. Click Inference Services in the left navigation bar, click the name of the "Published Inference Service", and click Experience on the inference service details page.
  2. Enter the test text, such as "Recommend a few good books".
  3. View the generated text and generation parameters returned by the model.
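
Besides the Experience page, you can call the service API directly. A hedged example with curl, assuming the vLLM runtime exposes an OpenAI-compatible chat completions endpoint and that external access has been configured for the service (see "Configure External Access for Inference Services"); the host name and model value below are placeholders:

```bash
# Placeholder host; use the actual address of your published inference service
curl -s http://meta-llama-3-8b-service.text-classification-demo.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Recommend a few good books"}],
        "max_tokens": 256
      }'
```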