This document aims to help new users quickly understand how to deploy inference services in Alauda AI. By deploying a simple "text generation" inference service and trying it out, you can quickly grasp the platform's main features and how to use them.
Applicable Scenarios
- You are a new user of Alauda AI and want to quickly understand how to publish a model as a callable inference service.
- You want to understand the basic functionalities of Alauda AI through a simple example, including uploading a model, publishing an inference service, and invoking an inference service.
- You have just deployed a new Alauda AI environment and want to quickly verify if it is available.
Estimated Reading Time
It is estimated that completing the reading and operations in this document will take approximately 20 minutes.
Notes
This document only demonstrates the basic process. For detailed parameter configurations, please refer to the complete documentation.
Prerequisites
- You already have a platform administrator account (used to create and manage namespaces).
- You have prepared the model file to be deployed (you can download it in advance from websites such as Hugging Face or ModelScope).
- If you need to use GPU inference, please ensure that the GPU plugin is installed. If not, please install the GPU plugin in the platform management plugin center.
- You understand the basic concepts of Kubernetes and machine learning models.
Step Overview
Step | Operation | Description | Notes |
---|---|---|---|
1 | Create Namespace | Create a namespace in the container platform and configure the relevant Alauda AI roles for the user | Skip this step if you already have a namespace and have assigned user permissions |
2 | Manage Namespace | Include the namespace in Alauda AI management | Skip this step if the namespace is already managed |
3 | Upload Model | Upload the model file to the model repository | Skip this step if you have already uploaded the model or are using a platform-shared model |
4 | Publish Inference Service | Publish the model as an online inference service | |
5 | Invoke Inference Service | Invoke the inference service via API or the "Experience" feature | |
Operation Steps
Step 1: Create Namespace and Assign Permissions to User
Note: Skip this step if you already have a namespace and have assigned user permissions.
Namespaces are the foundation for multi-tenant isolation in Alauda AI, and each project should use an independent namespace.
- Log in to the container platform as an administrator.
- Go to Project Management, select or create a project.
- On the project details page, click Namespace.
- Click Create Namespace and enter a name (e.g., "text-generation-demo").
- Click Create to complete the namespace creation.
- Assign namespace permissions to the user:
- Go to Administrator > Users > Users.
- Create a user or select an existing user who needs to use this namespace.
- Click Configure Roles > Add Role.
- Add Alauda AI Roles and associate them with the created namespace and the project to which the namespace belongs.
- aml-namespace-editor: Used by namespace developers, with permissions to create, delete, modify, and query models and inference services.
- aml-namespace-owner: Used by namespace managers.
- aml-namespace-viewer: Can only view models, inference services, and other resources.
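If you prefer working from the command line, the namespace creation and role assignment above can also be expressed as ordinary Kubernetes objects. The following is a minimal sketch using the official kubernetes Python client; the user name "demo-user" is hypothetical, and whether the Alauda AI roles (such as aml-namespace-editor) are exposed as ClusterRoles that can be bound this way is an assumption, so treat the UI steps above as the authoritative path.

```python
# Minimal sketch: create the demo namespace and bind the Alauda AI editor role to a user.
# Assumptions: a working kubeconfig, a user named "demo-user" (hypothetical), and that
# "aml-namespace-editor" exists as a ClusterRole in the cluster (not confirmed here).
from kubernetes import client, config

config.load_kube_config()
ns = "text-generation-demo"

# 1. Create the namespace (equivalent to "Create Namespace" in the UI).
core = client.CoreV1Api()
core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=ns)))

# 2. Bind the editor role to the user inside that namespace
#    (equivalent to "Configure Roles > Add Role" in the UI).
rbac = client.RbacAuthorizationV1Api()
rbac.create_namespaced_role_binding(
    namespace=ns,
    body={
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "demo-user-aml-editor", "namespace": ns},
        "subjects": [
            {"kind": "User", "apiGroup": "rbac.authorization.k8s.io", "name": "demo-user"}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "aml-namespace-editor",
        },
    },
)
print(f"Namespace {ns} created and aml-namespace-editor bound to demo-user")
```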
Step 2: Manage Namespace
Note: Skip this step if the namespace is already managed.
Include the created namespace in Alauda AI management:
- Enter Alauda AI, select Admin in the top navigation, and in Clusters to the right of Admin, select the cluster where the newly created namespace is located.
- Click Namespace Manage in the left navigation bar and click the Management Namespace button.
- Select the newly created "text-generation-demo" namespace in the pop-up dialog box.
- Click Management to complete the management operation.
Step 3: Upload Model
Note: Skip this step if you have already uploaded the model or are using a platform-shared model.
Upload the text generation model to the model repository:
- Enter Alauda AI, select Business view in the top navigation, and select the managed namespace from the previous step.
- Click Model Repository in the left navigation bar, click Create Model Repository, and enter the prepared model name, such as "gpt2".
- After creation, enter the File Management tab on the model details page.
- Click Import Model File, and drag or select the model files/subfolders to upload. If you are uploading a Large Language Model, the UI may freeze because of the large file size; in that case it is recommended to use the git push command to push the model files to the model repository (see the download and push sketch after this list).
- Click the Import button and wait for the upload to complete.
- In the File Management tab, click Update metadata and select the correct "Task Type" and "Framework" according to the model's attributes.
- Task Type: An attribute of the model itself, shown in the labels on the model's download page. Values include "Text Generation", "Image Generation", etc.
- Framework: Also an attribute of the model itself, shown in the labels on the model's download page. Values include "Transformers", "MLflow", etc. Most popular open-source Large Language Models use the "Transformers" framework.
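If you have not yet downloaded the model mentioned in the prerequisites, the sketch below shows one way to fetch the gpt2 files with the huggingface_hub package and outlines the git push flow recommended above for large files. The repository push address and the use of Git LFS are assumptions; copy the actual address from the model repository's details page.

```python
# Minimal sketch: fetch the gpt2 model files locally before uploading them to the
# model repository. Requires `pip install huggingface_hub`; network access to
# huggingface.co is assumed.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="gpt2", local_dir="./gpt2")
print(f"Model files downloaded to {local_dir}")

# For large models, the document recommends pushing with git instead of the web upload.
# The repository URL is not shown here -- copy the real clone/push address from the
# model repository's details page, then roughly:
#   git clone <model-repo-url> && cp -r ./gpt2/* <repo-dir>/ && cd <repo-dir>
#   git lfs track "*.safetensors" "*.bin"   # only if the repository uses Git LFS (assumption)
#   git add . && git commit -m "add gpt2" && git push origin main
```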
Step 4: Publish Inference Service
Publish the model as an online inference service:
- On the model details page, click Publish inference API > Custom publishing.
- Configure service parameters:
- Name: gpt2-service
- Model: gpt2
- Version: Branch-main
- Inference Runtimes: Select based on the CUDA version installed on the GPU node. For example, if the CUDA 11 driver is installed, select "vllm-cuda11.8-x86"; if CUDA 12 is installed, select "vllm-cuda12.1-x86".
- Resource Requests: 1CPU/4Gi Memory
- Resource Limits: 2CPU/6Gi Memory
- GPU Acceleration: GPU Manager
- GPU vcore: 30
- GPU vmemory: 32
- Storage: Mount an existing PVC, a newly created PVC, or temporary storage; Capacity: 10Gi
- Auto Scaling: Off
- Number of instances: 1
- Click Publish and wait for the service to start.
- View the service status on the Inference Services page.
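If you would rather watch the service come up from a terminal than refresh the page, the following is a minimal sketch that polls the service status with the kubernetes Python client. It assumes the inference service is backed by a KServe InferenceService (serving.kserve.io/v1beta1), which the "Inference Runtimes" terminology suggests but this document does not confirm.

```python
# Minimal sketch: poll the inference service's readiness instead of refreshing the UI.
# Assumption: the service is published as a KServe InferenceService named "gpt2-service"
# in the "text-generation-demo" namespace (names taken from the steps above).
import time
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

for _ in range(60):  # wait up to ~10 minutes
    svc = api.get_namespaced_custom_object(
        group="serving.kserve.io",
        version="v1beta1",
        namespace="text-generation-demo",
        plural="inferenceservices",
        name="gpt2-service",
    )
    conditions = svc.get("status", {}).get("conditions", [])
    ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
    if ready:
        print("gpt2-service is ready:", svc["status"].get("url", "<no url reported>"))
        break
    time.sleep(10)
else:
    print("gpt2-service is not ready yet; check the Inference Services page for details")
```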
Step 5: Invoke Inference Service
Test the published inference service:
- Click Inference Services in the left navigation bar, click the name of the published inference service (gpt2-service), and click Experience on the inference service details page.
- Enter the test text, such as "Recommend a few good books".
- View the generated text and generation parameters returned by the model.
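Beyond the Experience page, the service can also be called over HTTP. The following is a minimal sketch using Python requests; the base URL is hypothetical and must be replaced with the access address (and any authentication token) shown on the inference service details page. A vLLM-based runtime typically exposes an OpenAI-compatible completions API, but verify the exact path for your deployment.

```python
# Minimal sketch: call the inference service over HTTP instead of the Experience page.
# The base URL below is hypothetical -- copy the real access address from the inference
# service details page and add any required authentication headers.
import requests

BASE_URL = "http://gpt2-service.text-generation-demo.example.com"  # hypothetical

resp = requests.post(
    f"{BASE_URL}/v1/completions",  # OpenAI-compatible path commonly exposed by vLLM (assumption)
    json={
        "model": "gpt2",
        "prompt": "Recommend a few good books",
        "max_tokens": 64,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

The response body contains the generated text, similar to what the Experience page displays.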