Introduction

The Inference Service is a core feature of the Alauda AI platform for efficiently deploying LLMs as online inference services that can be invoked over HTTP API or gRPC. With the Inference Service, users can rapidly build LLM applications and expose stable, high-performance LLM capabilities to external consumers.

WARNING

The built-in runtime runs inside its container with root privileges. Use it only in a trusted environment and in accordance with your security policies.

Core Advantages

  • Rapid Model Deployment:
    • Supports direct deployment of inference services from the Model Repository, simplifying deployment steps.
    • Supports user-defined Docker images for deploying complex custom inference services.
  • Multi-Framework Runtime Support:
    • Integrates mainstream inference runtimes such as Seldon MLServer and vLLM, supporting a variety of model frameworks to meet the deployment needs of different models.
  • Visual Inference Demonstration:
    • Provides visual "inference demonstration" features for common task types, allowing users to quickly verify inference results.
  • Flexible Invocation Methods:
    • Supports HTTP API and gRPC invocation, so LLM capabilities can be consumed across different application scenarios (see the sketch after this list).
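
As a concrete illustration of HTTP invocation, the minimal sketch below sends a chat completion request to a deployed inference service. It assumes the service runs on the vLLM runtime and exposes an OpenAI-compatible endpoint; the endpoint URL, model name, and token are placeholders, not platform defaults.

```python
import requests

# Placeholder values -- replace with your own inference service endpoint,
# model name, and credentials.
ENDPOINT = "https://inference.example.com/v1/chat/completions"
MODEL = "my-llm-model"
TOKEN = "<api-token>"


def chat(prompt: str) -> str:
    """Send a single chat completion request to the inference service."""
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Summarize what an inference service does."))
```

The same request body can be sent from any HTTP client or SDK; gRPC invocation follows the protocol exposed by the selected runtime.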

Application Scenarios

  • Online LLM Applications:
    • Deploy LLMs as online services that expose LLM capabilities to external consumers.
  • Real-Time Inference:
    • Serve real-time inference workloads for applications with strict response-time requirements.
  • Batch Inference:
    • Run inference over large datasets in batches (see the sketch after this list).
  • Application Integration:
    • Integrate LLM capabilities into existing applications through the service APIs.
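
For batch inference, one simple pattern is to fan requests out to the service's HTTP API with a bounded worker pool. The sketch below reuses the hypothetical OpenAI-compatible endpoint from the previous example; the dataset and concurrency level are illustrative assumptions, not platform defaults.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "https://inference.example.com/v1/chat/completions"  # placeholder
MODEL = "my-llm-model"                                          # placeholder
TOKEN = "<api-token>"                                           # placeholder


def infer(prompt: str) -> str:
    """Run one inference request against the service's HTTP API."""
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def batch_infer(prompts: list[str], workers: int = 8) -> list[str]:
    """Process a list of prompts with a bounded pool of concurrent requests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(infer, prompts))


if __name__ == "__main__":
    dataset = ["Classify this ticket: ...", "Summarize this document: ..."]
    for prompt, answer in zip(dataset, batch_infer(dataset)):
        print(prompt, "->", answer)
```

Keeping the worker count bounded avoids overloading a single service replica; for larger datasets, scale the service replicas and the client concurrency together.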