- Direct Model Deployment for Inference Services
  - Enables users to directly select specific model versions from the repository and specify the inference runtime image for rapid online service deployment. The system automatically downloads, caches, and loads the model, then starts the inference service. This streamlines the deployment process and reduces operational complexity.
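
A minimal sketch of what such a direct deployment request could look like, assuming a generic HTTP API: the base URL, endpoint path, payload fields, and the model and runtime names below are illustrative placeholders, not the platform's actual interface.

```python
# Hypothetical sketch: deploy a model straight from the repository.
# Endpoint path, payload fields, and all names below are assumptions.
import requests

PLATFORM_API = "http://platform.example.com/api/v1"  # assumed base URL

payload = {
    "name": "sentiment-svc",                  # inference service to create
    "model": "sentiment-bert",                # model entry in the repository
    "version": "v3",                          # specific version to pull and cache
    "runtime_image": "registry.example.com/runtimes/triton:24.01",  # inference runtime image
    "replicas": 2,
    "resources": {"cpu": "4", "memory": "8Gi", "gpu": "1"},
}

# Once the request is accepted, the platform is expected to download and cache
# the model, load it into the runtime, and start the online inference service.
resp = requests.post(f"{PLATFORM_API}/inference-services", json=payload, timeout=30)
resp.raise_for_status()
print("service status:", resp.json().get("status"))
```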
- Custom Image Deployment for Inference Services
  - Supports users in writing Dockerfiles to package models and their dependencies into custom images, then deploying inference services through standard Kubernetes Deployments. This approach provides greater flexibility, allowing users to customize the inference environment according to their needs.
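
As a sketch of the deployment half of this flow (the Dockerfile build itself is omitted), assuming the official `kubernetes` Python client and placeholder image, namespace, and resource values:

```python
# Sketch: run a custom inference image as a standard Kubernetes Deployment.
# Image, namespace, labels, and resource values are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

labels = {"app": "custom-infer"}
container = client.V1Container(
    name="inference",
    image="registry.example.com/teams/nlp/custom-infer:1.0",  # built from the user's Dockerfile
    ports=[client.V1ContainerPort(container_port=8080)],
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "4Gi"},
        limits={"cpu": "4", "memory": "8Gi"},
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="custom-infer", labels=labels),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

# Create the Deployment; the service can then be exposed via a Service/Ingress.
client.AppsV1Api().create_namespaced_deployment(namespace="inference", body=deployment)
```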
- Batch Operations for Inference Services
  - Supports batch operations on multiple inference services, such as starting, stopping, updating, and deleting them in bulk (a hedged API sketch follows this list).
  - Supports the creation and monitoring of batch inference tasks and the export of their results.
  - Provides batch resource management, allowing the resources of multiple inference services to be allocated and adjusted at once.
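
A sketch of how these batch capabilities might be driven against the same assumed platform API as above; endpoint paths, payload fields, and all names remain illustrative assumptions rather than a real SDK.

```python
# Hypothetical sketch of batch operations; every endpoint and field is assumed.
import requests

PLATFORM_API = "http://platform.example.com/api/v1"  # assumed base URL
services = ["sentiment-svc", "ner-svc", "rerank-svc"]  # services to operate on in bulk

# Batch stop: apply the same lifecycle action to each selected service.
for name in services:
    requests.post(f"{PLATFORM_API}/inference-services/{name}/stop", timeout=30).raise_for_status()

# Batch resource adjustment: apply one resource spec to many services at once.
requests.patch(
    f"{PLATFORM_API}/inference-services:batchUpdate",
    json={"services": services, "resources": {"cpu": "2", "memory": "4Gi"}},
    timeout=30,
).raise_for_status()

# Batch inference task: submit a dataset for offline scoring, then export the results.
task = requests.post(
    f"{PLATFORM_API}/batch-inference-tasks",
    json={"model": "sentiment-bert", "version": "v3", "input_uri": "s3://datasets/reviews.jsonl"},
    timeout=30,
).json()
results = requests.get(f"{PLATFORM_API}/batch-inference-tasks/{task['id']}/results", timeout=30)
print("export status:", results.status_code)
```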