Features Overview
Model Repository
Model Repository Creation & Deletion

Users can create 'private' model repositories and delete the ones they own. 'Shared' model repositories can be created and deleted only by administrators.
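The platform API is not documented in this overview; the sketch below is a minimal illustration assuming a hypothetical REST endpoint (`/model-repos`), base URL, and bearer token:

```python
import requests

# Hypothetical base URL and token; substitute your deployment's real API.
API = "https://platform.example.com/api/v1"
HEADERS = {"Authorization": "Bearer <token>"}

# A regular user creates a 'private' repository.
requests.post(
    f"{API}/model-repos",
    headers=HEADERS,
    json={"name": "my-llm", "type": "private"},
).raise_for_status()

# Deleting a 'shared' repository would require an administrator token.
requests.delete(f"{API}/model-repos/shared-llm", headers=HEADERS).raise_for_status()
```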
Model Version Management

Supports version control for models, including creating tags and branches, as well as committing changes to files on existing branches.
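Assuming the repository exposes standard git semantics for tags, branches, and commits (an assumption; the remote URL below is hypothetical), a typical versioning workflow might look like:

```python
import subprocess

REPO = "https://platform.example.com/repos/my-llm.git"  # hypothetical remote

def git(*args):
    subprocess.run(["git", *args], check=True)

git("clone", REPO, "my-llm")
git("-C", "my-llm", "checkout", "-b", "tuning")               # create a new branch
# ... edit model files in my-llm/ ...
git("-C", "my-llm", "commit", "-am", "Update model config")   # commit to the branch
git("-C", "my-llm", "tag", "v1.0.0")                          # create a tag
git("-C", "my-llm", "push", "--tags", "origin", "tuning")
```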
Inference Services
Custom Inference Service Deployment

Any model in the model repository can be deployed as an inference service, with deployment parameters (runtime, resources, replicas, and so on) customized by the user.
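A minimal deployment sketch, again assuming hypothetical endpoint and field names:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Deploy a model version from the repository as an inference service,
# customizing runtime, hardware, and replica parameters.
spec = {
    "name": "my-llm-svc",
    "model": "my-llm:v1.0.0",       # repository model and version
    "runtime": "vllm-gpu",          # one of the pre-installed runtimes
    "resources": {"gpu": 1, "cpu": "4", "memory": "16Gi"},
    "replicas": 2,
}
requests.post(f"{API}/inference-services", headers=HEADERS, json=spec).raise_for_status()
```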
Template-Based Inference Service Deployment

Users can define inference service templates and use them to create inference services.
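Conceptually, a template captures the settings shared across deployments so that creating a service only requires the parts that vary. A sketch with hypothetical endpoints:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# 1. Define a reusable template with common deployment settings.
template = {
    "name": "gpu-text-generation",
    "runtime": "vllm-gpu",
    "resources": {"gpu": 1, "memory": "16Gi"},
}
requests.post(f"{API}/service-templates", headers=HEADERS, json=template).raise_for_status()

# 2. Instantiate a service from the template, supplying only what varies.
requests.post(
    f"{API}/inference-services",
    headers=HEADERS,
    json={"name": "my-llm-svc", "template": "gpu-text-generation", "model": "my-llm:v1.0.0"},
).raise_for_status()
```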
Dynamic Scaling of Inference Services

Supports automatic scaling of replicas based on traffic volume. A Serverless configuration can reduce replicas to 0 during idle periods (releasing GPU resources) and scale back up automatically when traffic resumes.
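The key setting is a minimum replica count of 0; everything else below (endpoint, field names, the concurrency target) is a hypothetical illustration:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# min_replicas = 0 enables Serverless behavior: the service scales to
# zero (freeing its GPU) when idle and back up when requests arrive.
scaling = {
    "min_replicas": 0,
    "max_replicas": 4,
    "target_concurrency": 10,  # scale out when per-replica load exceeds this
}
requests.patch(
    f"{API}/inference-services/my-llm-svc/autoscaling",
    headers=HEADERS,
    json=scaling,
).raise_for_status()
```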
Inference Runtimes

Comes pre-installed with common inference runtimes such as vllm-cpu, vllm-gpu, mlserver-cpu, and mlserver-gpu. Custom third-party runtimes are also supported.
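Registering a third-party runtime might look like the following; the endpoint, field names, and container image are all hypothetical:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Register a custom runtime from a container image so it can be selected
# at deployment time alongside the built-in ones (vllm-gpu, mlserver-cpu, ...).
runtime = {
    "name": "my-custom-runtime",
    "image": "ghcr.io/example/custom-runtime:latest",  # placeholder image
    "accelerator": "gpu",
}
requests.post(f"{API}/runtimes", headers=HEADERS, json=runtime).raise_for_status()
```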
Inference Experience

After deployment, users can try out inference services directly in the UI. Three task categories are currently supported: 'Text Generation', 'Text Classification', and 'Image Generation'.
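The same services can also be called over HTTP. For example, vLLM-backed services expose an OpenAI-compatible completions API; the service URL below is a hypothetical example of where the platform might route it:

```python
import requests

resp = requests.post(
    "https://platform.example.com/serving/my-llm-svc/v1/completions",  # hypothetical route
    headers={"Authorization": "Bearer <token>"},
    json={"model": "my-llm", "prompt": "Hello, world", "max_tokens": 32},
)
print(resp.json()["choices"][0]["text"])
```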
Inference Service Observability

Provides monitoring and log viewing (see the sketch after this list), including:

- Resource monitoring
- Compute power monitoring (e.g., GPU utilization)
- Business metric monitoring (e.g., token throughput)
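A sketch of fetching metrics and logs programmatically; the endpoint shapes, parameter names, and metric name are illustrative assumptions:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Pull a business metric (token throughput) over the last hour.
metrics = requests.get(
    f"{API}/inference-services/my-llm-svc/metrics",
    headers=HEADERS,
    params={"metric": "tokens_per_second", "window": "1h"},
).json()

# Tail the most recent service logs.
logs = requests.get(
    f"{API}/inference-services/my-llm-svc/logs",
    headers=HEADERS,
    params={"tail": 100},
).text
```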
Batch Operations for Inference Services

Supports managing multiple inference services in bulk, including 'Batch Start', 'Batch Stop', and 'Batch Delete'.
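A bulk stop might be a single call against a batch endpoint (hypothetical below); looping over a per-service endpoint would achieve the same effect:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Stop several inference services in one request.
requests.post(
    f"{API}/inference-services:batch-stop",      # hypothetical bulk endpoint
    headers=HEADERS,
    json={"names": ["svc-a", "svc-b", "svc-c"]},
).raise_for_status()
```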