Model Repository Creation & Deletion
Users can create and delete 'private' model repositories. 'Shared' model repositories can be created and deleted only by administrators.
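As a rough sketch, repository creation over a REST API might look like the following; the base URL, endpoint paths, payload fields, and tokens are all hypothetical, not the platform's documented API:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
USER = {"Authorization": "Bearer <user-token>"}
ADMIN = {"Authorization": "Bearer <admin-token>"}

# Any user may create (and later delete) a 'private' repository.
requests.post(f"{API}/model-repos", headers=USER,
              json={"name": "my-llm", "visibility": "private"}).raise_for_status()

# 'Shared' repositories require administrator credentials; the server
# would reject this call if made with a regular user token.
requests.post(f"{API}/model-repos", headers=ADMIN,
              json={"name": "team-llm", "visibility": "shared"}).raise_for_status()

# Deletion follows the same permission rules as creation.
requests.delete(f"{API}/model-repos/my-llm", headers=USER).raise_for_status()
```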
Model Version Management
Supports version control for models, including creating tags and branches, and submitting commits that modify files on existing branches.
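Continuing the hypothetical REST sketch above (every endpoint and field here is assumed, not documented), the version operations might look like:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <user-token>"}
REPO = f"{API}/model-repos/my-llm"

# Create a working branch from main, and tag a release point.
requests.post(f"{REPO}/branches", headers=HEADERS,
              json={"name": "exp-quantized", "from": "main"})
requests.post(f"{REPO}/tags", headers=HEADERS,
              json={"name": "v1.0", "revision": "main"})

# Submit a commit that modifies a file on an existing branch.
requests.post(f"{REPO}/commits", headers=HEADERS,
              json={"branch": "exp-quantized",
                    "message": "Update generation config",
                    "files": [{"path": "config.json",
                               "content": "{\"max_length\": 2048}"}]})
```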
Custom Inference Service Deployment
Enables deploying any model from the model repository as an inference service with custom parameters.
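A sketch of such a parameterized deployment, again against the hypothetical API (field names like runtime and resources are assumptions):

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <user-token>"}

# Deploy a repository model as an inference service, setting the
# runtime, revision, and resources explicitly.
requests.post(f"{API}/inference-services", headers=HEADERS, json={
    "name": "my-llm-svc",
    "model": "my-llm",              # repository name
    "revision": "v1.0",             # tag, branch, or commit
    "runtime": "vllm-gpu",          # one of the pre-installed runtimes
    "resources": {"gpu": 1, "cpu": 4, "memory": "16Gi"},
    "replicas": 2,
})
```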
Template-Based Inference Service Deployment
Users can define inference service templates and use them to create inference services.
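One plausible shape for templates, under the same hypothetical API: define the template once, then reference it when creating services.

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <user-token>"}

# Define a reusable template capturing common deployment parameters.
requests.post(f"{API}/inference-templates", headers=HEADERS, json={
    "name": "gpu-text-generation",
    "runtime": "vllm-gpu",
    "resources": {"gpu": 1, "cpu": 4, "memory": "16Gi"},
})

# Create a service from the template, supplying only what varies.
requests.post(f"{API}/inference-services", headers=HEADERS, json={
    "name": "another-llm-svc",
    "model": "another-llm",
    "template": "gpu-text-generation",
})
```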
Dynamic Scaling of Inference Services
Supports automatic scaling of replicas based on traffic volume. A Serverless configuration can reduce replicas to 0 during idle periods (releasing GPU resources) and automatically scale back up when traffic resumes.
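The scale-to-zero behavior typically comes down to an autoscaling block like the one below; the field names and the concurrency metric are assumptions modeled on common serverless platforms:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <user-token>"}

# min_replicas = 0 enables Serverless mode: the service scales to zero
# when idle (freeing its GPU) and scales up again when requests arrive.
requests.patch(f"{API}/inference-services/my-llm-svc", headers=HEADERS, json={
    "autoscaling": {
        "min_replicas": 0,
        "max_replicas": 5,
        "metric": "concurrency",    # scale on in-flight requests
        "target": 10,               # desired concurrent requests per replica
    }
})
```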
Inference Runtimes
Pre-installed with common inference runtimes such as vllm-cpu, vllm-gpu, mlserver-cpu, and mlserver-gpu. Also supports custom third-party runtimes.
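Registering a third-party runtime might look like the following sketch; the endpoint and fields are hypothetical, and the Triton image is merely an illustrative example of a container-based runtime:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <admin-token>"}

# Register a custom runtime by pointing the platform at its container
# image and declaring the model formats it can serve.
requests.post(f"{API}/runtimes", headers=HEADERS, json={
    "name": "triton-gpu",
    "image": "nvcr.io/nvidia/tritonserver:24.05-py3",
    "supported_formats": ["onnx", "tensorrt"],
    "resources": {"gpu": 1},
})
```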
Inference Experience
After deployment, users can try out inference services directly in the UI. Three task categories are currently supported: 'Text Generation', 'Text Classification', and 'Image Generation'.
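Outside the UI, a 'Text Generation' service deployed on a vllm runtime can usually also be called over HTTP. Assuming the platform exposes vLLM's OpenAI-compatible API at the service URL (the URL itself is hypothetical):

```python
import requests

SERVICE_URL = "https://my-llm-svc.platform.example.com"  # hypothetical

resp = requests.post(f"{SERVICE_URL}/v1/completions", json={
    "model": "my-llm",
    "prompt": "Summarize the benefits of scale-to-zero:",
    "max_tokens": 64,
})
print(resp.json()["choices"][0]["text"])
```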
Inference Service Observability
Provides monitoring and log viewing for deployed inference services.
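A sketch of how logs and metrics might be pulled programmatically; both endpoints are assumptions:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <user-token>"}
SVC = f"{API}/inference-services/my-llm-svc"

# Tail recent logs and read current metrics for one service.
logs = requests.get(f"{SVC}/logs", headers=HEADERS, params={"tail": 100}).json()
metrics = requests.get(f"{SVC}/metrics", headers=HEADERS).json()
print(metrics.get("replicas"), metrics.get("requests_per_second"))
```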
Batch Operations for Inference Services
Enables bulk operations for managing multiple inference services, including 'Batch Start', 'Batch Stop', and 'Batch Delete'.
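Bulk operations reduce N single-service calls to one; a sketch with an assumed batch endpoint ('Batch Start' and 'Batch Delete' would follow the same shape):

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <user-token>"}

# Stop several inference services in a single request.
requests.post(f"{API}/inference-services:batchStop", headers=HEADERS,
              json={"names": ["svc-a", "svc-b", "svc-c"]})
```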