Features Overview
Model Repository
Model Repository Creation & Deletion

Users can create 'private' model repositories and delete the ones they own. 'Shared' model repositories can be created and deleted only by administrators.
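The platform API is not documented in this overview; the sketch below is a minimal illustration assuming a hypothetical REST endpoint (`/model-repos`), base URL, and bearer token:

```python
import requests

# Hypothetical base URL and token; substitute your deployment's real API.
API = "https://platform.example.com/api/v1"
HEADERS = {"Authorization": "Bearer <token>"}

# A regular user creates a 'private' repository.
requests.post(
    f"{API}/model-repos",
    headers=HEADERS,
    json={"name": "my-llm", "type": "private"},
).raise_for_status()

# Deleting a 'shared' repository would require an administrator token.
requests.delete(f"{API}/model-repos/shared-llm", headers=HEADERS).raise_for_status()
```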
Model Version Management

Supports version control for models, including creating tags and branches, as well as committing changes to files on existing branches.
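Assuming the repository exposes standard git semantics for tags, branches, and commits (an assumption; the remote URL below is hypothetical), a typical versioning workflow might look like:

```python
import subprocess

REPO = "https://platform.example.com/repos/my-llm.git"  # hypothetical remote

def git(*args):
    subprocess.run(["git", *args], check=True)

git("clone", REPO, "my-llm")
git("-C", "my-llm", "checkout", "-b", "tuning")               # create a new branch
# ... edit model files in my-llm/ ...
git("-C", "my-llm", "commit", "-am", "Update model config")   # commit to the branch
git("-C", "my-llm", "tag", "v1.0.0")                          # create a tag
git("-C", "my-llm", "push", "--tags", "origin", "tuning")
```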
Inference Services
Custom Inference Service Deployment

Any model in the model repository can be deployed as an inference service, with deployment parameters (runtime, resources, replicas, and so on) customized by the user.
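A minimal deployment sketch, again assuming hypothetical endpoint and field names:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Deploy a model version from the repository as an inference service,
# customizing runtime, hardware, and replica parameters.
spec = {
    "name": "my-llm-svc",
    "model": "my-llm:v1.0.0",       # repository model and version
    "runtime": "vllm-gpu",          # one of the pre-installed runtimes
    "resources": {"gpu": 1, "cpu": "4", "memory": "16Gi"},
    "replicas": 2,
}
requests.post(f"{API}/inference-services", headers=HEADERS, json=spec).raise_for_status()
```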
Template-Based Inference Service Deployment

Users can define inference service templates and use them to create inference services.
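Conceptually, a template captures the settings shared across deployments so that creating a service only requires the parts that vary. A sketch with hypothetical endpoints:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# 1. Define a reusable template with common deployment settings.
template = {
    "name": "gpu-text-generation",
    "runtime": "vllm-gpu",
    "resources": {"gpu": 1, "memory": "16Gi"},
}
requests.post(f"{API}/service-templates", headers=HEADERS, json=template).raise_for_status()

# 2. Instantiate a service from the template, supplying only what varies.
requests.post(
    f"{API}/inference-services",
    headers=HEADERS,
    json={"name": "my-llm-svc", "template": "gpu-text-generation", "model": "my-llm:v1.0.0"},
).raise_for_status()
```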
Dynamic Scaling of Inference Services

Supports automatic scaling of replicas based on traffic volume. A Serverless configuration can reduce replicas to 0 during idle periods (releasing GPU resources) and scale back up automatically when traffic resumes.
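The key setting is a minimum replica count of 0; everything else below (endpoint, field names, the concurrency target) is a hypothetical illustration:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# min_replicas = 0 enables Serverless behavior: the service scales to
# zero (freeing its GPU) when idle and back up when requests arrive.
scaling = {
    "min_replicas": 0,
    "max_replicas": 4,
    "target_concurrency": 10,  # scale out when per-replica load exceeds this
}
requests.patch(
    f"{API}/inference-services/my-llm-svc/autoscaling",
    headers=HEADERS,
    json=scaling,
).raise_for_status()
```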
Inference Runtimes

Comes pre-installed with common inference runtimes such as vllm-cpu, vllm-gpu, mlserver-cpu, and mlserver-gpu. Custom third-party runtimes are also supported.
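Registering a third-party runtime might look like the following; the endpoint, field names, and container image are all hypothetical:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Register a custom runtime from a container image so it can be selected
# at deployment time alongside the built-in ones (vllm-gpu, mlserver-cpu, ...).
runtime = {
    "name": "my-custom-runtime",
    "image": "ghcr.io/example/custom-runtime:latest",  # placeholder image
    "accelerator": "gpu",
}
requests.post(f"{API}/runtimes", headers=HEADERS, json=runtime).raise_for_status()
```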
Inference Experience

After deployment, users can try out inference services directly in the UI. Three task categories are currently supported: 'Text Generation', 'Text Classification', and 'Image Generation'.
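The same services can also be called over HTTP. For example, vLLM-backed services expose an OpenAI-compatible completions API; the service URL below is a hypothetical example of where the platform might route it:

```python
import requests

resp = requests.post(
    "https://platform.example.com/serving/my-llm-svc/v1/completions",  # hypothetical route
    headers={"Authorization": "Bearer <token>"},
    json={"model": "my-llm", "prompt": "Hello, world", "max_tokens": 32},
)
print(resp.json()["choices"][0]["text"])
```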
Inference Service Observability

Provides monitoring and log viewing (see the sketch after this list), including:

- Resource monitoring
- Compute power monitoring (e.g., GPU utilization)
- Business metric monitoring (e.g., token throughput)
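A sketch of fetching metrics and logs programmatically; the endpoint shapes, parameter names, and metric name are illustrative assumptions:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Pull a business metric (token throughput) over the last hour.
metrics = requests.get(
    f"{API}/inference-services/my-llm-svc/metrics",
    headers=HEADERS,
    params={"metric": "tokens_per_second", "window": "1h"},
).json()

# Tail the most recent service logs.
logs = requests.get(
    f"{API}/inference-services/my-llm-svc/logs",
    headers=HEADERS,
    params={"tail": 100},
).text
```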
Batch Operations for Inference Services

Supports managing multiple inference services in bulk, including 'Batch Start', 'Batch Stop', and 'Batch Delete'.
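A bulk stop might be a single call against a batch endpoint (hypothetical below); looping over a per-service endpoint would achieve the same effect:

```python
import requests

API = "https://platform.example.com/api/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Stop several inference services in one request.
requests.post(
    f"{API}/inference-services:batch-stop",      # hypothetical bulk endpoint
    headers=HEADERS,
    json={"names": ["svc-a", "svc-b", "svc-c"]},
).raise_for_status()
```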