Features Overview

Model Repository

  • Model Repository Creation & Deletion

    Supports creating 'private' model repositories, which users can also delete. 'Shared' model repositories can be created and deleted only by administrators.

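    As a sketch of what this permission model implies, assuming a hypothetical REST API (the endpoint, field names, and tokens below are illustrative, not the platform's actual interface):

    ```python
    import requests

    BASE = "https://platform.example.com/api/v1"   # hypothetical endpoint

    # Any user can create (and later delete) a 'private' repository.
    requests.post(f"{BASE}/models",
                  headers={"Authorization": "Bearer <user-token>"},
                  json={"name": "my-llm", "type": "private"})

    # A 'shared' repository requires an administrator token; the same
    # request with a regular user token would be rejected.
    requests.post(f"{BASE}/models",
                  headers={"Authorization": "Bearer <admin-token>"},
                  json={"name": "org-llm", "type": "shared"})
    ```
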
  • Model Version Management

    Supports version control for models, including creating tags and branches, as well as submitting commits to modify files in existing branches.

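    Assuming the repository is git-backed (an assumption; the remote URL below is illustrative), the tag/branch/commit workflow maps onto standard git operations:

    ```python
    import subprocess

    def git(*args):
        # Run a git command inside the cloned model repository.
        subprocess.run(["git", *args], cwd="my-llm", check=True)

    subprocess.run(["git", "clone",
                    "https://platform.example.com/models/my-llm.git"], check=True)

    git("checkout", "-b", "quantized")              # create a new branch
    git("add", "config.json")                       # stage a modified file
    git("commit", "-m", "Update generation config")
    git("tag", "v1.1.0")                            # create a version tag
    git("push", "origin", "quantized", "--tags")    # publish branch and tag
    ```
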
Inference Services

  • Custom Inference Service Deployment

    Enables deploying any model from the model repository as an inference service with customizable deployment parameters.

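    A minimal sketch of such a deployment against the same hypothetical API (every field name here is an assumption):

    ```python
    import requests

    deployment = {
        "model": "my-llm",        # any model from the model repository
        "runtime": "vllm-gpu",    # one of the pre-installed runtimes
        "replicas": 2,
        "resources": {"gpu": 1, "cpu": "4", "memory": "16Gi"},
    }
    requests.post("https://platform.example.com/api/v1/services",
                  headers={"Authorization": "Bearer <user-token>"},
                  json=deployment)
    ```
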
  • Template-Based Inference Service Deployment

    Users can define inference service templates and use them to create inference services.

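    Conceptually, a template captures a reusable set of deployment parameters that individual services then reference; a hypothetical sketch:

    ```python
    import requests

    BASE = "https://platform.example.com/api/v1"   # hypothetical
    HEADERS = {"Authorization": "Bearer <user-token>"}

    # Define the template once...
    requests.post(f"{BASE}/templates", headers=HEADERS, json={
        "name": "gpu-text-gen",
        "runtime": "vllm-gpu",
        "resources": {"gpu": 1, "memory": "16Gi"},
    })

    # ...then create services from it, supplying only what differs.
    requests.post(f"{BASE}/services", headers=HEADERS, json={
        "model": "my-llm",
        "template": "gpu-text-gen",
    })
    ```
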
  • Dynamic Scaling of Inference Services

    Supports automatically scaling replicas based on traffic volume. A Serverless configuration can reduce replicas to 0 during idle periods (releasing GPU resources) and scale back up automatically when traffic resumes.

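    The Serverless behavior is typically expressed as autoscaling bounds on the service; a sketch with assumed field names:

    ```python
    import requests

    autoscaling = {
        "min_replicas": 0,        # Serverless: release GPUs when idle
        "max_replicas": 5,
        "target_concurrency": 8,  # add replicas as concurrent traffic grows
    }
    requests.patch("https://platform.example.com/api/v1/services/my-llm",
                   headers={"Authorization": "Bearer <user-token>"},
                   json={"autoscaling": autoscaling})
    ```

    Note that with minimum replicas at 0, the first request after an idle period incurs a cold start while a replica is provisioned.
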
  • Inference Runtimes

    Comes with common inference runtimes pre-installed (vllm-cpu, vllm-gpu, mlserver-cpu, mlserver-gpu, etc.) and also supports custom third-party runtimes.

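    Registering a custom third-party runtime usually amounts to pointing the platform at a serving image; a hypothetical sketch (the runtime name, image, and endpoint are illustrative):

    ```python
    import requests

    runtime = {
        "name": "triton-gpu",     # illustrative third-party runtime
        "image": "nvcr.io/nvidia/tritonserver:24.05-py3",
        "port": 8000,
    }
    requests.post("https://platform.example.com/api/v1/runtimes",
                  headers={"Authorization": "Bearer <admin-token>"},
                  json=runtime)
    ```
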
  • Inference Experience

    After deployment, users can try out inference services directly in the UI. Three task categories are currently supported: 'Text Generation', 'Text Classification', and 'Image Generation'.

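    Outside the UI, a 'Text Generation' service backed by one of the vllm-* runtimes can typically also be called programmatically through vLLM's OpenAI-compatible API (the service URL below is illustrative):

    ```python
    import requests

    resp = requests.post(
        "https://platform.example.com/services/my-llm/v1/completions",
        headers={"Authorization": "Bearer <user-token>"},
        json={"model": "my-llm", "prompt": "Hello,", "max_tokens": 32},
    )
    print(resp.json()["choices"][0]["text"])
    ```
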
  • Inference Service Observability

    Provides monitoring and log viewing, including:

    • Resource monitoring
    • Computing power monitoring
    • Business metric monitoring (e.g., tokens)
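
    Dashboards like these are typically backed by queryable endpoints; a hypothetical sketch of pulling a token metric and recent logs (paths and parameter names are assumptions):

    ```python
    import requests

    BASE = "https://platform.example.com/api/v1"   # hypothetical
    HEADERS = {"Authorization": "Bearer <user-token>"}

    metrics = requests.get(f"{BASE}/services/my-llm/metrics", headers=HEADERS,
                           params={"metric": "tokens_per_second", "window": "1h"}).json()
    logs = requests.get(f"{BASE}/services/my-llm/logs", headers=HEADERS,
                        params={"tail": 100}).text
    ```
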
  • Batch Operations for Inference Services

    Enables bulk operations for managing multiple inference services, including 'Batch Start', 'Batch Stop', and 'Batch Delete'.
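
    A batch operation reduces to a single call over a list of service names; sketched against the same hypothetical API:

    ```python
    import requests

    requests.post("https://platform.example.com/api/v1/services:batchStop",
                  headers={"Authorization": "Bearer <user-token>"},
                  json={"services": ["my-llm", "my-classifier", "my-diffuser"]})
    ```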