# Glossary

| Name | Description |
| ---- | ----------- |
| Large Language Model | An LLM is an AI model trained on massive amounts of text data, capable of understanding and generating natural language; its parameter count typically ranges from billions to hundreds of billions. |
| Inference Service | A service that exposes trained machine learning or deep learning models through high-performance, scalable prediction (inference) endpoints. |
| Inference Runtime | A container environment optimized for model inference that improves resource utilization, accelerates the inference process, and reduces latency. |
| AI Agent | An AI entity that perceives its environment, makes decisions, and executes tasks autonomously; it is characterized by autonomy, adaptability, and goal orientation. |
| Text Generation | The process of automatically producing coherent, meaningful text from a given input (context, prompt, or rules) using Natural Language Processing (NLP) techniques. |
| Text Classification | The process of assigning text data to predefined categories or labels, typically with machine learning or deep learning models; applications include information retrieval, sentiment analysis, and spam detection. |
| Text-to-Image | The process of automatically generating images from input text descriptions, combining NLP and computer vision to convert text into visual content. |
| Virtual GPU | vGPU technology uses virtualization to partition a physical GPU and allocate its resources to multiple virtual machines, enabling shared, efficient use of GPU capacity. |
| Physical GPU | pGPU refers to attaching an entire physical GPU card on the host machine directly to a virtual machine, granting it exclusive access to the GPU. |
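To make the Inference Service term above concrete: in this platform an inference service is declared as a KServe `InferenceService` resource (the `serving.kserve.io/v1beta1` API covered in the API Reference). The sketch below is illustrative only; the name, namespace, model format, and storage URI are hypothetical placeholders, not values from this documentation.

```yaml
# Hypothetical minimal InferenceService manifest (illustrative sketch).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model        # placeholder service name
  namespace: demo            # placeholder namespace
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn        # placeholder model format
      storageUri: pvc://example-models/sklearn/model   # placeholder model location
```

Once applied, the platform's inference runtime serves the referenced model behind a prediction endpoint, which can then be scaled or exposed externally as described in the Model Deployment & Inference guides.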