#Configure External Access for Inference Services

#Introduction

This document provides a step-by-step guide on how to configure external access for your inference services, including checking external access addresses, creating domains, setting up load balancers, and verifying the configuration.

#Steps

#1. View External Access Address of the Inference Service

You can:

  1. Navigate to the service details page and copy the address from the Access Method card, or
  2. View the address in the YAML card under status.url.
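
If you prefer the command line, the same address is available on the InferenceService resource. A minimal sketch, assuming you have kubectl access to the cluster; the service name qwen2-0b5 and namespace kubeflow-admin are hypothetical placeholders, so substitute your own:

# Read the external URL from the InferenceService status
# ("qwen2-0b5" and "kubeflow-admin" are placeholder names).
kubectl get inferenceservice qwen2-0b5 -n kubeflow-admin \
    -o jsonpath='{.status.url}'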

#2. Create a Domain

In the Administrator Console, go to Network > Domains, and then click Create Domain.

  • In the Domain field, enter the domain name of your inference service, excluding the protocol (e.g., qwen2-0b5-kubeflow-admin-cpaas-io.my-company.com).
  • For Allocated To (Cluster), select the cluster where your service is located.
  • For Allocated Projects, choose the project where your inference service resides.
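
For external clients to reach the service without curl's --resolve trick (used in the verification step below), this domain must also resolve to the address of the load balancer you will configure in the next steps, in whatever DNS your clients use. A quick check, reusing the hypothetical domain from above:

# Should return the load balancer IP once DNS is in place.
nslookup qwen2-0b5-kubeflow-admin-cpaas-io.my-company.com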

#3. Create a Load Balancer

(One load balancer can be shared by multiple projects; create a new one only if necessary.)

In the Administrator Console, go to Network > Load Balancers, and then click Create Load Balancer. For detailed instructions, refer to the platform's load balancer documentation.

#4. Configure the Load Balancer

In the Alauda Container Platform Console, navigate to Network > Load Balancers, then click the name of the load balancer you just created to enter its configuration page.

#4.1 Ports Configuration

Add listening ports for your service. You can add more ports as needed.

Step 1: Add a Port

  1. In the Port Management section, click the Add Port button.
  2. In the configuration page, select a protocol (HTTP or HTTPS).
  3. Enter the corresponding port number. For HTTP, the standard port is 80. For HTTPS, the standard port is 443.

Step 2: Configure HTTPS (if applicable)

If you choose the HTTPS protocol, you must select a default certificate.

  1. Ensure you have switched to the istio-system namespace; the certificate selected in the next step is stored there.
  2. Select knative-serving-cert as the Default Certificate.
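
Before selecting it, you can confirm from the command line that the certificate exists. A quick sanity check, assuming kubectl access and that the certificate is stored as a TLS secret in istio-system:

# The default certificate lives in istio-system, which is why the
# namespace switch above matters.
kubectl get secret knative-serving-cert -n istio-system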

#4.2 Rules Configuration

Configure forwarding rules for the ports you added in the previous step.

Step 1: Add a Rule

  1. In the Rules section, click the Add Rule button.
  2. On the configuration page, click the Add Rule Indicator button to add your first rule.
  3. For Type, select Domains.
  4. From the dropdown menu, choose the domain name you created for your inference service, for example: qwen2-0b5-kubeflow-admin-cpaas-io.my-company.com.

Step 2: Configure the Service Group

  1. Locate the Service Group configuration area.
  2. Ensure the Namespace is set to istio-system. If not, switch your project's namespace to istio-system first.
  3. Under Services, select knative-ingressgateway from the dropdown list.
  4. Choose port 80.

Note: The process for configuring rules for the HTTPS protocol (port 443) is the same as described above.
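
To confirm the rule's target exists before saving, you can list the gateway service from the command line. A quick check, assuming kubectl access:

# The forwarding rule targets port 80 of this service.
kubectl get svc knative-ingressgateway -n istio-system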

For more detailed parameter configurations, refer to the platform's load balancer documentation.

#5. Verify Access to the Inference Service via External Address

To verify that your inference service is accessible externally, use the curl commands below. The --resolve flag maps the domain directly to your load balancer IP, so the test works even before public DNS is configured. Remember to replace the placeholders with your actual load balancer IP address, port, and inference service domain.

# For HTTP (include the port in the URL so the --resolve mapping applies)
curl -v --resolve "your-inference-service-domain.com:your-port:your-load-balancer-ip" \
    http://your-inference-service-domain.com:your-port/v1/models

# For HTTPS (skip `-k` if you have a valid certificate)
curl -vk --resolve "your-inference-service-domain.com:443:your-load-balancer-ip" \
    https://your-inference-service-domain.com/v1/models

Here's what each part of the command means and what you need to replace:

  • your-inference-service-domain.com: This should be the domain name you created for your inference service (e.g., qwen2-0b5-kubeflow-admin-cpaas-io.my-company.com).
  • your-port: This is the port your load balancer is listening on for HTTP traffic (commonly 80).
  • your-load-balancer-ip: This is the actual IP address of your load balancer (e.g., 192.168.137.21).

If the request successfully returns the model list, your configuration is complete! If it fails, double-check your load balancer settings or review the inference service logs to pinpoint the problem.
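
Once the model list comes back, you can also probe a single model's readiness through the same external address. A minimal sketch, assuming your runtime speaks the KServe V1 protocol; the model name qwen2-0b5 is hypothetical:

# Readiness check for one model (KServe V1 protocol); expects
# a JSON response containing a "ready" field.
curl -v --resolve "your-inference-service-domain.com:80:your-load-balancer-ip" \
    http://your-inference-service-domain.com/v1/models/qwen2-0b5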