Upgrade from AML 1.2

Install Alauda AI Cluster Components

Please visit Alauda AI Cluster for:

WARNING

Please ignore the Creating Alauda AI Cluster Instance step there, since we are upgrading from AML 1.2; the instance is created later in this guide.

  1. Downloading operator bundle packages for Alauda AI Cluster and KServeless.
  2. Uploading operator bundle packages to the destination cluster.
  3. Installing the Alauda AI Cluster Operator on the destination cluster.

Upgrading

The following procedure describes how to upgrade from AML 1.2 to Alauda AI 1.3.

Please ensure the aml-operator pod is up and running in the aml-operator namespace before upgrading:

kubectl -naml-operator get pod
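
The output should show the operator pod in the Running state, similar to the following (the pod name suffix will differ in your environment):

NAME                            READY   STATUS    RESTARTS   AGE
aml-operator-xxxxxxxxxx-xxxxx   1/1     Running   0          2d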

Migrate profile resources

Since Alauda AI 1.3 has removed the use of Profile from kubeflow.org and introduced a new Custom Resource Definition (CRD) called AmlNamespace, we need to migrate existing Profile resources to AmlNamespace.
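
For illustration, a Profile carrying an AmlConfig plugin such as the one below (the keys under buildRegistry and s3 are placeholders; your Profile may use different ones):

apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: demo-workspace
spec:
  plugins:
    - kind: AmlConfig
      spec:
        buildRegistry:
          address: registry.example.com           # placeholder key and value
        s3:
          endpoint: http://minio.example.com:9000 # placeholder key and value

is converted by the script into an AmlNamespace that keeps those settings under spec.config and gains a cluster label (the label base domain and cluster name are read from the global-info ConfigMap):

apiVersion: manage.aml.dev/v1alpha1
kind: AmlNamespace
metadata:
  name: demo-workspace
  labels:
    <BASE_DOMAIN>/cluster: <CLUSTER_NAME>
spec:
  config:
    buildRegistry:
      address: registry.example.com
    s3:
      endpoint: http://minio.example.com:9000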

Execute the following script in the destination cluster:

migrate-profile-resources.sh
#!/bin/bash
# Label base domain and cluster name, used to label the migrated AmlNamespace resources.
BASE_DOMAIN=$(kubectl -n kube-public get cm global-info -o jsonpath='{.data.labelBaseDomain}' | sed 's/"//g')
CLUSTER_NAME=$(kubectl -n kube-public get cm global-info -o jsonpath='{.data.clusterName}' | sed 's/"//g')

profiles=$(kubectl get profiles.kubeflow.org -o jsonpath='{.items[*].metadata.name}')

for profile in $profiles; do
  if kubectl get amlnamespace "$profile" &>/dev/null; then
    echo "⚠️  AmlNamespace $profile already exists. Skipping..."
    continue
  fi

  profile_yaml=$(kubectl get profiles.kubeflow.org "$profile" -o yaml)

  # Temporary files for the AmlConfig plugin settings extracted from the Profile.
  build_registry_file=$(mktemp)
  from_registry_file=$(mktemp)
  s3_file=$(mktemp)

  echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.buildRegistry' > "$build_registry_file"
  echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.fromRegistry' > "$from_registry_file"
  echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.s3' > "$s3_file"

  amlnamespace_yaml=$(cat <<EOF
apiVersion: manage.aml.dev/v1alpha1
kind: AmlNamespace
metadata:
  name: $profile
  labels:
    $BASE_DOMAIN/cluster: $CLUSTER_NAME
spec:
  config: {}
EOF
)

  if [[ -s "$build_registry_file" && $(yq 'type' "$build_registry_file") != "null" ]]; then
    amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.buildRegistry = load(\"$build_registry_file\")")
  fi

  if [[ -s "$from_registry_file" && $(yq 'type' "$from_registry_file") != "null" ]]; then
    amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.fromRegistry = load(\"$from_registry_file\")")
  fi

  if [[ -s "$s3_file" && $(yq 'type' "$s3_file") != "null" ]]; then
    amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.s3 = load(\"$s3_file\")")
  fi

  rm -f "$build_registry_file" "$from_registry_file" "$s3_file"

  echo "$amlnamespace_yaml" | kubectl apply -f -
  echo "✅ AmlNamespace $profile created."
done
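
After the script finishes, you can verify that every Profile now has a matching AmlNamespace. The following comparison should print no differences:

# Assumes the only AmlNamespace resources in the cluster are the ones migrated above.
kubectl get profiles.kubeflow.org -o name | sed 's|.*/||' | sort > /tmp/profile-names.txt
kubectl get amlnamespaces -o name | sed 's|.*/||' | sort > /tmp/amlnamespace-names.txt
diff /tmp/profile-names.txt /tmp/amlnamespace-names.txt && echo "✅ All profiles migrated"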

Migrate InferenceService resources

In Alauda AI 1.3, KServe introduces a breaking change that requires model.name to be kserve-container in InferenceService resources. Therefore, we need to patch all InferenceService resources created in AML 1.2 to accommodate this change.
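
For reference, the patch only renames the model container; a predictor spec from AML 1.2 such as the first snippet below (the model format is illustrative) ends up as the second:

# Before (AML 1.2) - an arbitrary model name was allowed
spec:
  predictor:
    model:
      name: my-model
      modelFormat:
        name: sklearn      # illustrative

# After (Alauda AI 1.3) - the name must be kserve-container
spec:
  predictor:
    model:
      name: kserve-container
      modelFormat:
        name: sklearn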

Execute the following script in the destination cluster:

migrate-inferenceservice-resources.sh
#!/bin/bash

set -e

# Collect "namespace name" pairs for every InferenceService in the cluster.
inferenceservices=$(kubectl get inferenceservice --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}')

while IFS=" " read -r namespace svc; do
    model_name=$(kubectl -n "$namespace" get inferenceservice "$svc" -o jsonpath='{.spec.predictor.model.name}')
    if [ "$model_name" != "kserve-container" ]; then
        echo ">> Updating $namespace/$svc..."
        kubectl -n "$namespace" patch inferenceservice "$svc" --type=json -p='[{
          "op": "replace",
          "path": "/spec/predictor/model/name",
          "value": "kserve-container"
        }]'
    fi
done <<< "$inferenceservices"
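
To confirm that nothing was missed, list the model names again; the command should only print the confirmation message:

# Assumes every InferenceService defines spec.predictor.model.
kubectl get inferenceservice --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{": "}{.spec.predictor.model.name}{"\n"}{end}' \
  | grep -v ': kserve-container$' || echo "✅ All InferenceServices use kserve-container"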

Migrate Knative CRD

Due to a breaking change in the Knative DomainMapping CRD, you need to migrate the CRD to the new storage version before upgrading.

# Migrate storageVersion of domainmappings from v1alpha1 to v1beta1
kubectl patch crd domainmappings.serving.knative.dev --type='json' -p='[
  {"op": "replace", "path": "/spec/versions/0/storage", "value": true},
  {"op": "replace", "path": "/spec/versions/1/storage", "value": false}
]'

kubectl exec -naml-operator -it deploy/aml-operator -- /app/storageversion-migrate \
  domainmappings.serving.knative.dev

Wait for the command to complete; the output should look like this:

INFO storageversion/main.go:61 Migrating group resources {"len": 1}
INFO storageversion/main.go:64 Migrating group resource {"resource": "domainmappings.serving.knative.dev"}
INFO storageversion/main.go:74 Migration complete
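
You can also confirm that v1beta1 is now marked as the storage version on the CRD:

kubectl get crd domainmappings.serving.knative.dev -o yaml \
  | yq '.spec.versions[] | select(.storage == true) | .name'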

Fix unexpected key in Knative config

Run the following commands to remove the _example key from the Knative ConfigMaps, which blocks the upgrade.

CONFIGMAPS=(
  $(kubectl -nknative-serving get configmaps -l app.kubernetes.io/name=knative-serving -o jsonpath='{.items[*].metadata.name}')
)

# Please ignore error messages like `The request is invalid: ...`; they occur when a ConfigMap has no _example key to remove.
for cm in "${CONFIGMAPS[@]}"; do
  kubectl -nknative-serving patch configmap "$cm" --type='json' -p='[{"op": "remove", "path": "/data/_example"}]'
done
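
Optionally, reuse the same CONFIGMAPS list to confirm that the key is gone; the loop should print nothing:

for cm in "${CONFIGMAPS[@]}"; do
  if kubectl -nknative-serving get configmap "$cm" -o jsonpath='{.data._example}' | grep -q .; then
    echo "⚠️  $cm still contains the _example key"
  fi
done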

Cleanup Knative Serving resources

Some Knative Serving resources block the upgrade, so manual action is required.

Execute the following script to remove the problematic resources:

cleanup-resources.sh
kubectl -nknative-serving delete pdb activator-pdb webhook-pdb

kubectl -nknative-serving patch configmap config-istio --type=json -p='[{
  "op": "remove",
  "path": "/data/gateway.knative-serving.knative-local-gateway"
}]'

kubectl -nknative-serving patch gateways.networking.istio.io knative-local-gateway --type=json -p='[{
  "op": "replace",
  "path": "/spec/selector",
  "value": {"knative": "ingressgateway"}
}]'

kubectl -nistio-system patch service knative-local-gateway --type=json -p='[{
  "op": "replace",
  "path": "/spec/selector",
  "value": {"knative": "ingressgateway"}
}]'
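
To double-check the cleanup, verify that both the Gateway and the Service now select the Knative ingress gateway; each command should report the knative=ingressgateway selector:

kubectl -nknative-serving get gateways.networking.istio.io knative-local-gateway -o jsonpath='{.spec.selector}{"\n"}'
kubectl -nistio-system get service knative-local-gateway -o jsonpath='{.spec.selector}{"\n"}'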

Retrieve AML 1.2 Values

INFO

The helm and yq utilities must be installed on the cluster node if they are not already present.

In the AML cluster, you can retrieve the values of AML 1.2 with the following helm command:

helm get values aml -nkubeflow -oyaml | tee /tmp/aml-values.yaml

Please keep the generated /tmp/aml-values.yaml file throughout the upgrade.
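
The later steps read global.gitScheme, global.gitBase, global.mysqlHost, global.mysqlPort and global.mysqlUsername from this file, so it should contain entries roughly like the following (the values shown are placeholders from a hypothetical installation):

# Values below are placeholders; your file will contain the real AML 1.2 settings.
global:
  gitScheme: https
  gitBase: gitlab.example.com
  mysqlHost: mysql.example.com
  mysqlPort: 3306
  mysqlUsername: aml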

Creating GitLab Admin Token Secret

We need to create a secret for the GitLab admin token, which is required to run Alauda AI Cluster.

Please follow instructions described in Pre-Configuration.

Creating MySQL Secret

We also need to create a MySQL secret for Alauda AI Cluster.

Run the following command:

MYSQL_PASSWORD=$(kubectl -nkubeflow get secret mysql-secret -o jsonpath='{.data.password}' | base64 -d)  

kubectl create secret generic aml-mysql-secret \
  --from-literal="password=${MYSQL_PASSWORD}" \
  -n cpaas-system
  1. Retrieve the MySQL password from the secret resource installed by AML 1.2.
  2. Create a MySQL password secret named aml-mysql-secret.
  3. The password is saved under the password key.
  4. The secret is created in the cpaas-system namespace.
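
You can verify that the secret was created and that the stored password matches the one from AML 1.2:

kubectl -n cpaas-system get secret aml-mysql-secret -o jsonpath='{.data.password}' | base64 -d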

Creating Alauda AI Cluster Instance

Finally, we create an AmlCluster resource named default for Alauda AI Cluster to finish the upgrade.

Create a YAML file named aml-cluster.yaml with the following content:

aml-cluster.yaml
apiVersion: amlclusters.aml.dev/v1alpha1
kind: AmlCluster
metadata:
  name: default
spec:
  values:
    global:
      # single-node or ha-cluster
      deployFlavor: single-node
      gitlabBaseUrl: "${GITLAB_BASE_URL}"
      gitlabAdminTokenSecretRef:
        name: aml-gitlab-admin-token
        namespace: cpaas-system
      mysql:
        host: "${MYSQL_HOST}"
        port: 3306
        user: "${MYSQL_USER}"
        database: aml
        passwordSecretRef:
          name: aml-mysql-secret
          namespace: cpaas-system
    buildkitd:
      storage:
        type: emptyDir
  components:
    kserve:
      managementState: Managed
    knativeServing:
      managementState: Managed
      ingressGateway:
        domain: "*.example.com"
        certificate:
          secretName: knative-serving-cert
          type: SelfSigned
  1. The name and namespace of the GitLab admin token secret created previously.
  2. The name and namespace of the MySQL secret created previously.
  3. Set the management state of kserve to Managed, which means KServe will be installed and managed by Alauda AI Cluster.
  4. Set the management state of knativeServing to Managed, which means Knative Serving will be installed and managed by Alauda AI Cluster.
  5. The wildcard domain for exposing inference services. For now, just keep it as-is.
INFO

The domain and certificate are required for exposing inference services and are reserved for future use.

Execute the following commands to retrieve the necessary context from AML 1.2 and then install Alauda AI Cluster:

# Retrieve the GitLab host scheme and base from the AML 1.2 values file.
GITLAB_HOST_SCHEME=$(yq -r .global.gitScheme < /tmp/aml-values.yaml)
GITLAB_HOST_BASE=$(yq -r .global.gitBase < /tmp/aml-values.yaml)
export GITLAB_BASE_URL="${GITLAB_HOST_SCHEME}://${GITLAB_HOST_BASE}"

# Retrieve the MySQL host, port and user from the AML 1.2 values file.
export MYSQL_HOST=$(yq -r .global.mysqlHost < /tmp/aml-values.yaml)
export MYSQL_PORT=$(yq -r .global.mysqlPort < /tmp/aml-values.yaml)
export MYSQL_USER=$(yq -r .global.mysqlUsername < /tmp/aml-values.yaml)

yq '((.. | select(tag == "!!str")) |= envsubst) | (.spec.values.global.mysql.port = env(MYSQL_PORT))' aml-cluster.yaml \
  | kubectl apply -f -
  1. Interpolate environment variables in the aml-cluster.yaml file created previously.
  2. Create the AmlCluster resource to install Alauda AI Cluster.
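
If you want to review the rendered manifest before applying it, run the same yq expression on its own and check that every ${...} placeholder has been substituted:

yq '((.. | select(tag == "!!str")) |= envsubst) | (.spec.values.global.mysql.port = env(MYSQL_PORT))' aml-cluster.yaml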

Verification

Check the status of the AmlCluster resource named default:

kubectl get amlcluster default

The output should show the resource as Ready:

NAME      READY   REASON
default   True    Succeeded
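
Instead of polling manually, you can wait for the resource to become ready; this assumes the AmlCluster controller exposes a Ready condition backing the READY column above:

# Assumes the AmlCluster resource reports a Ready condition.
kubectl wait amlcluster default --for=condition=Ready --timeout=30m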

Install Alauda AI Essentials

Please visit Alauda AI Essentials for installation instructions.