Migrate profile resources
Since Alauda AI 1.3 has removed the use of Profile from kubeflow.org and introduced a new
Custom Resource Definition (CRD) called AmlNamespace, we need to migrate existing Profile resources to AmlNamespace.
Execute following script in destination cluster:
migrate-profile-resources.sh
#!/bin/bash
BASE_DOMAIN=$(kubectl -n kube-public get cm global-info -o jsonpath='{.data.labelBaseDomain}' | sed 's/"//g')
CLUSTER_NAME=$(kubectl -n kube-public get cm global-info -o jsonpath='{.data.clusterName}' | sed 's/"//g')
profiles=$(kubectl get profiles.kubeflow.org -o jsonpath='{.items[*].metadata.name}')
for profile in $profiles; do
if kubectl get amlnamespace "$profile" &>/dev/null; then
echo "⚠️ AmlNamespace $profile already exists. Skipping..."
continue
fi
profile_yaml=$(kubectl get profiles.kubeflow.org "$profile" -o yaml)
build_registry_file=$(mktemp)
from_registry_file=$(mktemp)
s3_file=$(mktemp)
echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.buildRegistry' > "$build_registry_file"
echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.fromRegistry' > "$from_registry_file"
echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.s3' > "$s3_file"
amlnamespace_yaml=$(cat <<EOF
apiVersion: manage.aml.dev/v1alpha1
kind: AmlNamespace
metadata:
name: $profile
labels:
$BASE_DOMAIN/cluster: $CLUSTER_NAME
spec:
config: {}
EOF
)
if [[ -s "$build_registry_file" && $(yq 'type' "$build_registry_file") != "null" ]]; then
amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.buildRegistry = load(\"$build_registry_file\")")
fi
if [[ -s "$from_registry_file" && $(yq 'type' "$from_registry_file") != "null" ]]; then
amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.fromRegistry = load(\"$from_registry_file\")")
fi
if [[ -s "$s3_file" && $(yq 'type' "$s3_file") != "null" ]]; then
amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.s3 = load(\"$s3_file\")")
fi
rm -f "$build_registry_file" "$from_registry_file" "$s3_file"
echo "$amlnamespace_yaml" | kubectl apply -f -
echo "✅ AmlNamespace $profile created."
done
Migrate InferenceService resources
In Alauda AI 1.3, KServe introduces a breaking change that requires model.name to be kserve-container in InferenceService resources.
Therefore, we need to patch all InferenceService resources created in AML 1.2 to accommodate this change.
Execute following script in destination cluster:
migrate-inferenceservice-resources.sh
#!/bin/bash
set -e
inferenceservices=$(kubectl get inferenceservice --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}')
while IFS=" " read -r namespace svc; do
model_name=$(kubectl -n "$namespace" get inferenceservice "$svc" -o jsonpath='{.spec.predictor.model.name}')
if [ "$model_name" != "kserve-container" ]; then
echo ">> Updating $namespace/$svc..."
kubectl -n "$namespace" patch inferenceservice "$svc" --type=json -p='[{
"op": "replace",
"path": "/spec/predictor/model/name",
"value": "kserve-container"
}]'
fi
done <<< "$inferenceservices"
Migrate Knative CRD
Due to the breaking change of Knative CRD, you need to migrate the CRD to the new version before upgrading.
# Migrate storageVersion of domainmappings from v1alpha1 to v1beta1
kubectl patch crd domainmappings.serving.knative.dev --type='json' -p='[
{"op": "replace", "path": "/spec/versions/0/storage", "value": true},
{"op": "replace", "path": "/spec/versions/1/storage", "value": false}
]'
kubectl exec -naml-operator -it deploy/aml-operator -- /app/storageversion-migrate \
domainmappings.serving.knative.dev
Wait for the command to complete, the command output should like this:
INFO storageversion/main.go:61 Migrating group resources {"len": 1}
INFO storageversion/main.go:64 Migrating group resource {"resource": "domainmappings.serving.knative.dev"}
INFO storageversion/main.go:74 Migration complete
Fix unexpected key in Knative config
Run the following commands to remove _example key in Knative configs, which blocks upgrading.
CONFIGMAPS=(
$(kubectl -nknative-serving get configmaps -l app.kubernetes.io/name=knative-serving -o jsonpath='{.items[*].metadata.name}')
)
# Please ignore error message like `The request is invalid: ...`
for cm in "${CONFIGMAPS[@]}"; do
kubectl -nknative-serving patch configmap "$cm" --type='json' -p='[{"op": "remove", "path": "/data/_example"}]'
done
Cleanup Knative Serving resources
Some Knative Serving resources block upgrading so manual action is required.
Execute following script to remove problematic resources:
cleanup-resources.sh
kubectl -nknative-serving delete pdb activator-pdb webhook-pdb
kubectl -nknative-serving patch configmap config-istio --type=json -p='[{
"op": "remove",
"path": "/data/gateway.knative-serving.knative-local-gateway"
}]'
kubectl -nknative-serving patch gateways.networking.istio.io knative-local-gateway --type=json -p='[{
"op": "replace",
"path": "/spec/selector",
"value": {"knative": "ingressgateway"}
}]'
kubectl -nistio-system patch service knative-local-gateway --type=json -p='[{
"op": "replace",
"path": "/spec/selector",
"value": {"knative": "ingressgateway"}
}]'
Retrieve AML 1.2 Values
INFO
helm and yq utilities should be installed on the cluster node if not present.
In the AML cluster, you can retrieve the values of AML 1.2 with the following helm command:
helm get values aml -nkubeflow -oyaml | tee /tmp/aml-values.yaml
Please keep the generated /tmp/aml-values.yaml file during upgrading.
Creating GitLab Admin Token Secret
We need to create a secret for GitLab admin token, which is required to run Alauda AI Cluster.
Please follow instructions described in Pre-Configuration.
Creating MySQL Secret
We also need to create a MySQL secret for Alauda AI Cluster.
Run the following command:
MYSQL_PASSWORD=$(kubectl -nkubeflow get secret mysql-secret -o jsonpath='{.data.password}' | base64 -d)
kubectl create secret generic aml-mysql-secret \
--from-literal="password=${MYSQL_PASSWORD}" \
-n cpaas-system
- Retrieve the MySQL password from previous AML
1.2 installed secret resource.
- Create a MySQL admin token secret named aml-mysql-secret.
- The password is saved under password key.
- The secret is created under cpaas-system namespace.
Creating Alauda AI Cluster Instance
Finally we create an AmlCluster resource named default for Alauda AI Cluster to finish the upgrading.
Create a yaml file named aml-cluster.yaml with the following content:
aml-cluster.yaml
apiVersion: amlclusters.aml.dev/v1alpha1
kind: AmlCluster
metadata:
name: default
spec:
values:
global:
# single-node or ha-cluster
deployFlavor: single-node
gitlabBaseUrl: "${GITLAB_BASE_URL}"
gitlabAdminTokenSecretRef:
name: aml-gitlab-admin-token
namespace: cpaas-system
mysql:
host: "${MYSQL_HOST}"
port: 3306
user: "${MYSQL_USER}"
database: aml
passwordSecretRef:
name: aml-mysql-secret
namespace: cpaas-system
buildkitd:
storage:
type: emptyDir
components:
kserve:
managementState: Managed
knativeServing:
managementState: Managed
ingressGateway:
domain: "*.example.com"
certificate:
secretName: knative-serving-cert
type: SelfSigned
- The
name and namespace of GitLab admin token secret created previously.
- The
name and namespace of MySQL secret created previously.
- Set the management state of
kserve to Managed, which means the KServe will be installed and managed by Alauda AI Cluster.
- Set the management state of
knativeServing to Managed, which means the Knative Serving will be installed and managed by Alauda AI Cluster.
- The wildcard
domain for exposing inference services. For now, just keep it as-is.
INFO
The domain and certificate are required for exposing inference services and reserved fo future use.
Executing the following commands to retrieve necessary context from AML 1.2, and then install Alauda AI Cluster:
# Retrieve the GitLab host scheme and base from the AML 1.2 values file.
GITLAB_HOST_SCHEME=$(yq -r .global.gitScheme < /tmp/aml-values.yaml)
GITLAB_HOST_BASE=$(yq -r .global.gitBase < /tmp/aml-values.yaml)
export GITLAB_BASE_URL="${GITLAB_HOST_SCHEME}://${GITLAB_HOST_BASE}"
# Retrieve the MySQL host, port and user from the AML 1.2 values file.
export MYSQL_HOST=$(yq -r .global.mysqlHost < /tmp/aml-values.yaml)
export MYSQL_PORT=$(yq -r .global.mysqlPort < /tmp/aml-values.yaml)
export MYSQL_USER=$(yq -r .global.mysqlUsername < /tmp/aml-values.yaml)
yq '((.. | select(tag == "!!str")) |= envsubst) | (.spec.values.global.mysql.port = env(MYSQL_PORT))' aml-cluster.yaml \
| kubectl apply -f -
- Interpolate environment variables in
aml-cluster.yaml file previously created.
- Create the
AmlCluster resource to install Alauda AI Cluster.