## Migrate profile resources

Alauda AI 1.3 removes the `Profile` CRD from the `kubeflow.org` API group and introduces a new Custom Resource Definition (CRD) called `AmlNamespace`, so existing `Profile` resources must be migrated to `AmlNamespace`.

Execute the following script in the destination cluster:
`migrate-profile-resources.sh`

```shell
#!/bin/bash
BASE_DOMAIN=$(kubectl -n kube-public get cm global-info -o jsonpath='{.data.labelBaseDomain}' | sed 's/"//g')
CLUSTER_NAME=$(kubectl -n kube-public get cm global-info -o jsonpath='{.data.clusterName}' | sed 's/"//g')

profiles=$(kubectl get profiles.kubeflow.org -o jsonpath='{.items[*].metadata.name}')

for profile in $profiles; do
  if kubectl get amlnamespace "$profile" &>/dev/null; then
    echo "⚠️ AmlNamespace $profile already exists. Skipping..."
    continue
  fi

  profile_yaml=$(kubectl get profiles.kubeflow.org "$profile" -o yaml)

  build_registry_file=$(mktemp)
  from_registry_file=$(mktemp)
  s3_file=$(mktemp)

  echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.buildRegistry' > "$build_registry_file"
  echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.fromRegistry' > "$from_registry_file"
  echo "$profile_yaml" | yq '.spec.plugins[] | select(.kind == "AmlConfig") | .spec.s3' > "$s3_file"

  amlnamespace_yaml=$(cat <<EOF
apiVersion: manage.aml.dev/v1alpha1
kind: AmlNamespace
metadata:
  name: $profile
  labels:
    $BASE_DOMAIN/cluster: $CLUSTER_NAME
spec:
  config: {}
EOF
)

  if [[ -s "$build_registry_file" && $(yq 'type' "$build_registry_file") != "null" ]]; then
    amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.buildRegistry = load(\"$build_registry_file\")")
  fi
  if [[ -s "$from_registry_file" && $(yq 'type' "$from_registry_file") != "null" ]]; then
    amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.fromRegistry = load(\"$from_registry_file\")")
  fi
  if [[ -s "$s3_file" && $(yq 'type' "$s3_file") != "null" ]]; then
    amlnamespace_yaml=$(echo "$amlnamespace_yaml" | yq ".spec.config.s3 = load(\"$s3_file\")")
  fi

  rm -f "$build_registry_file" "$from_registry_file" "$s3_file"

  echo "$amlnamespace_yaml" | kubectl apply -f -
  echo "✅ AmlNamespace $profile created."
done
```
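For reference, the heredoc in the script expands shell variables directly into the manifest it applies. A minimal sketch of that step with made-up values (`cpaas.io`, `prod`, and `data-team` are placeholders for illustration, not values from your cluster):

```shell
#!/bin/bash
# Placeholder values for illustration only; the real script reads these
# from the global-info ConfigMap and the Profile list
BASE_DOMAIN=cpaas.io
CLUSTER_NAME=prod
profile=data-team

# Same heredoc pattern the migration script uses to build the base manifest
manifest=$(cat <<EOF
apiVersion: manage.aml.dev/v1alpha1
kind: AmlNamespace
metadata:
  name: $profile
  labels:
    $BASE_DOMAIN/cluster: $CLUSTER_NAME
spec:
  config: {}
EOF
)
echo "$manifest"
```

The real script then layers `buildRegistry`, `fromRegistry`, and `s3` settings onto `spec.config` only when the corresponding `AmlConfig` plugin fields exist in the `Profile`.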
## Migrate InferenceService resources

In Alauda AI 1.3, KServe introduces a breaking change that requires `model.name` to be `kserve-container` in `InferenceService` resources. Therefore, all `InferenceService` resources created in AML 1.2 must be patched to accommodate this change.

Execute the following script in the destination cluster:
`migrate-inferenceservice-resources.sh`

```shell
#!/bin/bash
set -e

inferenceservices=$(kubectl get inferenceservice --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}')

while IFS=" " read -r namespace svc; do
  # Skip blank lines (e.g. when no InferenceService exists)
  [ -z "$svc" ] && continue
  model_name=$(kubectl -n "$namespace" get inferenceservice "$svc" -o jsonpath='{.spec.predictor.model.name}')
  if [ "$model_name" != "kserve-container" ]; then
    echo ">> Updating $namespace/$svc..."
    kubectl -n "$namespace" patch inferenceservice "$svc" --type=json -p='[{
      "op": "replace",
      "path": "/spec/predictor/model/name",
      "value": "kserve-container"
    }]'
  fi
done <<< "$inferenceservices"
```
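The loop relies on `read` splitting each jsonpath output line into a namespace/name pair. A minimal sketch of that parsing with made-up service names, requiring no cluster:

```shell
#!/bin/bash
# Simulated `kubectl get inferenceservice` jsonpath output:
# one "<namespace> <name>" pair per line (names are made up for illustration)
inferenceservices=$'team-a sklearn-iris\nteam-b bert-qa'

patched=()
while IFS=" " read -r namespace svc; do
  # In the real script, this is where the kubectl patch runs
  patched+=("$namespace/$svc")
done <<< "$inferenceservices"

printf '%s\n' "${patched[@]}"
```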
## Migrate Knative CRD

Due to a breaking change in the Knative CRD, you need to migrate the CRD to the new storage version before upgrading.

```shell
# Migrate storageVersion of domainmappings from v1alpha1 to v1beta1
kubectl patch crd domainmappings.serving.knative.dev --type='json' -p='[
  {"op": "replace", "path": "/spec/versions/0/storage", "value": true},
  {"op": "replace", "path": "/spec/versions/1/storage", "value": false}
]'

kubectl exec -naml-operator -it deploy/aml-operator -- /app/storageversion-migrate \
  domainmappings.serving.knative.dev
```

Wait for the command to complete; its output should look like this:

```
INFO storageversion/main.go:61 Migrating group resources {"len": 1}
INFO storageversion/main.go:64 Migrating group resource {"resource": "domainmappings.serving.knative.dev"}
INFO storageversion/main.go:74 Migration complete
```
## Fix unexpected key in Knative config

Run the following commands to remove the `_example` key from the Knative ConfigMaps, which otherwise blocks upgrading:

```shell
CONFIGMAPS=(
  $(kubectl -nknative-serving get configmaps -l app.kubernetes.io/name=knative-serving -o jsonpath='{.items[*].metadata.name}')
)

# Please ignore error messages like `The request is invalid: ...`
for cm in "${CONFIGMAPS[@]}"; do
  kubectl -nknative-serving patch configmap "$cm" --type='json' -p='[{"op": "remove", "path": "/data/_example"}]'
done
```
## Cleanup Knative Serving resources

Some Knative Serving resources block upgrading, so manual action is required.

Execute the following script to remove the problematic resources:

`cleanup-resources.sh`

```shell
kubectl -nknative-serving delete pdb activator-pdb webhook-pdb

kubectl -nknative-serving patch configmap config-istio --type=json -p='[{
  "op": "remove",
  "path": "/data/gateway.knative-serving.knative-local-gateway"
}]'

kubectl -nknative-serving patch gateways.networking.istio.io knative-local-gateway --type=json -p='[{
  "op": "replace",
  "path": "/spec/selector",
  "value": {"knative": "ingressgateway"}
}]'

kubectl -nistio-system patch service knative-local-gateway --type=json -p='[{
  "op": "replace",
  "path": "/spec/selector",
  "value": {"knative": "ingressgateway"}
}]'
```
## Retrieve AML 1.2 Values

INFO

The `helm` and `yq` utilities should be installed on the cluster node if not already present.

In the AML cluster, you can retrieve the values of AML 1.2 with the following `helm` command:

```shell
helm get values aml -nkubeflow -oyaml | tee /tmp/aml-values.yaml
```

Keep the generated `/tmp/aml-values.yaml` file for the duration of the upgrade.
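For orientation, later commands in this guide read `global.gitScheme`, `global.gitBase`, `global.mysqlHost`, `global.mysqlPort`, and `global.mysqlUsername` from this file, so it is expected to contain a fragment shaped roughly like the following (the values shown are illustrative, not defaults):

```yaml
global:
  gitScheme: https
  gitBase: gitlab.example.com
  mysqlHost: mysql.example.com
  mysqlPort: "3306"
  mysqlUsername: root
```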
## Creating GitLab Admin Token Secret

We need to create a secret for the GitLab admin token, which is required to run Alauda AI Cluster. Follow the instructions described in Pre-Configuration.
## Creating MySQL Secret

We also need to create a MySQL secret for Alauda AI Cluster. Run the following commands:

```shell
MYSQL_PASSWORD=$(kubectl -nkubeflow get secret mysql-secret -o jsonpath='{.data.password}' | base64 -d)

kubectl create secret generic aml-mysql-secret \
  --from-literal="password=${MYSQL_PASSWORD}" \
  -n cpaas-system
```

- Retrieve the MySQL password from the secret resource installed by AML 1.2.
- Create a MySQL secret named `aml-mysql-secret`.
- The password is saved under the `password` key.
- The secret is created in the `cpaas-system` namespace.
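Note that `.data` values in a Kubernetes Secret are base64-encoded, which is why the retrieval step pipes through `base64 -d`. A self-contained sketch of that round trip with a made-up password:

```shell
#!/bin/bash
# Made-up password for illustration; the real value comes from the
# mysql-secret resource in the kubeflow namespace
encoded=$(printf '%s' 's3cr3t-pw' | base64)

# Secret .data fields hold base64, so decode before reuse
MYSQL_PASSWORD=$(printf '%s' "$encoded" | base64 -d)
echo "$MYSQL_PASSWORD"
```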
## Creating Alauda AI Cluster Instance

Finally, we create an `AmlCluster` resource named `default` for Alauda AI Cluster to finish the upgrade.

Create a YAML file named `aml-cluster.yaml` with the following content:

`aml-cluster.yaml`

```yaml
apiVersion: amlclusters.aml.dev/v1alpha1
kind: AmlCluster
metadata:
  name: default
spec:
  values:
    global:
      # single-node or ha-cluster
      deployFlavor: single-node
      gitlabBaseUrl: "${GITLAB_BASE_URL}"
      gitlabAdminTokenSecretRef:
        name: aml-gitlab-admin-token
        namespace: cpaas-system
      mysql:
        host: "${MYSQL_HOST}"
        port: 3306
        user: "${MYSQL_USER}"
        database: aml
        passwordSecretRef:
          name: aml-mysql-secret
          namespace: cpaas-system
      buildkitd:
        storage:
          type: emptyDir
    components:
      kserve:
        managementState: Managed
      knativeServing:
        managementState: Managed
        ingressGateway:
          domain: "*.example.com"
          certificate:
            secretName: knative-serving-cert
            type: SelfSigned
```
- The `name` and `namespace` of the GitLab admin token secret created previously.
- The `name` and `namespace` of the MySQL secret created previously.
- Set the management state of `kserve` to `Managed`, which means KServe will be installed and managed by Alauda AI Cluster.
- Set the management state of `knativeServing` to `Managed`, which means Knative Serving will be installed and managed by Alauda AI Cluster.
- The wildcard `domain` for exposing inference services. For now, keep it as-is.
INFO

The `domain` and `certificate` are required for exposing inference services and are reserved for future use.
Execute the following commands to retrieve the necessary context from AML 1.2, and then install Alauda AI Cluster:

```shell
# Retrieve the GitLab host scheme and base from the AML 1.2 values file.
GITLAB_HOST_SCHEME=$(yq -r .global.gitScheme < /tmp/aml-values.yaml)
GITLAB_HOST_BASE=$(yq -r .global.gitBase < /tmp/aml-values.yaml)
export GITLAB_BASE_URL="${GITLAB_HOST_SCHEME}://${GITLAB_HOST_BASE}"

# Retrieve the MySQL host, port and user from the AML 1.2 values file.
export MYSQL_HOST=$(yq -r .global.mysqlHost < /tmp/aml-values.yaml)
export MYSQL_PORT=$(yq -r .global.mysqlPort < /tmp/aml-values.yaml)
export MYSQL_USER=$(yq -r .global.mysqlUsername < /tmp/aml-values.yaml)

yq '((.. | select(tag == "!!str")) |= envsubst) | (.spec.values.global.mysql.port = env(MYSQL_PORT))' aml-cluster.yaml \
  | kubectl apply -f -
```
- Interpolate environment variables in the `aml-cluster.yaml` file created previously.
- Create the `AmlCluster` resource to install Alauda AI Cluster.
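The `envsubst` operator in the `yq` expression replaces `${VAR}` references inside string values with the exported environment variables. The same expansion can be sketched with a plain unquoted heredoc and illustrative values (the endpoints below are made up):

```shell
#!/bin/bash
# Hypothetical endpoints for illustration only
export GITLAB_BASE_URL="https://gitlab.example.com"
export MYSQL_HOST="mysql.cpaas-system.svc"
export MYSQL_USER="root"

# An unquoted heredoc expands ${VAR} references the same way the
# yq envsubst step rewrites string values in aml-cluster.yaml
rendered=$(cat <<EOF
gitlabBaseUrl: "${GITLAB_BASE_URL}"
host: "${MYSQL_HOST}"
user: "${MYSQL_USER}"
EOF
)
echo "$rendered"
```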