Upgrade the global cluster

The platform consists of a global cluster and one or more workload clusters. The global cluster must be upgraded before any workload clusters.

This document walks you through the upgrade procedure for the global cluster.

If the global cluster is configured with the global DR (Disaster Recovery) solution, strictly follow the Global DR procedure. Otherwise, follow the Standard procedure.


Standard procedure

Upload images

Copy the core package to any control plane node of the global cluster. Extract the package and cd into the extracted directory.

  • If the global cluster uses the built-in registry, run:

    bash upgrade.sh --only-sync-image=true
  • If the global cluster uses an external registry, you also need to provide the registry address:

    bash upgrade.sh --only-sync-image=true --registry <registry-address> --username <username> --password <password>

If you plan to upgrade the Operator and Cluster Plugin together during the global cluster upgrade, you can push their images to the global cluster's registry in advance. For bulk upload instructions, see Push only images from all packages in a directory.

INFO

Uploading images typically takes about 2 hours, depending on your network and disk performance.

If your platform is configured for global disaster recovery (DR), remember that the standby global cluster also requires image upload. Be sure to plan your maintenance window accordingly.

WARNING

When using violet to upload packages to a standby cluster, the parameter --dest-repo <VIP addr of standby cluster> must be specified.
Otherwise, the packages will be uploaded to the image repository of the primary cluster, preventing the standby cluster from installing or upgrading extensions.

Also be aware that either the authentication credentials for the standby cluster's image registry or the --no-auth parameter MUST be provided.

For details of the violet push subcommand, please refer to Upload Packages.
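
For illustration only, a push to the standby cluster could look like the following sketch; the package path and VIP are placeholders, and you should substitute registry credentials for --no-auth if the standby registry requires authentication:

# Push a package to the standby cluster's image repository (not the primary's)
violet push <package-file> --dest-repo <VIP addr of standby cluster> --no-auth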

Trigger the upgrade

After the image upload is complete, run the following command to start the upgrade process:

bash upgrade.sh --skip-sync-image

Wait for the script to finish before proceeding.

If you have already pre-pushed the Operator and Cluster Plugin images to the global cluster's registry, you can then follow Create only CRs from all packages in a directory. After running this command, wait about 10–15 minutes until upgrade notifications appear for functional components. You will then be able to upgrade the Operator and Cluster Plugin together as part of the subsequent upgrade steps.

WARNING
  1. When upgrading the global cluster, do not use the --clusters parameter to create CRs on workload clusters in the Create only CRs from all packages in a directory step. Doing so may cause upgrade failures during subsequent workload cluster upgrades.

  2. If you are upgrading from 3.18 or 4.0 and the directory contains the Build of TopoLVM package, you must remove it before running the Create only CRs from all packages in a directory step. After completing that step, create the CRs for TopoLVM separately, and make sure to include the --target-catalog-source "platform" parameter.

(Conditional) Remove TopoLVM

If you are upgrading from 3.18 and the Build of TopoLVM is installed, you must back up and delete the related TopoLVM resources before proceeding with the upgrade.

Otherwise, the cluster upgrade will fail.

Run the following commands on any control plane node of the cluster to be upgraded:

# Back up the 4.1 ArtifactVersion of the TopoLVM operator to a local file, then delete it from the cluster
kubectl get artifactversion -n cpaas-system $(kubectl get artifactversion -n cpaas-system -l cpaas.io/artifact-version=operatorhub-topolvm-operator --no-headers | grep 4.1 | head -1 | awk '{print $1}') -o yaml > topolvm-artifact.yaml
kubectl delete -f topolvm-artifact.yaml

Then, run the following command on any control plane node of the global cluster:

# Replace cluster_name with the name of the cluster to be upgraded
kubectl delete minfo $(kubectl get minfo | grep topolvm-migrate-catalog-updater | grep $cluster_name | awk '{print $1}')
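
You can optionally confirm that the resource was removed. The following check, a small sketch reusing the same filter as above, should return no output once the deletion has taken effect:

# Should print nothing after the minfo resource has been deleted
kubectl get minfo | grep topolvm-migrate-catalog-updater | grep $cluster_name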

Upgrade the global cluster

WARNING

If you are upgrading from 3.16 or 3.18 and the platform has Data Services installed, you must also upgrade the related extensions when upgrading the clusters.

For more information, see Upgrade Data Services.

  1. Log in to the Web Console of the global cluster and switch to Administrator view.
  2. Navigate to Clusters > Clusters.
  3. Click on the global cluster to open its detail view.
  4. Go to the Functional Components tab.
  5. Click the Upgrade button.

Review the available component updates in the dialog, and confirm to proceed.

INFO
  • Upgrading the Kubernetes version is optional. However, since service disruptions may occur regardless, we recommend including the Kubernetes upgrade to avoid multiple maintenance windows.

(Conditional) Upgrade TopoLVM

This step applies only if you are upgrading from 3.18, the Build of TopoLVM is installed, and you have already completed the Remove TopoLVM step.

On a control plane node of the cluster to be upgraded, continue by running the following command to upgrade TopoLVM:

kubectl create -f topolvm-artifact.yaml

After running the command, wait approximately 5–10 minutes. The TopoLVM component will be automatically upgraded and reflected in the web console.
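
If you also want to watch the progress from the command line, a quick check of the recreated ArtifactVersion (using the same label selector as in the backup step) might look like this:

# List the TopoLVM ArtifactVersion recreated from topolvm-artifact.yaml
kubectl get artifactversion -n cpaas-system -l cpaas.io/artifact-version=operatorhub-topolvm-operator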

Install Product Docs Plugin

INFO

The Alauda Container Platform Product Docs plugin provides access to product documentation within the platform. All help links throughout the platform will direct users to this documentation. If this plugin is not installed, clicking help links in the platform will result in 404 access errors.

Starting from 4.0, the built-in product documentation has been separated into the Alauda Container Platform Product Docs plugin. If you are upgrading from 3.x, you need to install this plugin by following these steps:

  1. Navigate to Administrator.

  2. In the left sidebar, click Marketplace > Cluster Plugins and select the global cluster.

  3. Locate the Alauda Container Platform Product Docs plugin and click Install.

Install Alauda Container Platform Cluster Enhancer Plugin

INFO

This step only ensures that the cluster enhancer plugin is installed. If the plugin is already installed, no further action is needed.

  1. Navigate to Administrator.

  2. In the left sidebar, click Marketplace > Cluster Plugins and select the global cluster.

  3. Locate the Alauda Container Platform Cluster Enhancer plugin and click Install.

(Conditional) Install Service Mesh Essentials

If Service Mesh v1 is installed, refer to the Alauda Service Mesh Essentials Cluster Plugin documentation before upgrading the workload clusters.

Post-upgrade

Global DR procedure

Verify data consistency

Follow your regular global DR inspection procedures to ensure that data in the standby global cluster is consistent with the primary global cluster.

If inconsistencies are detected, contact technical support before proceeding.

On both clusters, run the following command and confirm that every Machine node is in the Running state:

kubectl get machines.platform.tkestack.io

If any such nodes exist, contact technical support to resolve them before continuing.
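
As a convenience, the check can be narrowed to problem nodes only. The one-liner below is a sketch that assumes the machine phase is reported in the STATUS column; it prints nothing when every node is Running:

# Print only Machine nodes that are not in the Running phase
kubectl get machines.platform.tkestack.io --no-headers | grep -vi running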

Uninstall the etcd sync plugin

Upgrading from 3.18
Upgrading from 4.0 / 4.1
  1. Access the Web Console of the primary cluster via its IP or VIP.
  2. Switch to the Administrator view.
  3. Navigate to Catalog > Cluster Plugin.
  4. MAKE SURE you have switched to the global cluster.
  5. Find the EtcdSync plugin and Uninstall it. Wait for the uninstallation to complete.

Upload images

Perform the Upload images step on both the standby cluster and the primary cluster.

See Upload images in Standard procedure for details.

Upgrade the standby cluster

INFO

Accessing the standby cluster Web Console is required to perform the upgrade.

Before proceeding, verify that the ProductBase resource of the standby cluster is correctly configured with the cluster VIP under spec.alternativeURLs.

If not, update the configuration as follows:

apiVersion: product.alauda.io/v1alpha2
kind: ProductBase
metadata:
  name: base
spec:
  alternativeURLs:
    - https://<standby-cluster-vip>
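
One way to inspect and adjust this configuration (assuming the ProductBase resource is cluster-scoped and named base, as shown above) is:

# Check whether alternativeURLs already contains the standby cluster VIP
kubectl get productbase base -o yaml | grep -A2 alternativeURLs

# If not, edit the resource and add the VIP under spec.alternativeURLs
kubectl edit productbase base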

On the standby cluster, follow the steps in the Standard procedure to complete the upgrade.

Upgrade the primary cluster

After the standby cluster has been upgraded, proceed with the Standard procedure on the primary cluster.

Reinstall the etcd sync plugin

Before reinstalling, verify that port 2379 is properly forwarded from both global cluster VIPs to their control plane nodes.
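
A simple reachability check from a control plane node of either global cluster (using standard tooling; substitute the real VIPs) can be done as follows:

# Verify that port 2379 on each global cluster VIP is reachable
nc -zv <primary-cluster-vip> 2379
nc -zv <standby-cluster-vip> 2379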

To reinstall:

  1. Access the Web Console of the standby global cluster via its IP or VIP.
  2. Switch to Administrator view.
  3. Go to Marketplace > Cluster Plugins.
  4. Select the global cluster.
  5. Locate Alauda Container Platform etcd Synchronizer, click Install, and provide the required parameters.

To verify installation:

kubectl get po -n cpaas-system -l app=etcd-sync  # Ensure pod is 1/1 Running

kubectl logs -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | awk '{print $1}' | head -1) | grep -i "Start Sync update"
# Wait until the logs contain "Start Sync update"

# Recreate the pod to trigger synchronization of resources with ownerReferences
kubectl delete po -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | awk '{print $1}' | head -1)

Check synchronization status

Run the following to verify the synchronization status:

curl "$(kubectl get svc -n cpaas-system etcd-sync-monitor -ojsonpath='{.spec.clusterIP}')/check"

Explanation of output:

  • "LOCAL ETCD missed keys:" – Keys exist in the primary cluster but are missing in the standby. This often resolves after a pod restart.
  • "LOCAL ETCD surplus keys:" – Keys exist in the standby cluster but not in the primary. Review these with your operations team before deletion.