Upgrade the global cluster

The platform consists of a global cluster and one or more workload clusters. The global cluster must be upgraded before any workload clusters.

This document walks you through the upgrade procedure for the global cluster.

If the global cluster is configured with the global DR (Disaster Recovery) solution, strictly follow the Global DR procedure. Otherwise, follow the Standard procedure.

Standard procedure

Upload images

Copy the upgrade package to any control plane node of the global cluster. Extract the package and cd into the extracted directory.
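
For example, a minimal sketch of this step, assuming a hypothetical package name upgrade-pkg.tgz and node address (your actual file name and host will differ):

# Copy the package to a control plane node (hypothetical names)
scp upgrade-pkg.tgz root@<control-plane-node>:/root/

# On that node, extract the package and enter the extracted directory
tar -xzf upgrade-pkg.tgz
cd upgrade-pkg/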

  • If the global cluster uses the built-in registry, run:

    bash upgrade.sh --only-sync-image=true
    
  • If the global cluster uses an external registry, you also need to provide the registry address:

    bash upgrade.sh --only-sync-image=true --registry <registry-address> --username <username> --password <password>
    
INFO

Uploading images typically takes about 2 hours, depending on your network and disk performance. If your platform uses global DR, remember that the standby global cluster also requires an image upload, and plan your maintenance window accordingly.

Trigger the upgrade

After the image upload is complete, run the following command to start the upgrade process:

bash upgrade.sh --skip-sync-image

Wait for the script to finish before proceeding.
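
If you want to observe progress from another terminal while the script runs, one option is to watch the platform pods; this is a sketch that assumes platform components run in the cpaas-system namespace, as used in later steps of this document:

# Watch platform pods restart as components are upgraded
kubectl get pods -n cpaas-system -w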

Upgrade components in the Web Console

  1. Log in to the Web Console of the global cluster and switch to Administrator view.
  2. Navigate to Clusters > Clusters.
  3. Click on the global cluster to open its detail view.
  4. Go to the Functional Components tab.
  5. Click the Upgrade button.

Review the available component updates shown in the dialog, and confirm to continue.

INFO

Upgrading the Kubernetes version is optional. However, since service disruptions may occur regardless, we recommend including the Kubernetes upgrade to avoid multiple maintenance windows.
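
Once the upgrade finishes, a quick generic check that every node reports the expected Kubernetes version:

# All nodes should show the new version in the VERSION column
kubectl get nodes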

Global DR procedure

Verify data consistency

  1. Follow your regular global DR inspection procedures to ensure that data in the standby global cluster is consistent with the primary global cluster. If inconsistencies are detected, contact technical support before proceeding.
  2. On both clusters, run the following to ensure no Machine nodes are in a non-running state:

     kubectl get machines.platform.tkestack.io

If any such nodes exist, contact technical support to resolve them before continuing.
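
To make non-running machines easier to spot, you can filter the listing; this sketch assumes the phase is printed in the last column of the default output, which may differ in your version:

# Print only machines whose last column is not "Running" (column position is an assumption)
kubectl get machines.platform.tkestack.io --no-headers | awk '$NF != "Running"'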

Uninstall the etcd sync plugin

  1. Access the Web Console of the standby global cluster via its IP or VIP.
  2. Switch to the Platform Management view.
  3. Navigate to Catalog > Cluster Plugin.
  4. Select global from the cluster dropdown.
  5. Find the EtcdSync plugin and click Uninstall. Wait for the uninstallation to complete.
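
To confirm the plugin is gone before continuing, you can check from the standby cluster that no synchronizer pods remain (assuming the app=etcd-sync label used in the verification steps later in this document):

kubectl get po -n cpaas-system -l app=etcd-sync  # Expect "No resources found" after uninstallation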

Upgrade the standby global cluster

Follow the same procedure as described in the Standard procedure section to upgrade the standby global cluster first.

Upgrade the primary global cluster

After the standby is upgraded, follow the same Standard procedure to upgrade the primary global cluster.

Reinstall the etcd sync plugin

Before reinstalling, verify that port 2379 is properly forwarded from both global cluster VIPs to their control plane nodes.
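
For example, you can test connectivity from a node that should reach each VIP (replace <vip> with the primary and standby VIPs in turn):

# Verify that TCP port 2379 is reachable through the VIP
nc -zv <vip> 2379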

To reinstall:

  1. Access the Web Console of the standby global cluster via its IP or VIP.
  2. Switch to Administrator view.
  3. Go to Marketplace > Cluster Plugins.
  4. Select the global cluster.
  5. Locate Alauda Container Platform etcd Synchronizer, click Install, and provide the required parameters.

To verify installation:

kubectl get po -n cpaas-system -l app=etcd-sync  # Ensure pod is 1/1 Running

kubectl logs -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | awk '{print $1}' | head -1) | grep -i "Start Sync update"
# Wait until the logs contain "Start Sync update"

# Recreate the pod to trigger synchronization of resources with ownerReferences
kubectl delete po -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | awk '{print $1}' | head -1)

Check synchronization status

Run the following to verify the synchronization status:

curl "$(kubectl get svc -n cpaas-system etcd-sync-monitor -ojsonpath='{.spec.clusterIP}')/check"

Explanation of output:

  • "LOCAL ETCD missed keys:" – Keys exist in the primary cluster but are missing in the standby. This often resolves after a pod restart.
  • "LOCAL ETCD surplus keys:" – Keys exist in the standby cluster but not in the primary. Review these with your operations team before deletion.