Installing the global Cluster

This document describes how to install the global cluster onto Immutable Infrastructure. The global cluster is the platform control plane and is provisioned through Cluster API. Use this path when the platform control plane must run on an immutable operating system such as MicroOS.

When to Use This Path

Choose this installation path when all of the following conditions apply:

  • You want the global cluster to run on an immutable operating system. MicroOS is the supported image today.
  • Your infrastructure is one of the documented providers: Huawei DCS, VMware vSphere, or Huawei Cloud Stack. Bare-metal support for the global cluster is planned.
  • You can run a temporary KIND host that has network access to the target IaaS platform.

For traditional operating systems such as Ubuntu or RHEL, use the standard installation path instead.

Common Prerequisites

The following prerequisites apply to every provider:

  • A KIND host that meets the minimum hardware and network requirements. See the Overview for sizing guidance.
  • The Core Package from the Customer Portal.
  • The Alauda Container Platform Kubeadm Provider package.
  • The infrastructure provider package for your target platform.
  • Network reachability between the KIND host and the target IaaS platform API endpoint.
  • IP and hostname planning for the global control plane and worker nodes. See Infrastructure Resources for the resource model used by each provider.
  • A stable Kubernetes API endpoint for the global cluster, such as a VIP or load balancer address.
  • A platform access address, registry address, and Pod and Service CIDR ranges.
  • For x86_64 nodes that use ACP-provided MicroOS images, the underlying CPUs must support the x86-64-v2 ISA baseline. See OS Support Matrix.
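
The x86-64-v2 requirement can be checked before any node is provisioned. The following is a minimal sketch, assuming a Linux guest on the target hypervisor where /proc/cpuinfo exposes the CPU flags. The function names has_x86_64_v2 and cpu_flags are illustrative and are not part of the platform tooling. The flag names are the Linux spellings of the x86-64-v2 feature set (pni is SSE3).

```shell
# has_x86_64_v2 "FLAGS": returns 0 when the space-separated flag list
# covers the x86-64-v2 baseline, and names the first missing flag otherwise.
has_x86_64_v2() {
  flags=" $1 "
  for f in cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3; do
    case "$flags" in
      *" $f "*) ;;
      *) echo "missing CPU flag: $f" >&2; return 1 ;;
    esac
  done
  return 0
}

# cpu_flags: prints the flag list of the first CPU on a Linux host.
cpu_flags() {
  grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2
}
```

Run has_x86_64_v2 "$(cpu_flags)" on a test VM created on the same hosts that will run the global nodes. The KIND host result is not relevant unless it shares the same hypervisor CPU model.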

Naming Convention (Required)

This rule applies to every infrastructure provider supported by this install path — Huawei DCS, Huawei Cloud Stack, VMware vSphere, and any provider added in the future. Every manifest you author in Step 4 must follow it. Misnaming these resources has two distinct failure modes, both detailed below; one breaks initial provisioning, the other only surfaces during disaster recovery.

  • The CAPI Cluster and the provider's infrastructure cluster resource (for example, DCSCluster for Huawei DCS or HCSCluster for Huawei Cloud Stack; each provider has its own equivalent) must be named exactly global. cpaas-installer looks them up by literal name, and the Huawei Cloud Stack provider only allocates the global ELB listener ports (11443 for the registry and console, 2379 for DR etcd-sync, 443 for web access) when the infra cluster is named global. A different name silently breaks registry pull, DR etcd-sync, and the web console.
  • Every other CAPI resource (KubeadmControlPlane, KubeadmConfigTemplate, MachineDeployment) and every other provider infrastructure resource (machine templates, IP/hostname pools, machine config pools, and any other per-provider resource) must use a name with the global- prefix. The DR (failover) mechanism uses this prefix to identify resources owned by the global cluster. A global cluster resource without the global- prefix is invisible to DR and causes the standby cluster's machines to be deleted at failover time — the cluster will provision and run normally, then lose nodes the first time DR is exercised. This is a hard requirement, not a stylistic convention.
  • Cluster.spec.controlPlaneRef.name and any other cross-references must match the prefixed names exactly.
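
The naming rules above are easy to check mechanically before you apply a manifest. The following is a minimal sketch, not an installer feature. The function names check_name and check_names are hypothetical. It assumes that Cluster, DCSCluster, HCSCluster, and VSphereCluster are the kinds that must be named exactly global, and that every other kind you pass in must carry the global- prefix. Adjust the kind list for your provider.

```shell
# check_name KIND NAME: returns 0 when NAME follows the naming convention.
check_name() {
  kind="$1"; name="$2"
  case "$kind" in
    Cluster|DCSCluster|HCSCluster|VSphereCluster)
      # CAPI Cluster and infrastructure cluster: literal name "global".
      [ "$name" = "global" ] ;;
    *)
      # Everything else: "global-" followed by at least one character.
      case "$name" in
        global-?*) return 0 ;;
        *) return 1 ;;
      esac ;;
  esac
}

# check_names: reads "KIND NAME" pairs from stdin, reports every violation,
# and returns non-zero if any pair breaks the convention.
check_names() {
  rc=0
  while read -r kind name; do
    [ -n "$kind" ] || continue
    if ! check_name "$kind" "$name"; then
      echo "naming violation: $kind/$name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Feed check_names one "KIND NAME" pair per line, collected from the manifest by hand or with any YAML tool you already use. The check covers names only; cross-references such as Cluster.spec.controlPlaneRef.name still need to be compared against the prefixed names.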

Compatibility and Version Inputs

Before installation, record the supported version set for the delivery package:

| Input | Purpose |
| --- | --- |
| Core Package version | Provides the installer, local registry, and base platform payload. |
| Kubeadm provider chart version | Must match the Cluster API control plane resources used by the global manifest. |
| Infrastructure provider chart version | Use the VMware vSphere, DCS, or HCS provider chart version delivered with the target release. |
| MicroOS image or VM template | Must contain the Kubernetes version used by K8S_VERSION. |
| K8S_VERSION | Use v-prefixed semver that matches the target MicroOS image, such as v<major>.<minor>.<patch>. |

Procedure

Step 1 — Prepare Common Variables

Set the common variables on the KIND host.

export HOST_IP="<kind-host-ip>"
export LOCAL_REGISTRY_ADDRESS="127.0.0.1:11443"
export BOOTSTRAP_REGISTRY_ADDRESS="172.18.0.1:11443"
export NODE_REGISTRY_ADDRESS="${HOST_IP}:11443"
export CONTROL_PLANE_VIP="<global-control-plane-vip>"
export PLATFORM_HOST="<platform-access-domain-or-vip>"
export REGISTRY_DOMAIN="<platform-registry-domain-or-vip>:11443"
export CLUSTER_CIDR="100.3.0.0/16"
export SERVICE_CIDR="100.4.0.0/16"
# Use v-prefixed semver that matches the target MicroOS image.
export K8S_VERSION="<target-kubernetes-version>"
export INGRESS_CLASS_NAME="global-alb2"
export HCS_SECRET_NAME="global-secret"

Use LOCAL_REGISTRY_ADDRESS when pushing packages from the KIND host. Use BOOTSTRAP_REGISTRY_ADDRESS in AppRelease chart repository values because provider Pods read the chart repository from inside the bootstrap KIND network. Use NODE_REGISTRY_ADDRESS in Cluster API registry annotations because provisioned global nodes must pull images through an address reachable from their subnet.
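
Later steps render these variables into YAML and JSON through shell heredocs, where an unset variable silently becomes an empty string. The following is a minimal sketch of a guard you could run after exporting the variables. The function names require_vars and is_v_semver are hypothetical helpers, not part of the Core Package.

```shell
# require_vars NAME...: fails if any named variable is unset, empty,
# or still holds a <placeholder> value copied from the documentation.
require_vars() {
  rc=0
  for v in "$@"; do
    eval "val=\${$v:-}"
    case "$val" in
      ""|*"<"*">"*)
        echo "variable $v is not set to a real value" >&2
        rc=1 ;;
    esac
  done
  return "$rc"
}

# is_v_semver VALUE: succeeds only for v<major>.<minor>.<patch>.
is_v_semver() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}
```

For example, require_vars HOST_IP CONTROL_PLANE_VIP PLATFORM_HOST REGISTRY_DOMAIN K8S_VERSION && is_v_semver "${K8S_VERSION}" stops early when a value is missing or the version lacks the v prefix. The check confirms format only; it does not confirm that the version matches the MicroOS image.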

Step 2 — Bootstrap the KIND Host

Run the bootstrap script provided by the Core Package. This brings up a temporary management cluster, minialauda, on the KIND host.

mkdir -p /root/cpaas-install
tar -xvf <core-package> -C /root/cpaas-install
cd /root/cpaas-install/installer
sh setup.sh
mkdir -p ~/.kube
cp /var/cpaas/data/alauda.kubeconfig ~/.kube/config

The bootstrap script provisions an embedded registry, the Cluster API control plane, and the installer components that drive the global cluster installation.

Step 3 — Upload and Install Provider Packages

Upload the Kubeadm provider package and the infrastructure provider package to the local registry.

Why cluster.type is Baremetal for every provider

The AppRelease values for every provider set global.cluster.type: Baremetal. This is a chart-internal classifier, not the IaaS provider name. Keep Baremetal for the Huawei DCS, VMware vSphere, and Huawei Cloud Stack global installations. The value drives how the platform configures node-level components; it does not select the infrastructure provider.

Huawei DCS

Set the provider package paths and chart versions.

export DCS_PROVIDER_PACK="/root/cluster-api-provider-dcs.amd64.<version>.tgz"
export KUBEADM_PROVIDER_PACK="/root/cluster-api-provider-kubeadm.amd64.<version>.tgz"
export DCS_PROVIDER_VERSION="<dcs-provider-chart-version>"
export KUBEADM_PROVIDER_VERSION="<kubeadm-provider-chart-version>"

Upload the packages.

/root/cpaas-install/installer/res/amd64/packtool pack push \
  -r "${LOCAL_REGISTRY_ADDRESS}" -c "${DCS_PROVIDER_PACK}"

/root/cpaas-install/installer/res/amd64/packtool pack push \
  -r "${LOCAL_REGISTRY_ADDRESS}" -c "${KUBEADM_PROVIDER_PACK}"

Create and apply the AppRelease resources for the Kubeadm provider and the DCS provider.

mkdir -p /root/yamls
export DCS_PROVIDER_APPRELEASES="/root/yamls/dcs-provider-appreleases.yaml"

cat > "${DCS_PROVIDER_APPRELEASES}" <<EOF
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-kubeadm
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-kubeadm
        releaseName: cluster-api-provider-kubeadm
        targetRevision: ${KUBEADM_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-dcs
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-dcs
        releaseName: cluster-api-provider-dcs
        targetRevision: ${DCS_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
EOF

kubectl apply -f "${DCS_PROVIDER_APPRELEASES}"

until kubectl get crd kubeadmcontrolplanes.controlplane.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q kubeadmcontrolplanes.controlplane.cluster.x-k8s.io; do
  sleep 10
done

until kubectl get crd dcsclusters.infrastructure.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q dcsclusters.infrastructure.cluster.x-k8s.io; do
  sleep 10
done
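
The two until loops above wait indefinitely if a provider chart never installs. The following is a minimal sketch of the same wait with an upper bound. The function names retry_until and crd_exists are hypothetical, and the timeout and interval values are yours to choose.

```shell
# retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds; returns 1 once TIMEOUT_SECONDS has elapsed.
retry_until() {
  timeout="$1"; interval="$2"; shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# crd_exists NAME: succeeds once the CRD is registered in the current cluster.
crd_exists() {
  kubectl get crd "$1" >/dev/null 2>&1
}
```

For example, retry_until 600 10 crd_exists dcsclusters.infrastructure.cluster.x-k8s.io gives up after roughly ten minutes. When it times out, inspect the AppRelease resources in cpaas-system instead of waiting longer.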

Step 4 — Configure the Provider-Specific global Manifest

Create one provider-specific manifest for the global cluster. The manifest uses the same provider resources as a workload cluster, but it must also include the global-specific labels, annotations, registry values, installer-compatible kubeadm settings, and persistent data paths required by the platform control plane.

Use the provider creation guides as the detailed resource reference.

Apply the naming convention from Common Prerequisites to every resource in the manifest you author below.

Set KubeadmControlPlane.spec.kubeadmConfigSpec.format to the value that the target provider accepts. The provider controllers enforce this:

| Provider | Bootstrap userdata format |
| --- | --- |
| Huawei DCS | ignition (provider-enforced; the DCS provider rejects any other format with invalid format, expected ignition, got <other>). |
| VMware vSphere | cloud-init (provider default; setting ignition is not supported). |
| Huawei Cloud Stack | cloud-init (provider-enforced; the HCS provider rejects ignition with ignition format is not supported). |

Huawei DCS

Set the output path for the DCS global manifest before you render it.

export GLOBAL_DCS_YAML="/root/yamls/new-global.yaml"

The DCS global manifest must contain the following resources in the cpaas-system namespace:

| Resource | Purpose |
| --- | --- |
| Secret with type: CloudCredential | Stores authUser, authKey, endpoint, and site for DCS API access. |
| DCSIpHostnamePool for control plane nodes | Assigns static IPs, hostnames, network settings, and any pool-managed persistent disks. |
| DCSMachineTemplate for control plane nodes | Defines the DCS VM template, folder, CPU, memory, and template-local disks. |
| KubeadmControlPlane | Bootstraps the Kubernetes control plane. Set spec.version to ${K8S_VERSION}. |
| DCSCluster | Defines the DCS infrastructure cluster and control plane endpoint. |
| Cluster | Connects the Cluster API Cluster to DCSCluster and KubeadmControlPlane. |
| DCSIpHostnamePool, DCSMachineTemplate, KubeadmConfigTemplate, and MachineDeployment for workers | Creates worker nodes. |

Use the DCS resource fields from Creating Clusters on Huawei DCS and Infrastructure Resources for Huawei DCS. For the global cluster, keep these additional requirements:

  • Set Cluster.metadata.name and DCSCluster.metadata.name to global (the infra cluster shares the CAPI Cluster name). Prefix every other CAPI resource and provider resource with global-; the wiring fragment below uses KubeadmControlPlane.metadata.name: global-kcp.
  • Add Cluster.metadata.labels.is-global: "true" and Cluster.metadata.labels.cluster-type: DCS.
  • Add Cluster.metadata.annotations["cpaas.io/registry-address"] with ${NODE_REGISTRY_ADDRESS}.
  • Set KubeadmControlPlane.spec.kubeadmConfigSpec.format: ignition for MicroOS.
  • Keep the release manifest's non-encryption kubeadm files, kubelet patches, audit policy, and installer RBAC entries.
  • For a normal non-DR deployment, do not set DCSCluster.spec.encryptionProviderConfigRef and do not add /etc/kubernetes/encryption-provider.conf to KubeadmControlPlane.spec.kubeadmConfigSpec.files.
  • Keep /var/cpaas as platform state. If you need the disk to survive rolling replacement, declare it in DCSIpHostnamePool.spec.pool[].persistentDisk; do not rely on DCSMachineTemplate template disks as preserved state.
  • Use concrete datastoreName values for DCS local storage unless you have verified that the selected datastore cluster can place volumes on hosts that can run the target VM.

Fragment Scope

The following YAML is a differential fragment that shows the global-specific Cluster API wiring; it is not a complete manifest that you can apply directly. Fill the provider resource fields by using the DCS create-cluster references above, merge these global-specific changes into that manifest, then apply the complete manifest file.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: global
  namespace: cpaas-system
  labels:
    cluster-type: DCS
    is-global: "true"
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: DCSCluster
    cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}"
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ${CLUSTER_CIDR}
    services:
      cidrBlocks:
        - ${SERVICE_CIDR}
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: global-kcp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DCSCluster
    name: global
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: global-kcp
  namespace: cpaas-system
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  replicas: 3
  version: ${K8S_VERSION}
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: global-master-template
  kubeadmConfigSpec:
    format: ignition
    clusterConfiguration:
      etcd:
        local:
          serverCertSANs:
            - "${CONTROL_PLANE_VIP}"
            - "${PLATFORM_HOST}"

Step 5 — Apply the global Manifest

Apply the provider-specific manifest to minialauda.

Huawei DCS
kubectl apply -f "${GLOBAL_DCS_YAML}"

Step 6 — Wait for the Control Plane

Wait for the Cluster API provider to provision the virtual machines and bring up the Kubernetes control plane.

kubectl get clusters.cluster.x-k8s.io -n cpaas-system
kubectl get kubeadmcontrolplane -n cpaas-system
kubectl get machines -n cpaas-system

The control plane is ready when the KubeadmControlPlane reports Ready: True and the Cluster reports Phase: Provisioned.
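
Instead of polling the get commands by hand, you can block until both readiness signals appear. The following is a minimal sketch. The function name wait_control_plane and the 30m default timeout are assumptions, and the --for=jsonpath form needs a kubectl release that supports JSONPath wait conditions.

```shell
# wait_control_plane KCP_NAME [TIMEOUT]
# Blocks until the KubeadmControlPlane has condition Ready=True and the
# Cluster named "global" reports status.phase=Provisioned.
wait_control_plane() {
  kcp="$1"; timeout="${2:-30m}"
  kubectl wait -n cpaas-system --for=condition=Ready \
    "kubeadmcontrolplane/${kcp}" --timeout="$timeout" &&
  kubectl wait -n cpaas-system --for=jsonpath='{.status.phase}'=Provisioned \
    clusters.cluster.x-k8s.io/global --timeout="$timeout"
}
```

With the names used in this document, the call is wait_control_plane global-kcp. If the wait times out, describe the machines in cpaas-system to find the provider-specific failure reason.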

Step 7 — Import Provider Resources

Before triggering the installer, create the dcs-import-extra-resources ConfigMap in the cpaas-system namespace for providers that require extra resource import. The ConfigMap name keeps the dcs prefix for historical installer compatibility, even when the provider is not Huawei DCS.

VMware vSphere and Huawei Cloud Stack require this ConfigMap for both normal and disaster recovery global installations. Huawei DCS does not require it for the default installation because DCS provider resources are migrated by the built-in flow; create it for DCS only when you need to import additional resources beyond the built-in provider resource migration.

Huawei DCS

Do not create the DCS dcs-import-extra-resources ConfigMap for the default DCS installation path. DCS provider resources are migrated by the built-in flow.

Step 8 — Trigger the Platform Installation

Submit the platform installation request to the embedded installer REST API. The installer imports the Cluster API resources into the new global cluster, deploys the base operator, and installs the selected plugins.

export INSTALLER_IP=$(kubectl get pods -n cpaas-system -l service_name=cpaas-installer \
  -o jsonpath='{.items[0].status.podIP}')

Network Scope

INSTALLER_IP is the Pod IP of the embedded installer in minialauda. The endpoint is used only during installation.

Create the provider-specific installer configuration JSON file on the current KIND host, then submit it to the installer endpoint. DCS, VMware vSphere, and HCS use the same endpoint path, but their request bodies are different.

| Field | Huawei DCS | VMware vSphere | Huawei Cloud Stack |
| --- | --- | --- | --- |
| Endpoint path | /cpaas-installer/api/config/dcs | /cpaas-installer/api/config/dcs | /cpaas-installer/api/config/dcs |
| console.host | Local global HA VIP list | Empty list, [] | Empty list, [] |
| console.globalHost | Platform access address | Platform access address | Platform access address |
| cluster.clusterCIDR and cluster.serviceCIDR | Required | Not set; cluster CIDRs are declared in the VMware vSphere Cluster manifest | Not set |
| cluster.features.ha | Required, points to the local HA VIP with isThirdParty: true | Not set; the control plane endpoint is declared in VSphereCluster.spec.controlPlaneEndpoint.host | Not set; HCS ELB is declared in HCSCluster |
| hostIP | Current KIND host IP | Current KIND host IP | Current KIND host IP |

Huawei DCS

The DCS installer request includes the external HA VIP because DCS uses a third-party control plane VIP.

mkdir -p /root/yamls
export INSTALLER_CONFIG_JSON="/root/yamls/installer-config-dcs.json"

cat > "${INSTALLER_CONFIG_JSON}" <<EOF
{
  "basic": {
    "username": "admin@cpaas.io",
    "password": "<base64-platform-admin-password>"
  },
  "registry": {
    "domain": "${REGISTRY_DOMAIN}",
    "username": "<registry-username>",
    "password": "<base64-registry-password>"
  },
  "console": {
    "host": [
      "${CONTROL_PLANE_VIP}"
    ],
    "globalHost": "${PLATFORM_HOST}",
    "httpPort": 80,
    "httpsPort": 443,
    "cert": {
      "selfSigned": {}
    }
  },
  "cluster": {
    "clusterCIDR": "${CLUSTER_CIDR}",
    "serviceCIDR": "${SERVICE_CIDR}",
    "features": {
      "ha": {
        "vip": "${CONTROL_PLANE_VIP}",
        "vport": 6443,
        "isThirdParty": true
      }
    }
  },
  "product": [
    "base",
    "acp"
  ],
  "deployMode": "normal",
  "hostIP": "${HOST_IP}"
}
EOF

curl -k -X POST "http://${INSTALLER_IP}:8080/cpaas-installer/api/config/dcs" \
  -H 'Content-Type: application/json' \
  -d @"${INSTALLER_CONFIG_JSON}"

Set console.host and cluster.features.ha.vip to the local global HA VIP. Do not use the platform domain in console.host; use console.globalHost for the platform access address.
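
Because the request body is rendered from a heredoc, a forgotten placeholder or an unset variable ends up in the JSON without any error. The following is a minimal sketch of a check to run on the installer JSON before the curl call. The function name check_rendered is hypothetical. It is meant for the installer JSON only, where no field in the example above is intentionally an empty string.

```shell
# check_rendered FILE: returns non-zero and prints the offending lines when
# FILE still contains a <placeholder> token or an empty "" string value.
check_rendered() {
  file="$1"; rc=0
  if grep -nE '<[A-Za-z0-9-]+>' "$file" >&2; then
    echo "unreplaced placeholder in $file" >&2
    rc=1
  fi
  if grep -n '""' "$file" >&2; then
    echo "empty string value in $file (unset variable?)" >&2
    rc=1
  fi
  return "$rc"
}
```

Run check_rendered "${INSTALLER_CONFIG_JSON}" and submit the request only when it returns zero. Also confirm that INSTALLER_IP is not empty, because the jsonpath query returns nothing when the installer Pod is not yet running.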

Third-Party Console Certificates

The examples use a self-signed console certificate. If the environment requires a third-party certificate, replace console.cert with a thirdParty block that contains the base64 full certificate chain, private key, and optional PKCS#12 values before you submit the installer request.
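
The password placeholders and the third-party certificate values are all base64 strings. The following is a minimal sketch of one way to produce them, assuming the installer expects standard base64 on a single line (an assumption; confirm against your release). The function name b64_oneline is hypothetical.

```shell
# b64_oneline: base64-encodes stdin and strips line wraps so the result
# can be pasted into a JSON string value.
b64_oneline() {
  base64 | tr -d '\n'
}
```

Encode a password with printf '%s' "<password>" | b64_oneline so that no trailing newline is included, and encode a certificate file with b64_oneline < <file>.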

Step 9 — Monitor the Installation

After the installer accepts the request, the install runs through several phases that are observable from the KIND host. A typical immutable-OS global cluster takes 30–60 minutes; total time depends on IaaS provisioning speed, image pull time, and the number of plugins selected.

Phases You Will Observe

| Phase | What is happening | First place to watch |
| --- | --- | --- |
| Bootstrap | The bootstrap KIND, embedded registry, and Cluster API providers are running on the KIND host. Completed in Step 2 and Step 3. | KIND host terminal; kubectl get pods -n cpaas-system |
| Infrastructure provisioning | The Cluster API provider creates VMs from the MicroOS template on the target IaaS platform. | kubectl get machines -n cpaas-system |
| Control plane bootstrap | KubeadmControlPlane bootstraps the first control plane node, etcd starts, and additional control plane nodes join. | kubectl get kubeadmcontrolplane -n cpaas-system |
| Network and core add-ons | The CAPI provider reconciles Kube-OVN, CoreDNS, and kube-proxy on the new cluster. | kubectl --kubeconfig <global-kubeconfig> get pods -n kube-system |
| Platform installation | The installer imports Cluster API resources into the new global cluster, deploys the base operator, and installs the selected plugins. | Installer progress API; installer log |
| Completion | The installer marks the request as Success and writes the final cluster state into ClusterModule/global. | Installer progress API; kubectl --kubeconfig <global-kubeconfig> get clustermodule global |

Signals During Installation

Watch the installer progress API and the installer log together. If one appears stalled, check the underlying Cluster API resources directly on the bootstrap KIND host.

# Installer progress and live log
curl "http://${INSTALLER_IP}:8080/cpaas-installer/api/progress"
tail -f /var/cpaas/data/installer.log

# Cluster API resources on the bootstrap KIND host
kubectl get clusters.cluster.x-k8s.io -A
kubectl get kubeadmcontrolplane -A
kubectl get machines -A

The installer log records every phase transition. Transient errors retry on a short interval; persistent errors stay visible in the log and surface in the progress API as a stalled stage.
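
The progress check can be scripted so that you are notified when the installation finishes. The following is a minimal sketch. It assumes the progress API returns JSON with "status" and "type" string fields holding Success and Complete on completion; the exact response layout is an assumption, so adjust the patterns to what your installer returns. The function names install_succeeded and wait_for_install are hypothetical.

```shell
# install_succeeded: reads a progress API response on stdin and succeeds
# when it contains status "Success" and type "Complete".
install_succeeded() {
  body="$(cat)"
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"Success"' &&
  printf '%s' "$body" | grep -Eq '"type"[[:space:]]*:[[:space:]]*"Complete"'
}

# wait_for_install [INTERVAL_SECONDS] [ATTEMPTS]
# Polls the progress endpoint; returns 1 if success is not seen in time.
wait_for_install() {
  interval="${1:-60}"; attempts="${2:-90}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -s "http://${INSTALLER_IP}:8080/cpaas-installer/api/progress" | install_succeeded; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

The defaults poll once a minute for 90 minutes, which is an arbitrary margin above the typical duration. A non-zero return only means success was not observed, so read the installer log to tell a slow install from a stalled one.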

Check the global cluster after the installer reports success.

kubectl --kubeconfig <global-kubeconfig> get nodes
kubectl --kubeconfig <global-kubeconfig> get pods -n cpaas-system
kubectl --kubeconfig <global-kubeconfig> get clustermodule global

Common Stalls and Where to Look

| Symptom | First place to look | What you are looking for |
| --- | --- | --- |
| Machines stay in Pending or do not appear | kubectl describe machine -n cpaas-system <machine> | The provider-specific failure reason on the machine Bootstrap and Infrastructure conditions. IaaS quota, network, and credential issues surface here. |
| KubeadmControlPlane does not reach Ready | kubectl get nodes with the new cluster kubeconfig and kubectl describe kubeadmcontrolplane -n cpaas-system | etcd health on the first control plane node and join progress for the remaining nodes. |
| Pods in kube-system stay Pending or fail to pull images | kubectl --kubeconfig <global-kubeconfig> describe pod -n kube-system <pod> | Image pull errors usually mean the node-facing registry address is not reachable from the new cluster's subnet. |
| Installer progress API shows a stalled stage | /var/cpaas/data/installer.log | The most recent phase line and the most recent error message. Retried errors repeat on a short interval; persistent errors do not advance. |
| ClusterModule/global does not reach a healthy phase | kubectl --kubeconfig <global-kubeconfig> describe clustermodule global | The Status.conditions describe which module is blocking the cluster from completing. |

Issues that are not listed here usually point to environment-specific causes. Capture the installer log, the progress API response, and the relevant kubectl describe output, then escalate.

Optional Disaster Recovery Deployment

Use this section when you deploy primary and standby global clusters for disaster recovery. Complete these additions before you apply the provider-specific manifest for each global cluster.

Primary and standby clusters must use the same encryption provider configuration. For DCS and HCS, normal non-DR deployments do not add /etc/kubernetes/encryption-provider.conf to KubeadmControlPlane.spec.kubeadmConfigSpec.files; for DCS, normal non-DR deployments also do not set DCSCluster.spec.encryptionProviderConfigRef. VMware vSphere keeps the release manifest's /etc/kubernetes/encryption-provider.conf file entry.

Prepare Shared DR Variables

Set the same encryption key value on both the primary and standby installation environments.

export ENCRYPTION_PROVIDER_CONF="/root/yamls/encryption-provider.conf"
export ENCRYPTION_PROVIDER_SECRET_B64="<base64-shared-etcd-encryption-key>"
export PRIMARY_CLUSTER_VIP="<primary-ha-vip>"
export STANDBY_CLUSTER_VIP="<standby-ha-vip>"
export ETCD_SYNC_VERSION="<global-etcd-sync-version>"
export ETCD_SYNC_MODULEINFO="/root/yamls/global-etcd-sync-moduleinfo.json"
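
If you do not already have a shared key for ENCRYPTION_PROVIDER_SECRET_B64, the following is a minimal sketch of how one could be generated. It assumes a 32-byte random key, which is the size used in the Kubernetes encryption-at-rest documentation for the aescbc provider. The function name gen_encryption_key is hypothetical.

```shell
# gen_encryption_key: prints a base64-encoded 32-byte random key on one line.
gen_encryption_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}
```

Generate the key once, for example with export ENCRYPTION_PROVIDER_SECRET_B64="$(gen_encryption_key)", then copy the same value to the other installation environment over a secure channel. Generating it separately on each side produces two different keys and breaks the requirement that both clusters share one encryption configuration.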

Create the encryption provider configuration file on both installation environments.

mkdir -p "$(dirname "${ENCRYPTION_PROVIDER_CONF}")"
cat > "${ENCRYPTION_PROVIDER_CONF}" <<EOF_CONF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: ${ENCRYPTION_PROVIDER_SECRET_B64}
EOF_CONF

Add DR Certificate SANs to KubeadmControlPlane

In the manifest generated in Step 4, include both the primary and standby control plane VIPs and the platform access address in KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.serverCertSANs. Use the same SAN list on both the primary and standby installation environments.

serverCertSANs:
  - "${PRIMARY_CLUSTER_VIP}"
  - "${STANDBY_CLUSTER_VIP}"
  - "${PLATFORM_HOST}"

Add Provider-Specific DR Fields

Huawei DCS

Create the encryption provider Secret in minialauda.

kubectl create secret generic encryption-provider-config \
  --from-file=encryption-provider.conf="${ENCRYPTION_PROVIDER_CONF}" \
  -n cpaas-system \
  --dry-run=client -o yaml | kubectl apply -f -

Add the Secret reference to DCSCluster.spec.

encryptionProviderConfigRef:
  name: encryption-provider-config

DCS uses DCSCluster.spec.encryptionProviderConfigRef to deliver the disaster recovery encryption provider configuration. Do not add /etc/kubernetes/encryption-provider.conf to KubeadmControlPlane.spec.kubeadmConfigSpec.files for the DCS DR path.

If you created dcs-import-extra-resources, keep the ConfigMap on both the primary and standby installation environments.

Install Primary and Standby Clusters

Run Steps 1 through 9 for both the primary and standby global clusters.

Apply the following provider-specific configuration on each side:

| Provider | Primary installation | Standby installation |
| --- | --- | --- |
| Huawei DCS | Set console.host and cluster.features.ha.vip to the primary HA VIP. | Set console.host and cluster.features.ha.vip to the standby HA VIP. |
| VMware vSphere | Set VSphereCluster.spec.controlPlaneEndpoint.host to the primary HA VIP used by the primary manifest. Create the VMware vSphere dcs-import-extra-resources ConfigMap from Step 7 and keep global-vsphere-credentials aligned with VSphereCluster.spec.identityRef.name. | Set VSphereCluster.spec.controlPlaneEndpoint.host to the standby HA VIP used by the standby manifest. Create the VMware vSphere dcs-import-extra-resources ConfigMap from Step 7 and keep global-vsphere-credentials aligned with VSphereCluster.spec.identityRef.name. |
| Huawei Cloud Stack | Keep console.host: []; the primary VIP is managed by the HCS ELB. Create the HCS dcs-import-extra-resources ConfigMap from Step 7 and keep HCS_SECRET_NAME aligned with HCSCluster.spec.identityRef.name. | Keep console.host: []; the standby VIP is managed by the HCS ELB. Create the HCS dcs-import-extra-resources ConfigMap from Step 7 and keep HCS_SECRET_NAME aligned with HCSCluster.spec.identityRef.name. |

For the primary cluster, make sure the platform domain resolves to the primary HA VIP. In Step 8, set hostIP to the primary KIND node IP. For DCS, set console.host and cluster.features.ha.vip to the primary HA VIP. For VMware vSphere, set the control plane endpoint in the primary manifest to the primary HA VIP. For HCS, keep console.host: [] because the VIP is owned by the HCS ELB.

After the primary cluster installation succeeds, switch the platform domain to the standby HA VIP as required by the DR procedure. Then install the standby cluster. In Step 8 on the standby KIND host, set hostIP to the standby KIND node IP. For DCS, set console.host and cluster.features.ha.vip to the standby HA VIP. For VMware vSphere, set the control plane endpoint in the standby manifest to the standby HA VIP. For HCS, keep console.host: []. Get INSTALLER_IP from the cpaas-installer Pod on the standby KIND host; do not reuse the primary KIND host value.

After both clusters are installed, get the primary k8sadmin token on a primary control plane node. etcd-sync is installed only on the standby cluster, and its active_cluster_* values point to the primary cluster. Do not decode the token: active_cluster_token takes it in the base64 form in which the Secret stores it.

export PRIMARY_CLUSTER_TOKEN_B64="$(sudo kubectl get secret -n cpaas-system k8sadmin -o jsonpath='{.data.token}')"

Get the standby k8sadmin token on a standby control plane node. Use this decoded bearer token to call the standby cluster ModuleInfo API.

export STANDBY_CLUSTER_BEARER_TOKEN="$(sudo kubectl get secret -n cpaas-system k8sadmin -o jsonpath='{.data.token}' | base64 -d)"

If you create the global-etcd-sync ModuleInfo payload from a different host, securely transfer the decoded value from the standby control plane node and export it there.

export STANDBY_CLUSTER_BEARER_TOKEN="<decoded-standby-token>"

Create the global-etcd-sync ModuleInfo payload for the standby cluster. The active_cluster_vip and active_cluster_token values must point to the primary cluster.

cat > "${ETCD_SYNC_MODULEINFO}" <<EOF
{
  "kind": "ModuleInfo",
  "apiVersion": "cluster.alauda.io/v1alpha1",
  "metadata": {
    "name": "global-etcd-sync",
    "labels": {
      "cpaas.io/cluster-name": "global",
      "cpaas.io/module-name": "etcd-sync",
      "cpaas.io/module-type": "plugin"
    }
  },
  "spec": {
    "version": "${ETCD_SYNC_VERSION}",
    "config": {
      "monitor_check_interval": 1,
      "detail": false,
      "active_cluster_vip": "${PRIMARY_CLUSTER_VIP}",
      "active_cluster_token": "${PRIMARY_CLUSTER_TOKEN_B64}"
    }
  }
}
EOF

Install global-etcd-sync by calling the ModuleInfo API on the standby cluster.

curl -sk -X POST "https://${STANDBY_CLUSTER_VIP}/apis/cluster.alauda.io/v1alpha1/moduleinfoes" \
  -H "Authorization: Bearer ${STANDBY_CLUSTER_BEARER_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @"${ETCD_SYNC_MODULEINFO}"

Restart the Pods that must reload DR and endpoint configuration. Run the same commands on a primary control plane node and on a standby control plane node.

sudo kubectl delete po -n cpaas-system -l 'service_name in (alertmanager,vmselect,vminsert)'
sudo kubectl delete po -n cpaas-system -l service_name=cpaas-elasticsearch
sudo kubectl delete po -n cpaas-system -l service_name=cluster-transformer

For the DR lifecycle after installation, see Global Cluster Disaster Recovery.

Verification

After the installer reports completion, verify that the global cluster is healthy.

kubectl --kubeconfig <global-kubeconfig> get nodes
kubectl --kubeconfig <global-kubeconfig> get clusters.platform.tkestack.io global \
  -o jsonpath='{.status.phase}'
kubectl --kubeconfig <global-kubeconfig> get pods -n cpaas-system
kubectl --kubeconfig <global-kubeconfig> get clustermodule global

The installation is successful when all of the following conditions are true:

  • The installer progress API reports status: Success and type: Complete.
  • All global cluster nodes are Ready.
  • Critical Pods in cpaas-system are Running or Completed.
  • ClusterModule/global reports the base module as healthy.
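
The node readiness condition above can be checked without reading the table by eye. The following is a minimal sketch that filters the output of kubectl get nodes --no-headers. The function names not_ready_nodes and all_nodes_ready are hypothetical. The check is deliberately strict: a status such as Ready,SchedulingDisabled is reported as not ready.

```shell
# not_ready_nodes: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name of every node whose STATUS is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# all_nodes_ready: succeeds when no node on stdin is reported as not ready.
all_nodes_ready() {
  bad="$(not_ready_nodes)"
  [ -z "$bad" ] || { echo "not ready: $bad" >&2; return 1; }
}
```

Pipe the node list into it, for example kubectl --kubeconfig <global-kubeconfig> get nodes --no-headers | all_nodes_ready. An empty node list also passes, so confirm the expected node count separately.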

Next Steps