Architecture
Functional Perspective
The platform's complete functionality consists of Core and extensions built on two technical stacks: Operator and Cluster Plugin.
Core
The minimal deliverable unit of the platform, providing core capabilities such as cluster management, container orchestration, projects, and user administration.
- Meets the highest security standards
- Delivers maximum stability
- Offers the longest support lifecycle
Extensions
Extensions in both the Operator and Cluster Plugin stacks fall into two categories:
- Aligned: a lifecycle strategy consisting of multiple maintenance streams, aligned with the platform.
- Agnostic: a lifecycle strategy consisting of multiple maintenance streams, released independently of the platform.
For more details about extensions, see Extend.
Deployment Perspective
The platform is composed of a global cluster and one or more workload clusters.
global Cluster
- The central hub for multi-cluster management
- All clusters must be registered to global before they can be managed
- Hosts multi-cluster and cross-cluster functionality
- Kubernetes is deployed and managed by the platform
Workload Cluster
- Hosts user workloads and services
- Kubernetes may be deployed by the platform or provided by third parties
- Supports Kubernetes services from major cloud providers as well as CNCF-compliant Kubernetes clusters
- In certain scenarios, the global cluster may also host business workloads
Technical Perspective
Platform Component Runtime
All platform components run as containers within a Kubernetes management cluster (the global cluster).
High Availability Architecture
- The global cluster typically consists of at least three control plane nodes and multiple worker nodes
- High availability of etcd is central to cluster HA; see Key Component High Availability Mechanisms for details
- Load balancing can be provided by an external load balancer or a self-built VIP inside the cluster
Request Routing
- Client requests first pass through the load balancer or self-built VIP
- Requests are forwarded to ALB (the platform's default Kubernetes Ingress Gateway) running on designated ingress nodes (or control-plane nodes if configured)
- ALB routes traffic to the target component pods according to configured rules
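The last hop, rule-based routing, can be illustrated with a small sketch. This does not reproduce ALB's actual rule model: the `Rule` shape, the longest-prefix tie-break, and the host and backend names are all hypothetical, chosen only to show how a gateway might pick a target component for a request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: match on host and URL path prefix.
    host: str
    path_prefix: str
    backend: str  # name of the target component


def route(rules: list[Rule], host: str, path: str) -> Optional[str]:
    """Return the backend of the most specific matching rule, or None."""
    matches = [
        r for r in rules
        if r.host == host and path.startswith(r.path_prefix)
    ]
    if not matches:
        return None
    # Assumption: the rule with the longest path prefix wins.
    return max(matches, key=lambda r: len(r.path_prefix)).backend


rules = [
    Rule("platform.example.com", "/", "console"),
    Rule("platform.example.com", "/api", "api-gateway"),
]

print(route(rules, "platform.example.com", "/api/v1/clusters"))
print(route(rules, "platform.example.com", "/dashboard"))
print(route(rules, "other.example.com", "/"))
```

A real gateway would then balance the request across the healthy pods of the selected backend; that step is omitted here.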
Replica Strategy
- Core components run with at least two replicas
- Key components (such as registry, MinIO, ALB) run with three replicas
Fault Tolerance & Self-healing
- Achieved through cooperation between kubelet, kube-controller-manager, kube-scheduler, kube-proxy, ALB, and other components
- Includes health checks, failover, and traffic redirection
Data Storage & Recovery
- Control-plane configuration and platform state are stored in etcd as Kubernetes resources
- In catastrophic failures, recovery can be performed from etcd snapshots
Primary / Standby Disaster Recovery
- Two separate global clusters: a Primary Cluster and a Standby Cluster
- The disaster recovery mechanism is based on real-time synchronization of etcd data from the Primary Cluster to the Standby Cluster.
- If the Primary Cluster becomes unavailable due to a failure, services can quickly switch to the Standby Cluster.
Key Component High Availability Mechanisms
etcd
- Deployed on three (or five) control plane nodes
- Uses the Raft protocol for leader election and data replication
- Three-node deployments tolerate up to one node failure and five-node deployments tolerate up to two, because a majority of members must remain available
- Supports local and remote S3 snapshot backups
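The node-failure figures above follow from Raft's majority rule: a cluster stays writable only while more than half of its members can reach each other. A minimal sketch of that arithmetic (the function names are illustrative, not part of etcd):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated={tolerated_failures(n)}")
```

A four-member cluster tolerates no more failures than a three-member one, which is why odd member counts such as three or five are the usual choice.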
Monitoring Components
- Prometheus: Multiple instances, deduplication with Thanos Query, and cross-region redundancy
- VictoriaMetrics: Cluster mode with distributed VMStorage, VMInsert, and VMSelect components
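To make the deduplication idea concrete: when several Prometheus instances scrape the same targets, their series differ only in a label identifying the replica. The sketch below is a simplified illustration, not Thanos Query's actual algorithm; it assumes the label is named `replica` and that the first replica to report a timestamp wins.

```python
from collections import defaultdict

# A series is (labels, samples), where samples are (timestamp, value) pairs.
Series = tuple[dict[str, str], list[tuple[int, float]]]


def deduplicate(series: list[Series], replica_label: str = "replica"):
    """Merge series that are identical apart from the replica label."""
    merged: dict[frozenset, dict[int, float]] = defaultdict(dict)
    for labels, samples in series:
        # Identity of a series once the replica label is ignored.
        key = frozenset((k, v) for k, v in labels.items() if k != replica_label)
        for ts, value in samples:
            # Keep the first value seen for each timestamp.
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


replica_a = ({"__name__": "up", "job": "node", "replica": "a"},
             [(10, 1.0), (20, 1.0)])
replica_b = ({"__name__": "up", "job": "node", "replica": "b"},
             [(20, 1.0), (30, 1.0)])

result = deduplicate([replica_a, replica_b])
for key, points in result.items():
    print(dict(key), points)
```

Each replica is missing one sample here, yet the merged series is complete, which is the point of running multiple instances behind a deduplicating query layer.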
Logging Components
- Nevermore collects logs and audit data
- Kafka / Elasticsearch / Razor / Lanaya are deployed in distributed and multi-replica modes
Networking Components (CNI)
- Kube-OVN / Calico / Flannel: Achieve HA via stateless DaemonSets or triple-replica control plane components
ALB
- Operator deployed with three replicas, leader election enabled
- Instance-level health checks and load balancing
Self-built VIP
- High-availability virtual IP based on Keepalived
- Supports heartbeat detection and active-standby failover
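Active-standby failover of this kind can be sketched as a priority election among the nodes that still pass their health checks. The sketch assumes VRRP-style behavior with preemption enabled, so a recovered higher-priority node takes the VIP back; the `Node` type, node names, and priorities are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    priority: int         # higher value is preferred as VIP holder
    healthy: bool = True  # result of the latest heartbeat or health check


def vip_holder(nodes: list[Node]) -> Optional[str]:
    """Return the name of the healthy node with the highest priority."""
    alive = [n for n in nodes if n.healthy]
    if not alive:
        return None  # no node can serve the VIP
    return max(alive, key=lambda n: n.priority).name


nodes = [Node("node-a", 150), Node("node-b", 100), Node("node-c", 50)]
print(vip_holder(nodes))   # node-a holds the VIP

nodes[0].healthy = False   # node-a stops answering heartbeats
print(vip_holder(nodes))   # the VIP fails over to node-b

nodes[0].healthy = True    # node-a recovers and preempts
print(vip_holder(nodes))
```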
Harbor
- ALB-based load balancing
- PostgreSQL with Patroni HA
- Redis Sentinel mode
- Stateless services deployed in multiple replicas
Registry and MinIO
- Registry deployed with three replicas
- MinIO in distributed mode with erasure coding, data redundancy, and automatic recovery