Architecture
Introduction
The platform is an enterprise-grade, Kubernetes-based platform that enables organizations to build, deploy, and manage applications consistently across hybrid and multi-cloud environments. It integrates core Kubernetes capabilities with enhanced management, observability, and security services, offering a unified control plane and flexible workload clusters.
The architecture follows a hub-and-spoke model, consisting of a global cluster and multiple workload clusters. This design provides centralized governance while allowing independent workload execution and scalability.

Core Architectural Components
Global Cluster
The global cluster serves as the centralized management and control hub of the platform. It provides platform-wide services such as authentication, policy management, cluster lifecycle operations, and observability, and acts as the central hub for multi-cluster management and cross-cluster functionality.
Key components include:
- Gateway
Acts as the main entry point to the platform. It manages API requests from the UI, CLI (kubectl), and automation tools, routing them to appropriate backend services.
- Authentication and Authorization (Auth)
Integrates with external Identity Providers (IdPs) to provide Single Sign-On (SSO) and RBAC-based access control.
- Web Console
Provides a web-based interface for the platform. It communicates with platform APIs through the gateway.
- Cluster Management
Handles the registration, provisioning, and lifecycle management of workload clusters.
- Operator Lifecycle Manager (OLM) and Cluster Plugins
Manages the installation, updates, and lifecycle of operators and cluster extensions (a sample OLM Subscription is sketched after this list).
- Internal Image Registry
Offers an out-of-the-box integrated container image repository with role-based access.
- Observability
Provides centralized logging, metrics, and tracing for both the global and workload clusters.
- Cluster Proxy
Enables secure communication between the global cluster and workload clusters.
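To make the OLM flow concrete, the sketch below shows how an operator is typically requested through a standard OLM Subscription resource. The operator name, channel, catalog source, and namespaces are hypothetical placeholders; the platform's actual catalog sources and install namespaces may differ.

```yaml
# Minimal OLM Subscription sketch (all names are placeholders).
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator
  namespace: operators
spec:
  channel: stable                  # maintenance stream to follow
  name: example-operator           # package name in the catalog
  source: example-catalog          # hypothetical CatalogSource
  sourceNamespace: olm
  installPlanApproval: Automatic   # OLM applies channel updates automatically
```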
Workload Cluster
Workload clusters are Kubernetes-based environments managed by the global cluster. Each workload cluster runs isolated application workloads and inherits governance and configuration from the central control plane.
External Integrations
- Identity Provider (IdP)
Supports federated authentication via standard protocols (OIDC, SAML) for unified user management (a generic OIDC configuration is sketched after this list).
- API and CLI Access
Users can interact with the platform through RESTful APIs, the web console, or command-line tools such as kubectl and ac.
- Load Balancer (VIP/DNS/SLB)
Provides high availability and traffic distribution to the Gateway and ingress endpoints of the global and workload clusters.
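As an illustration of the OIDC integration mentioned above, the sketch below shows how a plain kubeadm-managed API server can be pointed at an external IdP. The issuer URL, client ID, and claim names are assumptions for illustration; in practice the platform's Auth component handles this wiring.

```yaml
# Generic kubeadm ClusterConfiguration excerpt enabling OIDC (values are placeholders).
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    oidc-issuer-url: "https://idp.example.com/realms/platform"  # hypothetical IdP issuer
    oidc-client-id: "kubernetes"
    oidc-username-claim: "email"
    oidc-groups-claim: "groups"   # group claims feed into RBAC bindings
```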
Scalability and High Availability
The platform is designed for horizontal scalability and high availability:
- Each component can be deployed redundantly to eliminate single points of failure (see the PodDisruptionBudget sketch after this list).
- The global cluster supports managing dozens to hundreds of workload clusters.
- Workload clusters can scale independently according to workload demand.
- The use of VIP/DNS/Ingress ensures seamless routing and failover.
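As one example of how redundant deployments are protected, a PodDisruptionBudget can keep a minimum number of replicas of a component available during voluntary disruptions such as node drains. This is a generic sketch; the component name, namespace, and labels are hypothetical.

```yaml
# Keep at least two gateway replicas running during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gateway-pdb
  namespace: platform-system   # hypothetical namespace
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: gateway             # hypothetical component label
```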
Functional Perspective
The platform's complete functionality consists of the Core plus extensions built on two technical stacks: Operators and Cluster Plugins.
- Core
The minimal deliverable unit of the platform, providing core capabilities such as cluster management, container orchestration, projects, and user administration.
- Meets the highest security standards
- Delivers maximum stability
- Offers the longest support lifecycle
- Extensions
Extensions in both the Operator and Cluster Plugin stacks can be classified into:
- Aligned – A lifecycle strategy consisting of multiple maintenance streams, with releases aligned to the platform.
- Agnostic – A lifecycle strategy consisting of multiple maintenance streams, released independently of the platform.
For more details about extensions, see Extend.
Technical Perspective
Platform Component Runtime
All platform components run as containers within a Kubernetes management cluster (the global cluster).
High Availability Architecture
- The global cluster typically consists of at least three control plane nodes and multiple worker nodes
- High availability of etcd is central to cluster HA; see Key Component High Availability Mechanisms for details
- Load balancing can be provided by an external load balancer or a self-built VIP inside the cluster
Request Routing
- Client requests first pass through the load balancer or self-built VIP
- Requests are forwarded to ALB (the platform's default Kubernetes Ingress Gateway) running on designated ingress nodes (or control-plane nodes if configured)
- ALB routes traffic to the target component pods according to configured rules
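Since ALB implements the standard Kubernetes Ingress API, routing rules can be expressed as ordinary Ingress resources. The host, namespace, and service names below are hypothetical placeholders.

```yaml
# Route console traffic arriving at ALB to the web console Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-console
  namespace: platform-system         # hypothetical namespace
spec:
  rules:
  - host: console.example.com        # resolved via the VIP/DNS entry in front of ALB
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-console        # hypothetical Service name
            port:
              number: 80
```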
Replica Strategy
- Core components run with at least two replicas
- Key components (such as registry, MinIO, ALB) run with three replicas
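A typical way to realize this strategy is a Deployment with a fixed replica count and pod anti-affinity, so replicas never co-locate on one node. The component name and image below are placeholders, not actual platform components.

```yaml
# Two replicas, forced onto different nodes by pod anti-affinity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: core-component              # hypothetical component
spec:
  replicas: 2
  selector:
    matchLabels:
      app: core-component
  template:
    metadata:
      labels:
        app: core-component
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: core-component
            topologyKey: kubernetes.io/hostname   # at most one replica per node
      containers:
      - name: component
        image: registry.example.com/component:1.0   # placeholder image
```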
Fault Tolerance & Self-healing
- Achieved through cooperation between kubelet, kube-controller-manager, kube-scheduler, kube-proxy, ALB, and other components
- Includes health checks, failover, and traffic redirection
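The health-check half of this mechanism is implemented with standard kubelet probes: a failing liveness probe triggers a container restart, while a failing readiness probe removes the pod from Service endpoints so ALB and kube-proxy stop routing traffic to it. The paths and port below are illustrative assumptions.

```yaml
# Illustrative probe configuration on a platform component container.
apiVersion: v1
kind: Pod
metadata:
  name: probe-example
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    livenessProbe:                  # restart the container after repeated failures
      httpGet:
        path: /healthz              # hypothetical health endpoint
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:                 # stop routing traffic while not ready
      httpGet:
        path: /readyz               # hypothetical readiness endpoint
        port: 8080
      periodSeconds: 5
```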
Data Storage & Recovery
- Control-plane configuration and platform state are stored in etcd as Kubernetes resources
- In catastrophic failures, recovery can be performed from etcd snapshots
Primary / Standby Disaster Recovery
- Two separate global clusters: a Primary Cluster and a Standby Cluster
- The disaster recovery mechanism is based on real-time synchronization of etcd data from the Primary Cluster to the Standby Cluster.
- If the Primary Cluster becomes unavailable due to a failure, services can quickly switch to the Standby Cluster.
Key Component High Availability Mechanisms
etcd
- Deployed on three (or five) control plane nodes
- Uses the RAFT protocol for leader election and data replication
- Three-node deployments tolerate up to one node failure; five-node deployments tolerate up to two (quorum requires a majority, ⌊n/2⌋ + 1 members, to remain available)
- Supports local and remote S3 snapshot backups
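A local snapshot backup can be scheduled with a plain CronJob that runs etcdctl on a control plane node, as sketched below. The schedule, image tag, and certificate paths (kubeadm defaults) are assumptions; the platform's built-in backup tooling and the S3 upload step are not shown.

```yaml
# Periodic local etcd snapshot (sketch; paths assume kubeadm defaults).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-snapshot
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"                      # every six hours
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true                    # reach etcd on 127.0.0.1
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          restartPolicy: OnFailure
          containers:
          - name: snapshot
            image: registry.k8s.io/etcd:3.5.12-0   # assumed etcd image tag
            command:
            - /bin/sh
            - -c
            - >-
              etcdctl snapshot save /backup/etcd-$(date +%s).db
              --endpoints=https://127.0.0.1:2379
              --cacert=/etc/kubernetes/pki/etcd/ca.crt
              --cert=/etc/kubernetes/pki/etcd/server.crt
              --key=/etc/kubernetes/pki/etcd/server.key
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /var/backups/etcd
```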
Monitoring Components
- Prometheus: Multiple instances, deduplication with Thanos Query, and cross-region redundancy
- VictoriaMetrics: Cluster mode with distributed VMStorage, VMInsert, and VMSelect components
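Deduplication works by giving each Prometheus replica a distinguishing external label (e.g., prometheus_replica) and telling Thanos Query to collapse series that differ only in that label. A minimal sketch follows; the image tag, namespace, and endpoint DNS name are assumptions.

```yaml
# Thanos Query deduplicating across Prometheus replicas (sketch).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring                    # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - name: thanos-query
        image: quay.io/thanos/thanos:v0.34.1   # assumed image tag
        args:
        - query
        - --query.replica-label=prometheus_replica          # collapse replica duplicates
        - --endpoint=dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc   # assumed sidecar discovery
```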
Logging Components
- Nevermore collects logs and audit data
- Kafka / Elasticsearch / Razor / Lanaya are deployed in distributed and multi-replica modes
Networking Components (CNI)
- Kube-OVN / Calico / Flannel: Achieve HA via stateless DaemonSets or triple-replica control plane components
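For the stateless per-node agents, HA falls out of the DaemonSet model itself: one pod runs on every node, so losing a node only loses that node's agent. A generic sketch follows; the name and image are placeholders, not any specific CNI's manifest.

```yaml
# One CNI node agent per node, including tainted control plane nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cni-node-agent              # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cni-node-agent
  template:
    metadata:
      labels:
        app: cni-node-agent
    spec:
      tolerations:
      - operator: Exists            # tolerate all taints so the agent runs everywhere
      containers:
      - name: agent
        image: registry.example.com/cni-agent:1.0   # placeholder image
        securityContext:
          privileged: true          # CNI agents typically manage host networking
```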
ALB
- Operator deployed with three replicas, leader election enabled
- Instance-level health checks and load balancing
Self-built VIP
- High-availability virtual IP based on Keepalived
- Supports heartbeat detection and active-standby failover
Harbor
- ALB-based load balancing
- PostgreSQL with Patroni HA
- Redis Sentinel mode
- Stateless services deployed in multiple replicas
Registry and MinIO
- Registry deployed with three replicas
- MinIO in distributed mode with erasure coding, data redundancy, and automatic recovery