High Availability Architecture

Architecture Overview

The PostgreSQL Operator uses Patroni to provide high availability. Patroni is responsible for four areas:

  1. Leader Election: The primary node is elected via etcd or the Kubernetes API.
  2. Fault Detection: Continuous monitoring of the cluster's health status.
  3. Automated Failover: Automatic promotion of a replica when the primary node fails.
  4. Configuration Management: Unified management of cluster configuration.

Working Principle

  1. Patroni monitors the status of PostgreSQL instances.
  2. It regularly reports the status to the DCS (Distributed Configuration Store).
  3. When the primary node fails:
    • The failure of the primary node is detected.
    • The failed node's leader lock in the DCS expires once its ttl elapses.
    • A new primary node is elected.
    • The replica nodes are reconfigured.
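The failover sequence above can be sketched as a lease-based leader lock. This is a simplified illustration, not Patroni's actual implementation: InMemoryDCS, run_cycle, and the node names pg-0, pg-1, pg-2 are hypothetical, the DCS is replaced by an in-memory object, and time is passed in explicitly so the behaviour is deterministic. Real Patroni also compares replication positions before promoting a replica, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InMemoryDCS:
    """Toy stand-in for etcd or the Kubernetes API: one leader key with a lease."""
    leader: Optional[str] = None
    lease_expires_at: float = 0.0

    def current_leader(self, now: float) -> Optional[str]:
        # An expired lease means nobody holds the leader lock any more.
        if self.leader is not None and now >= self.lease_expires_at:
            self.leader = None
        return self.leader

    def acquire_or_renew(self, node: str, now: float, ttl: float) -> bool:
        # A node may take the lock if it is free, or renew it if it already holds it.
        holder = self.current_leader(now)
        if holder is None or holder == node:
            self.leader = node
            self.lease_expires_at = now + ttl
            return True
        return False


def run_cycle(dcs: InMemoryDCS, healthy_nodes: List[str], now: float,
              ttl: float = 30.0) -> Optional[str]:
    """One monitoring cycle: every healthy node tries to hold or take the lock."""
    for node in healthy_nodes:
        dcs.acquire_or_renew(node, now, ttl)
    return dcs.current_leader(now)


if __name__ == "__main__":
    dcs = InMemoryDCS()
    # t=0: all nodes healthy, the first one to ask becomes primary.
    print(run_cycle(dcs, ["pg-0", "pg-1", "pg-2"], now=0.0))
    # t=10: pg-0 has failed, but its lease has not expired yet.
    print(run_cycle(dcs, ["pg-1", "pg-2"], now=10.0))
    # t=30: the lease has expired, so a replica takes over the lock.
    print(run_cycle(dcs, ["pg-1", "pg-2"], now=30.0))
```

In this sketch the old primary stays recorded as leader until the lease runs out, which is why the lease time bounds how long a failure can go unnoticed.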

Configuration Parameters

Parameter                  Default Value   Description
ttl                        30              Primary node lease time (seconds)
loop_wait                  10              Status check interval (seconds)
retry_timeout              10              Retry timeout (seconds)
maximum_lag_on_failover    1048576         Maximum allowable replication lag (bytes)
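As a rough illustration of how these parameters interact, the sketch below checks the timing rule that Patroni's documentation gives for changing them (loop_wait + 2 * retry_timeout <= ttl) and filters replicas by replication lag. HASettings, timing_is_consistent, and eligible_replicas are hypothetical helper names, and the lag figures in the usage example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HASettings:
    """The four parameters from the table above, with their default values."""
    ttl: int = 30                            # seconds
    loop_wait: int = 10                      # seconds
    retry_timeout: int = 10                  # seconds
    maximum_lag_on_failover: int = 1048576   # bytes (1 MiB)


def timing_is_consistent(s: HASettings) -> bool:
    # Rule from the Patroni documentation: loop_wait + 2 * retry_timeout <= ttl.
    return s.loop_wait + 2 * s.retry_timeout <= s.ttl


def eligible_replicas(lag_bytes_by_node: Dict[str, int], s: HASettings) -> List[str]:
    # A replica lagging by more than maximum_lag_on_failover is not a candidate.
    return sorted(
        node for node, lag in lag_bytes_by_node.items()
        if lag <= s.maximum_lag_on_failover
    )


if __name__ == "__main__":
    defaults = HASettings()
    print(timing_is_consistent(defaults))
    # Hypothetical lag values: pg-1 is 4 KiB behind, pg-2 is about 5 MB behind.
    print(eligible_replicas({"pg-1": 4096, "pg-2": 5_000_000}, defaults))
```

With the defaults the timing rule holds exactly (10 + 2 * 10 = 30), so lowering ttl or raising either of the other two values on its own would break it.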

Best Practices

  1. Use an odd number of nodes (3 nodes recommended).
  2. Configure CPU and memory limits sized to the database workload.
  3. Regularly test failover.
  4. Monitor the cluster's health status.
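The reasoning behind the odd node count can be shown with simple quorum arithmetic. This sketch assumes a majority-based quorum, as used by consensus stores such as etcd; the function names are hypothetical, and whether the rule applies to the PostgreSQL nodes themselves or only to the DCS members depends on the deployment.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can be lost while a majority still remains."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"nodes={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

Under this assumption, 3 and 4 nodes both tolerate a single failure, so the fourth node adds cost without adding fault tolerance.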