High Availability Architecture
Architecture Overview
The PostgreSQL Operator uses Patroni to provide high availability. Patroni is responsible for the following functions:
- Leader Election: The primary node is elected via etcd or the Kubernetes API.
- Fault Detection: Continuous monitoring of the cluster's health status.
- Automated Failover: Automatic switching when the primary node fails.
- Configuration Management: Unified management of cluster configuration.
Working Principle
- Patroni monitors the status of PostgreSQL instances.
- It regularly reports the status to the DCS (Distributed Configuration Store).
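- When the primary node fails:
  - The failure is detected because the primary stops renewing its leader key.
  - The leader key of the failed node expires in the DCS once its ttl elapses.
  - A new primary node is elected from the healthy replicas.
  - The remaining replica nodes are reconfigured to follow the new primary.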
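The failover sequence above can be sketched as a lease that the primary must keep renewing. This is a simplified illustration, not Patroni's implementation: `LeaderLease` and `run_cycle` are hypothetical names, the DCS is replaced by an in-memory object, and the first healthy node to ask wins the lease, whereas a real election also considers replication lag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeaderLease:
    """In-memory stand-in for the leader key kept in the DCS (hypothetical)."""
    ttl: float = 30.0
    holder: Optional[str] = None
    expires_at: float = 0.0

    def acquire_or_renew(self, node: str, now: float) -> bool:
        # The lease is free if nobody holds it or the holder stopped renewing.
        expired = self.holder is None or now >= self.expires_at
        if expired or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


def run_cycle(lease: LeaderLease, healthy_nodes: List[str], now: float) -> Optional[str]:
    """One status-check cycle: each healthy node tries to take or renew the lease."""
    for node in healthy_nodes:
        lease.acquire_or_renew(node, now)
    # Report the current leader, or None if the lease has lapsed unclaimed.
    return lease.holder if now < lease.expires_at else None


if __name__ == "__main__":
    lease = LeaderLease(ttl=30)
    print(run_cycle(lease, ["pg-0", "pg-1", "pg-2"], now=0))   # pg-0 takes the lease
    print(run_cycle(lease, ["pg-0", "pg-1", "pg-2"], now=10))  # pg-0 renews it
    print(run_cycle(lease, ["pg-1", "pg-2"], now=20))          # pg-0 is down, lease still valid
    print(run_cycle(lease, ["pg-1", "pg-2"], now=40))          # lease expired, a replica takes over
```

In this sketch the replicas cannot take over until the old lease expires, which is why the ttl value bounds how quickly a failed primary is replaced.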
Configuration Parameters
Parameter | Default Value | Description |
---|---|---|
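ttl | 30 | Lifetime of the primary node's leader lock (seconds); if it is not renewed in time, failover begins |
loop_wait | 10 | Interval between status check cycles (seconds) |
retry_timeout | 10 | Timeout for retrying DCS and PostgreSQL operations (seconds) |
maximum_lag_on_failover | 1048576 | Maximum replication lag (bytes) a replica may have and still be eligible for promotion |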
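The parameters interact with each other. Patroni's documentation gives the rule `loop_wait + 2 * retry_timeout <= ttl` for changing the timing values, and a replica whose lag exceeds `maximum_lag_on_failover` is not considered for promotion. The sketch below checks both conditions. `HASettings` and its methods are hypothetical helper names used only for illustration, not part of the operator or of Patroni.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HASettings:
    """Hypothetical container for the parameters in the table above."""
    ttl: int = 30
    loop_wait: int = 10
    retry_timeout: int = 10
    maximum_lag_on_failover: int = 1048576  # 1 MiB

    def timing_is_consistent(self) -> bool:
        # The lease must outlive one check cycle plus two retry windows.
        return self.loop_wait + 2 * self.retry_timeout <= self.ttl

    def may_promote(self, replica_lag_bytes: int) -> bool:
        # A replica is a failover candidate only if its lag is within the limit.
        return replica_lag_bytes <= self.maximum_lag_on_failover


if __name__ == "__main__":
    defaults = HASettings()
    print(defaults.timing_is_consistent())            # defaults: 10 + 2 * 10 <= 30
    print(HASettings(ttl=20).timing_is_consistent())  # ttl too short for the other values
    print(defaults.may_promote(512 * 1024))           # 512 KiB behind the primary
```

With the default values the timing rule holds with no slack, so lowering ttl alone would break it; loop_wait or retry_timeout would need to be lowered as well.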
Best Practices
- Use an odd number of nodes (3 nodes recommended).
- Configure reasonable resource limits.
- Test failover regularly, for example with a planned switchover, to confirm that a replica can take over.
- Monitor the cluster's health status.
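The recommendation to use an odd number of nodes can be illustrated with majority-quorum arithmetic. This is a sketch that assumes a DCS deciding by simple majority, as etcd does; the source does not state this reasoning, and the function names are illustrative.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority remains available."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, a fourth node raises the quorum from 2 to 3 without raising the number of tolerated failures, so an even member count adds cost without adding resilience.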