Configuring a High Availability PostgreSQL Cluster
TOC
Overview
This guide explains how to configure a highly available PostgreSQL cluster on Kubernetes, using Patroni for automated failover and cluster management, with support for synchronous replication, connection pooling, monitoring and more.
Prerequisites
- PostgreSQL Operator must be installed
- A storage class supporting dynamic provisioning must be configured
- Permissions to create CRD resources must be granted
- Ensure the Kubernetes cluster has sufficient resources (at least 3 available nodes)
- Backup storage must be configured (S3 or compatible storage)
Procedure
1. Create a High Availability Cluster
cat <<EOF | kubectl create -n $NAMESPACE -f -
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
name: pg-ha-cluster
spec:
teamId: ACID
enableExporter: true
enablePgpool2: false
spiloPrivileged: false
spiloRunAsGroup: 103
spiloRunAsUser: 101
spiloAllowPrivilegeEscalation: false
enableReadinessProbe: true
numberOfInstances: 3
postgresql:
version: "14"
parameters:
shared_buffers: "1GB"
work_mem: "64MB"
resources:
requests:
cpu: "1"
memory: 2Gi
limits:
cpu: "2"
memory: 4Gi
volume:
size: 50Gi
storageClass: ssd
iops: 3000
throughput: 125
patroni:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
synchronous_mode: true
synchronous_node_count: 2
pg_hba:
- host all all 0.0.0.0/0 md5
backup:
schedule: "0 0 * * *"
retainDay: 7
storage:
bucket: "my-backups"
name: "backup-storage"
namespace: "default"
EOF
2. Verify Cluster Status
kubectl -n $NAMESPACE get postgresql pg-ha-cluster -o yaml
Example Output:
status:
PostgresClusterStatus: Running
pods:
- pg-ha-cluster-0
- pg-ha-cluster-1
- pg-ha-cluster-2
master: pg-ha-cluster-0
patroniStatus:
pg-ha-cluster-0:
role: leader
state: running
replication:
state: streaming
sync_state: sync
pg-ha-cluster-1:
role: replica
state: running
replication:
state: streaming
sync_state: async
3. Configure Connection Pooler (Optional)
kubectl patch postgresql pg-ha-cluster --type='merge' -p '
spec:
enableConnectionPooler: true
connectionPooler:
numberOfInstances: 2
mode: "transaction"
resources:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
'
4. Configure Monitoring (Optional)
kubectl patch postgresql pg-ha-cluster --type='merge' -p '
spec:
enableExporter: true
exporter:
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "512Mi"
'
Key Parameters
Cluster Configuration
Parameter | Default Value | Description |
---|
numberOfInstances | 3 | Number of cluster nodes (recommend odd number) |
postgresql.version | "14" | PostgreSQL major version |
postgresql.parameters | | PostgreSQL configuration parameters |
resources | - | Resource requests and limits |
volume.size | - | Storage volume size |
volume.storageClass | - | Storage class name |
volume.iops | - | IOPS (only for storage classes supporting IOPS) |
Patroni Configuration
Parameter | Default Value | Description |
---|
patroni.ttl | 30 | Master node lease time (seconds) |
patroni.loop_wait | 10 | Status check interval (seconds) |
patroni.retry_timeout | 10 | Retry timeout (seconds) |
patroni.maximum_lag_on_failover | 1048576 | Maximum allowed replication lag (bytes) |
patroni.synchronous_mode | false | Whether to enable synchronous replication |
patroni.synchronous_node_count | 1 | Number of synchronous replication nodes |
patroni.pg_hba | [] | Custom pg_hba configuration |
Backup Configuration
Parameter | Default Value | Description |
---|
backup.schedule | - | Backup schedule (cron format) |
backup.retainDay | 7 | Backup retention days |
backup.storage.bucket | - | Backup storage bucket name |
backup.storage.name | - | Storage configuration name |
backup.storage.namespace | - | Storage configuration namespace |
Result Validation
- Verify cluster status is Running
- Confirm all Pods are running normally
- Check Patroni status:
kubectl exec pg-ha-cluster-0 -- patronictl list
- Test failover:
kubectl delete pod pg-ha-cluster-0
Observe new leader election process
- Verify synchronous replication status:
kubectl exec pg-ha-cluster-0 -- psql -c "SELECT * FROM pg_stat_replication;"
Best Practices
- For production environments, use SSD storage with appropriate IOPS configuration
- Configure resource limits with 20-30% buffer
- Regularly test failover and recovery procedures
- Configure monitoring alerts for:
- Primary-replica lag
- Connection counts
- Disk usage
- CPU/memory usage
- Enable scheduled backups and test restore procedures
- When using synchronous replication, configure at least 2 synchronous replicas
- Perform regular performance tuning and parameter optimization
Learn More