Configuring a High Availability PostgreSQL Cluster

TOC

Overview

This guide explains how to configure a highly available PostgreSQL cluster on Kubernetes, using Patroni for automated failover and cluster management, with support for synchronous replication, connection pooling, monitoring and more.

Prerequisites

  1. PostgreSQL Operator must be installed
  2. A storage class supporting dynamic provisioning must be configured
  3. Permissions to create CRD resources must be granted
  4. Ensure the Kubernetes cluster has sufficient resources (at least 3 available nodes)
  5. Backup storage must be configured (S3 or compatible storage)

Procedure

CLI
Web Console

1. Create a High Availability Cluster

cat <<EOF | kubectl create -n $NAMESPACE -f -
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: pg-ha-cluster
spec:
  teamId: ACID
  enableExporter: true
  enablePgpool2: false
  spiloPrivileged: false
  spiloRunAsGroup: 103
  spiloRunAsUser: 101
  spiloAllowPrivilegeEscalation: false
  enableReadinessProbe: true
  numberOfInstances: 3
  postgresql:
    version: "14"
    parameters:
      shared_buffers: "1GB"
      work_mem: "64MB"
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "2"
      memory: 4Gi
  volume:
    size: 50Gi
    storageClass: ssd
    iops: 3000
    throughput: 125
  patroni:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    synchronous_mode: true
    synchronous_node_count: 2
    pg_hba:
      - host all all 0.0.0.0/0 md5
  backup:
    schedule: "0 0 * * *"
    retainDay: 7
    storage:
      bucket: "my-backups"
      name: "backup-storage"
      namespace: "default"
EOF

2. Verify Cluster Status

kubectl -n $NAMESPACE get postgresql pg-ha-cluster -o yaml

Example Output:

status:
  PostgresClusterStatus: Running
  pods:
  - pg-ha-cluster-0
  - pg-ha-cluster-1
  - pg-ha-cluster-2
  master: pg-ha-cluster-0
  patroniStatus:
    pg-ha-cluster-0:
      role: leader
      state: running
      replication:
        state: streaming
        sync_state: sync
    pg-ha-cluster-1:
      role: replica
      state: running
      replication:
        state: streaming
        sync_state: async

3. Configure Connection Pooler (Optional)

kubectl patch postgresql pg-ha-cluster --type='merge' -p '
spec:
  enableConnectionPooler: true
  connectionPooler:
    numberOfInstances: 2
    mode: "transaction"
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
'

4. Configure Monitoring (Optional)

kubectl patch postgresql pg-ha-cluster --type='merge' -p '
spec:
  enableExporter: true
  exporter:
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
'

Key Parameters

Cluster Configuration

ParameterDefault ValueDescription
numberOfInstances3Number of cluster nodes (recommend odd number)
postgresql.version"14"PostgreSQL major version
postgresql.parametersPostgreSQL configuration parameters
resources-Resource requests and limits
volume.size-Storage volume size
volume.storageClass-Storage class name
volume.iops-IOPS (only for storage classes supporting IOPS)

Patroni Configuration

ParameterDefault ValueDescription
patroni.ttl30Master node lease time (seconds)
patroni.loop_wait10Status check interval (seconds)
patroni.retry_timeout10Retry timeout (seconds)
patroni.maximum_lag_on_failover1048576Maximum allowed replication lag (bytes)
patroni.synchronous_modefalseWhether to enable synchronous replication
patroni.synchronous_node_count1Number of synchronous replication nodes
patroni.pg_hba[]Custom pg_hba configuration

Backup Configuration

ParameterDefault ValueDescription
backup.schedule-Backup schedule (cron format)
backup.retainDay7Backup retention days
backup.storage.bucket-Backup storage bucket name
backup.storage.name-Storage configuration name
backup.storage.namespace-Storage configuration namespace

Result Validation

  1. Verify cluster status is Running
  2. Confirm all Pods are running normally
  3. Check Patroni status:
    kubectl exec pg-ha-cluster-0 -- patronictl list
  4. Test failover:
    kubectl delete pod pg-ha-cluster-0
    Observe new leader election process
  5. Verify synchronous replication status:
    kubectl exec pg-ha-cluster-0 -- psql -c "SELECT * FROM pg_stat_replication;"

Best Practices

  1. For production environments, use SSD storage with appropriate IOPS configuration
  2. Configure resource limits with 20-30% buffer
  3. Regularly test failover and recovery procedures
  4. Configure monitoring alerts for:
    • Primary-replica lag
    • Connection counts
    • Disk usage
    • CPU/memory usage
  5. Enable scheduled backups and test restore procedures
  6. When using synchronous replication, configure at least 2 synchronous replicas
  7. Perform regular performance tuning and parameter optimization

Learn More