From Zero to Hero: Scaling Postgres in Kubernetes Using the Power of CloudNativePG

A short talk as part of the Data on Kubernetes Day, presented by the VP of Cloud Native at EDB (one of the biggest PG contributors). Stated target: Make the world your single point of failure.

Proposal

Get rid of vendor lock-in by using the OSS projects PG, K8s and CnPG
PG was the DB-Engines DBMS of the Year 2023 and a bunch of other times in the past
CnPG is a Level 5 operator, the highest level (Auto Pilot) in the operator capability model

4 Pillars

Seamless Kube API Integration (Operator Pattern)
Advanced observability (Prometheus Exporter, JSON logging)
Declarative Config (Deploy, Scale, Maintain)
Secure by default (Robust containers, mTLS, and so on)

Clusters

Basic resource that defines name, instances, sync and storage (and other parameters that have sane defaults)
Implementation: Operator creates:
- The volumes (PGDATA and WAL (write-ahead log))
- Primary and Read-Write Service
- Replicas
- Read-Only Service (points at replicas)
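To make the Cluster resource a bit more concrete, here is a minimal sketch that builds such a manifest as a Python dict and prints it as JSON (which kubectl accepts as well as YAML). The field names (instances, minSyncReplicas, maxSyncReplicas, storage, walStorage) and the -rw / -ro service suffixes reflect my understanding of the CloudNativePG v1 API and are not taken from the talk; the cluster name and sizes are made-up values. Check the API reference of your operator version before using any of it.

```python
import json


def cluster_manifest(name: str, instances: int = 3) -> dict:
    """Build a minimal Cluster resource as a plain dict (illustrative values only)."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus (instances - 1) replicas.
            "instances": instances,
            # "sync": how many replicas must confirm a commit (assumed field names).
            "minSyncReplicas": 1,
            "maxSyncReplicas": 1,
            # Separate volumes for PGDATA and for the WAL.
            "storage": {"size": "10Gi"},
            "walStorage": {"size": "2Gi"},
        },
    }


def service_names(name: str) -> dict:
    """Service names the operator is assumed to derive from the cluster name."""
    return {"read_write": f"{name}-rw", "read_only": f"{name}-ro"}


if __name__ == "__main__":
    print(json.dumps(cluster_manifest("pg-demo"), indent=2))
    print(service_names("pg-demo"))
```

Everything not set here falls back to the operator's defaults, which is the point of the "sane defaults" remark above.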
Failover:
- Failure detected
- Stop R/W Service
- Promote Replica
- Activate R/W Service
- Kill old primary and demote to replica
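The failover steps above can be sketched as a toy state machine. This is not the operator's code: ToyCluster, failover and the instance names are hypothetical, and picking the first replica for promotion is a simplifying assumption (a real operator would choose a promotion candidate more carefully).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCluster:
    primary: str
    replicas: List[str]
    rw_target: Optional[str] = None  # instance the R/W service points at
    events: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # In a healthy cluster the R/W service points at the primary.
        self.rw_target = self.primary


def failover(cluster: ToyCluster) -> None:
    """Walk through the five failover steps in order, mutating the cluster."""
    old_primary = cluster.primary
    cluster.events.append(f"failure detected on {old_primary}")

    # Stop the R/W service so no client writes to the failed primary.
    cluster.rw_target = None
    cluster.events.append("R/W service stopped")

    # Promote a replica (assumption: simply the first one in the list).
    new_primary = cluster.replicas.pop(0)
    cluster.primary = new_primary
    cluster.events.append(f"{new_primary} promoted")

    # Re-activate the R/W service, now pointing at the new primary.
    cluster.rw_target = new_primary
    cluster.events.append("R/W service activated")

    # The old primary is killed and rejoins as a replica.
    cluster.replicas.append(old_primary)
    cluster.events.append(f"{old_primary} killed and demoted to replica")


if __name__ == "__main__":
    c = ToyCluster(primary="pg-1", replicas=["pg-2", "pg-3"])
    failover(c)
    print(c.primary, c.replicas)
    print("\n".join(c.events))
```

The ordering matters: the R/W service is stopped before the promotion, so there is never a moment where writes could reach two primaries.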

Backup/Recovery

Continuous Backup: Write Ahead Log Backup to object store
Physical: Create from primary or standby to object store or kube volumes
Recovery: Copy full backup and apply WAL until target (last transaction or specific timestamp) is reached
Replica Cluster: Basically creates a new cluster through a full recovery, but keeps that cluster in read-only replica mode
Planned: Backup Plugin Interface
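The recovery item above (copy the full backup, then apply WAL until the target is reached) is easier to follow with a toy model. In this sketch the database is a dict, each WAL record is a hypothetical (commit time, key, value) tuple, and the target time is treated as inclusive; none of this mirrors the real on-disk formats.

```python
from datetime import datetime
from typing import List, Optional, Tuple

WalRecord = Tuple[datetime, str, int]  # (commit time, key, new value)


def recover(base_backup: dict, wal: List[WalRecord],
            target_time: Optional[datetime] = None) -> dict:
    """Restore the base backup, then replay WAL up to the recovery target.

    target_time=None means "replay everything" (recover to the last transaction).
    """
    state = dict(base_backup)  # copy the full backup, never modify it in place
    for commit_time, key, value in wal:  # WAL is assumed ordered by commit time
        if target_time is not None and commit_time > target_time:
            break  # target reached: ignore everything committed later
        state[key] = value
    return state


if __name__ == "__main__":
    base = {"balance": 100}
    wal = [
        (datetime(2024, 1, 1, 12, 1), "balance", 90),
        (datetime(2024, 1, 1, 12, 2), "balance", 70),
        (datetime(2024, 1, 1, 12, 3), "balance", 0),  # the "oops" transaction
    ]
    print(recover(base, wal))                                  # last transaction
    print(recover(base, wal, datetime(2024, 1, 1, 12, 2)))     # just before the "oops"
```

This is also why the continuous WAL backup matters: without the archived WAL, recovery can only land on the moment a physical backup was taken.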

Multi-Cluster

Just create a replica cluster on another kube cluster and feed it with the WAL files from S3 (lags 5 mins behind)
You can also activate streaming replication
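As a rough sketch of what such a replica cluster definition could look like, the following builds the manifest as a Python dict. The structure (bootstrap.recovery, replica.enabled, externalClusters with barmanObjectStore and connectionParameters) is my recollection of the CloudNativePG v1 API, not something shown in these notes, and the names, bucket URL and host are invented. Object store credentials and the TLS material needed for streaming are left out on purpose.

```python
import json
from typing import Optional


def replica_cluster_manifest(name: str, source_name: str, wal_archive_url: str,
                             streaming_host: Optional[str] = None) -> dict:
    """Build a replica Cluster that follows `source_name` (illustrative only)."""
    # Description of the cluster we replicate from: its WAL archive in S3 ...
    source = {
        "name": source_name,
        "barmanObjectStore": {"destinationPath": wal_archive_url},
    }
    # ... and, optionally, a direct connection for streaming replication.
    if streaming_host is not None:
        source["connectionParameters"] = {
            "host": streaming_host,
            "user": "streaming_replica",
            "dbname": "postgres",
        }
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": 3,
            "storage": {"size": "10Gi"},
            # Start from a full recovery of the source cluster ...
            "bootstrap": {"recovery": {"source": source_name}},
            # ... and then stay in read-only replica mode.
            "replica": {"enabled": True, "source": source_name},
            "externalClusters": [source],
        },
    }


if __name__ == "__main__":
    manifest = replica_cluster_manifest(
        "pg-demo-dr", "pg-demo", "s3://example-bucket/pg-demo",
        streaming_host="pg-demo-rw.example.internal",
    )
    print(json.dumps(manifest, indent=2))
```

Leaving streaming_host unset corresponds to the WAL-file-only variant; setting it corresponds to additionally activating streaming.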

Recommended architecture

Dev Cluster: 1 instance without a PDB (PodDisruptionBudget) and with continuous backup
Prod: 3 Nodes with automatic failover and continuous backups
Symmetric: Two clusters
- Primary: 3-Node Cluster
- Secondary: WAL-based 3-Node Cluster with a designated primary (to take over if the primary cluster fails)
Symmetric Streaming: Same as Secondary, but you manually enable the streaming API for live replication
Cascading Replication: Scale Symmetric to more clusters
Single availability zone: Well, do your best to spread across nodes and aspire to stretch Kubernetes across more AZs

Roadmap

Replica Cluster (Symmetric) Switchover
Synchronous Symmetric
3rd Party Plugins
Manage DBs via the Operator
Storage Autoscaling