From Zero to Hero: Scaling Postgres in Kubernetes Using the Power of CloudNativePG

A short talk given as part of Data on Kubernetes Day, presented by the VP of Cloud Native at EDB (one of the biggest PG contributors). Stated target: Make the world your single point of failure.

Proposal

  • Get rid of vendor lock-in by using the OSS projects Postgres (PG), Kubernetes (K8s) and CloudNativePG (CnPG)
  • PG was named DB of the year in 2023 and several times before that
  • CnPG is a Level 5 operator (the highest maturity level)

4 Pillars

  • Seamless Kube API Integration (Operator Pattern)
  • Advanced observability (Prometheus Exporter, JSON logging)
  • Declarative Config (Deploy, Scale, Maintain)
  • Secure by default (Robust containers, mTLS, and so on)

Clusters

  • Basic resource that defines name, instances, sync (synchronous replication) and storage (and other parameters that have sane defaults)
  • Implementation: Operator creates:
    • The volumes (PGDATA and WAL (write-ahead log))
    • Primary and Read-Write Service
    • Replicas
    • Read-Only Service (points at replicas)
  • Failover:
    • Failure detected
    • Stop R/W Service
    • Promote Replica
    • Activate R/W Service
    • Kill old primary and demote to replica
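
The failover steps above can be sketched as a toy state machine. This only illustrates the ordering and is not the operator's actual code: the `ToyCluster` class, the instance names and the "promote the first replica" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCluster:
    """Toy in-memory model of a cluster (hypothetical, for illustration only)."""
    primary: str
    replicas: List[str]
    # Instance the read-write service currently points at (None = stopped).
    rw_service_target: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.rw_service_target = self.primary

    def failover(self) -> None:
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        old_primary = self.primary
        self.log.append(f"failure detected on {old_primary}")

        # Stop the read-write service so no client writes to the failed primary.
        self.rw_service_target = None
        self.log.append("read-write service stopped")

        # Promote a replica. Assumption: simply take the first one in the list.
        new_primary = self.replicas.pop(0)
        self.primary = new_primary
        self.log.append(f"{new_primary} promoted to primary")

        # Re-activate the read-write service, now pointing at the new primary.
        self.rw_service_target = new_primary
        self.log.append(f"read-write service points at {new_primary}")

        # The old primary is killed and rejoins as a replica.
        self.replicas.append(old_primary)
        self.log.append(f"{old_primary} demoted to replica")


if __name__ == "__main__":
    cluster = ToyCluster(primary="pg-1", replicas=["pg-2", "pg-3"])
    cluster.failover()
    for line in cluster.log:
        print(line)
```

The point of the ordering is that the read-write service targets nothing while the promotion happens, so clients cannot write to the failed primary.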

Backup/Recovery

  • Continuous Backup: Write Ahead Log Backup to object store
  • Physical: Taken from the primary or a standby and written to an object store or kube volumes
  • Recovery: Copy full backup and apply WAL until target (last transaction or specific timestamp) is reached
  • Replica Cluster: Basically creates a new cluster via a full recovery but keeps that cluster in read-only replica mode
  • Planned: Backup Plugin Interface
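
The recovery bullet (copy a full backup, then apply WAL until the target) can be sketched as a small planning function. This is a simplified model, not what Postgres or the operator actually runs: the `BaseBackup` and `WalSegment` types, their timestamp fields and `plan_recovery` are hypothetical, and real recovery tracks WAL positions rather than wall-clock times.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BaseBackup:
    name: str
    started_at: datetime   # WAL from this point on is needed to restore
    finished_at: datetime  # earliest point the backup can be recovered to


@dataclass(frozen=True)
class WalSegment:
    name: str
    first_record_at: datetime
    last_record_at: datetime


def plan_recovery(
    backups: List[BaseBackup],
    wal: List[WalSegment],
    target: Optional[datetime] = None,
) -> Tuple[BaseBackup, List[WalSegment]]:
    """Pick the base backup and the WAL segments to replay.

    target=None means "replay everything", i.e. recover to the last
    archived transaction.
    """
    # Only backups that finished before the target can be used.
    candidates = [b for b in backups if target is None or b.finished_at <= target]
    if not candidates:
        raise ValueError("no base backup finished before the recovery target")
    # The most recent usable backup minimises the amount of WAL to replay.
    base = max(candidates, key=lambda b: b.finished_at)
    segments = [
        s
        for s in sorted(wal, key=lambda s: s.first_record_at)
        if s.last_record_at >= base.started_at
        and (target is None or s.first_record_at <= target)
    ]
    return base, segments
```

Called with a timestamp, it models point-in-time recovery; called without one, it models recovery to the last archived transaction.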

Multi-Cluster

  • Just create a replica cluster from WAL files in S3 on another kube cluster (lags roughly 5 minutes behind)
  • You can also activate replication streaming
  • Dev Cluster: 1 instance without a PodDisruptionBudget (PDB) and with continuous backup
  • Prod: 3 Nodes with automatic failover and continuous backups
  • Symmetric: Two clusters
    • Primary: 3-Node Cluster
    • Secondary: WAL based 3-Node Cluster with a designated primary (to take over if primary cluster fails)
  • Symmetric Streaming: Same as Symmetric, but you manually enable streaming replication for live replication
  • Cascading Replication: Scale Symmetric to more clusters
  • Single availability zone: Well, do your best to spread instances across nodes and aspire to stretch Kubernetes across more AZs
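
To make the dev, prod and replica-cluster setups above more concrete, here is a hedged sketch that builds the corresponding `Cluster` manifests as Python dictionaries. The field names (`instances`, `enablePDB`, `backup.barmanObjectStore`, `replica`, `externalClusters`) follow the CloudNativePG Cluster API as I understand it and should be checked against the version you run. The bucket paths, cluster names, storage size and host name are made up, and object store credentials and TLS settings are left out.

```python
import json
from typing import Any, Dict, Optional

API_VERSION = "postgresql.cnpg.io/v1"


def cluster_manifest(name: str, instances: int, enable_pdb: bool,
                     backup_destination: str,
                     storage_size: str = "10Gi") -> Dict[str, Any]:
    """Build a Cluster manifest with continuous backup to an object store."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,
            "enablePDB": enable_pdb,
            "storage": {"size": storage_size},
            "backup": {
                "barmanObjectStore": {"destinationPath": backup_destination}
            },
        },
    }


def replica_cluster_manifest(name: str, origin_name: str,
                             origin_backup_destination: str,
                             streaming_host: Optional[str] = None) -> Dict[str, Any]:
    """Build a 3-instance replica cluster that follows `origin_name` via WAL files."""
    origin: Dict[str, Any] = {
        "name": origin_name,
        "barmanObjectStore": {"destinationPath": origin_backup_destination},
    }
    if streaming_host is not None:
        # Optional streaming replication on top of WAL shipping (auth omitted).
        origin["connectionParameters"] = {
            "host": streaming_host,
            "user": "streaming_replica",
            "dbname": "postgres",
        }
    return {
        "apiVersion": API_VERSION,
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": 3,
            "storage": {"size": "10Gi"},
            # Bootstrap from the origin's backups, then stay in replica mode.
            "bootstrap": {"recovery": {"source": origin_name}},
            "replica": {"enabled": True, "source": origin_name},
            "externalClusters": [origin],
        },
    }


if __name__ == "__main__":
    dev = cluster_manifest("dev", 1, False, "s3://example-bucket/dev")
    prod = cluster_manifest("prod", 3, True, "s3://example-bucket/prod")
    secondary = replica_cluster_manifest("prod-replica", "prod",
                                         "s3://example-bucket/prod")
    for manifest in (dev, prod, secondary):
        print(json.dumps(manifest, indent=2))
```

Passing `streaming_host` corresponds to the symmetric streaming variant; leaving it out gives the WAL-based secondary.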

Roadmap

  • Replica Cluster (Symmetric) Switchover
  • Synchronous Symmetric
  • 3rd Party Plugins
  • Manage DBs via the Operator
  • Storage Autoscaling