Building a large-scale multi-cloud, multi-region SaaS platform with Kubernetes controllers
Watch talk on YouTube
Interchangeable wording in this talk: controller == operator
A talk by Elastic.
About Elastic
- Elastic Cloud as a managed service
- Deployed across AWS/GCP/Azure in over 50 regions
- 600,000+ containers
 
Elastic and Kubernetes
- They offer Elastic Observability
- They offer the ECK operator for simplified deployments
 
The baseline
- Goal: a large-scale (1M+ containers) resilient platform on k8s
- Architecture
  - Global control: the control plane (API) for users, with controllers
  - Regional apps: the “shitload” of Kubernetes clusters where the actual customer services live
 
 
Scalability
- Challenge: how large can a single cluster be, and how many clusters do we need?
- Problem: only basic guidelines exist for this
- Decision: horizontally scale the number of clusters (500-1K nodes each)
- Decision: disposable clusters
  - Can be thrown away without data loss
  - The single source of truth is not the cluster’s etcd but an external store -> no etcd backups needed
  - Everything can be recreated at any time
 
 
Controllers
 Note
  I won’t copy the explanations of operators/controllers in these notes
- Many controllers, including (but not limited to):
  - Cluster controller: registers clusters with the platform
  - Project controller: schedules a user’s project onto a cluster
  - Product controllers (Elasticsearch, Kibana, etc.)
  - Ingress/cert-manager
 
- Sometimes controllers depend on other controllers -> potential complexity
- Pros:
  - Resilient (self-healing)
  - Level-triggered (converging on desired state) rather than edge/procedure-triggered
  - Simpler reasoning: comparing desired vs actual state instead of walking a state machine
  - Official controller-runtime library
 
- Workqueue: automatic dedup, retry with backoff, and so on
 
Global Controllers
- Basic operation
  - Uses the project config from Elastic Cloud as the desired state
  - The actual state is a k8s resource in another cluster
 
- Challenge: where is the source of truth if the data is not stored in etcd?
- Solution: an external data store (Postgres)
- Challenge: how do we sync the DB contents to Kubernetes?
- Potential solution: replace etcd with the external DB
- Chosen solution:
  - The controllers don’t use CRDs for storage; instead they expose a web API
  - Reconciliation now interacts with the external DB, with Go channels acting as the queue
  - The CRs for the operators then get created by the global controller
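A sketch of that flow, with the Postgres store and the regional cluster’s API replaced by in-memory fakes (all type and function names here are hypothetical, not Elastic’s actual code):

```go
package main

import "fmt"

// ProjectSpec is the desired state as stored in the external DB (Postgres),
// not in etcd.
type ProjectSpec struct {
	Name    string
	Version string
}

// Store stands in for the external database queried during reconciliation.
type Store interface {
	DesiredProjects() []ProjectSpec
}

// CRClient stands in for the Kubernetes API of the regional cluster where
// the global controller creates the CRs consumed by the product operators.
type CRClient interface {
	ApplyCR(spec ProjectSpec)
}

// Reconcile reads desired state from the DB and applies CRs to the cluster,
// so the cluster's etcd never becomes the source of truth.
func Reconcile(db Store, k8s CRClient) {
	for _, spec := range db.DesiredProjects() {
		k8s.ApplyCR(spec) // apply is idempotent: create or update
	}
}

// --- in-memory fakes for the sketch ---

type fakeStore struct{ specs []ProjectSpec }

func (f fakeStore) DesiredProjects() []ProjectSpec { return f.specs }

type fakeCluster struct{ crs map[string]string }

func (f fakeCluster) ApplyCR(s ProjectSpec) { f.crs[s.Name] = s.Version }

func main() {
	db := fakeStore{specs: []ProjectSpec{{"logs", "8.1"}, {"metrics", "8.1"}}}
	cluster := fakeCluster{crs: map[string]string{}}
	Reconcile(db, cluster)
	fmt.Println(len(cluster.crs), "CRs applied")
}
```

Because the DB remains authoritative, a disposable cluster can be rebuilt simply by re-running this reconciliation against an empty cluster.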
 
 
Large scale
- Problem: reconcile gets triggered for all objects on restart -> make sure nothing gets missed and everything is processed with the latest controller version
- Idea: just create more workers for 100K+ objects
- Problem: CPU go brrr and the DB gets overloaded
- Problem: if an item is created during a restart, it suddenly sits at the end of a 100K-item work-queue
 
Reconcile
- User-driven events are processed ASAP
- Reconciliation of everything should still happen, but with low priority, slowly in the background
- Solution: Status.LastReconciledRevision (a timestamp) gets compared to the object’s revision; if the revision is larger -> user change
- Prioritization: just a custom event handler with the normal queue plus a low-priority queue
- Queue: just a queue that adds items to the normal work-queue with a rate limit
 
```mermaid
flowchart LR
    low-->rl(ratelimit)
    rl-->wq(work queue)
    wq-->controller
    high-->wq
```

Related
- Argo for CI/CD
- Crossplane for cluster auto-provisioning