Key Takeaways from Scaling Adobe's CI/CD Solution to Support >50K Argo CD Apps

Watch talk on YouTube

Part of the Multi-tenancy Con presented by Adobe

Challenges

  • Spin up edge infrastructure quickly, worldwide

Implementation

First Try - Single-Tenant Clusters

  • Azure as the base, AWS at the edge
  • Single-tenant clusters (simpler governance)
  • Responsibility is shared between app and platform teams (monitoring, ingress, etc.)
  • Problem: huge manual investment and over-provisioning
  • Result: access control to tenant namespaces and capacity planning were still needed -> effectively a multi-tenant cluster with one tenant per cluster

Second Try - Micro Clusters

  • One cluster per service

Third Try - Multi-tenancy

  • Use a set of components deployed by the platform team (ingress, CI/CD, monitoring, …)
  • Harmonized, cloud-agnostic general runtime, code-named Ethos -> over 300 clusters
  • Both shared clusters (shared by namespace) and dedicated clusters
  • Cluster config is a basic JSON file with name, capacity, and teams
  • Capacity is monitored with Prometheus
  • Cluster changes should be non-destructive -> k8s-shredder
  • Cost efficiency: use well-defined PodDisruptionBudgets (PDBs) and liveness/readiness probes alongside resource requests and limits
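
The cluster config bullet above only names three fields (name, capacity, teams). As a rough illustration of how such a file could feed capacity planning, here is a minimal Python sketch. The field layout, the per-team allocation keys (`cpu_cores`, `memory_gib`), the cluster name, and the `remaining_capacity` helper are all assumptions made for this example, not Adobe's actual schema or tooling.

```python
import json
from dataclasses import dataclass

# Hypothetical config: the talk only says it holds name, capacity, and teams.
# The nested keys and all values below are invented for illustration.
RAW_CONFIG = """
{
  "name": "example-cluster-01",
  "capacity": {"cpu_cores": 64, "memory_gib": 256},
  "teams": [
    {"name": "team-a", "cpu_cores": 24, "memory_gib": 96},
    {"name": "team-b", "cpu_cores": 16, "memory_gib": 64}
  ]
}
"""


@dataclass(frozen=True)
class Headroom:
    """Capacity left after subtracting every team's allocation."""
    cpu_cores: int
    memory_gib: int


def remaining_capacity(config: dict) -> Headroom:
    """Subtract the sum of team allocations from the cluster capacity."""
    capacity = config["capacity"]
    used_cpu = sum(team["cpu_cores"] for team in config["teams"])
    used_mem = sum(team["memory_gib"] for team in config["teams"])
    return Headroom(
        cpu_cores=capacity["cpu_cores"] - used_cpu,
        memory_gib=capacity["memory_gib"] - used_mem,
    )


config = json.loads(RAW_CONFIG)
headroom = remaining_capacity(config)
print(f"{config['name']}: {headroom.cpu_cores} CPU cores, "
      f"{headroom.memory_gib} GiB free")
```

A negative headroom value would signal an over-committed cluster. In practice that check would more likely live in Prometheus alerts, as the notes mention, than in a script like this.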

Conclusion

  • Choosing between single-tenant and multi-tenant means balancing cost, customization, setup effort, and security