Key Takeaways from Scaling Adobe's CI/CD Solution to Support >50K Argo CD Apps
Watch the talk on YouTube. Part of Multi-tenancy Con, presented by Adobe.
Challenges
- Spin up edge infrastructure globally and quickly
Implementation
First Try - Single-Tenant Clusters
- Azure for the base infrastructure, AWS at the edge
- Single-tenant clusters (simpler governance)
- Responsibility is shared between the app teams and the platform team (monitoring, ingress, etc.)
- Problem: huge manual effort and overprovisioning
- Result: access control for tenant namespaces and capacity planning were still required, so each cluster was pretty much a multi-tenant cluster with only one tenant
Second Try - Micro Clusters
- One cluster per service
Third Try - Multi-tenancy
- A set of shared components deployed by the platform team (ingress, CI/CD, monitoring, …)
- A harmonized, cloud-agnostic general runtime, code-named Ethos, spanning over 300 clusters
- Both shared clusters (shared by namespace) and dedicated clusters
- Cluster config is a basic JSON document with the cluster name, capacity, and teams
- Capacity is monitored with Prometheus
- Cluster changes should be non-destructive; K8S-Shredder helps with this
- Cost efficiency: use well-configured PodDisruptionBudgets (PDBs) and liveness/readiness probes alongside resource requests and limits
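The cluster config and capacity bullets can be sketched in a few lines. The talk only says the config is a basic JSON with a name, capacity, and teams; the field names (`capacity.cpu`, `capacity.memoryGiB`), the units, the cluster name `ethos-shared-01`, the helper names, and the 80% utilization ceiling are all hypothetical. In practice the "used" numbers would come from monitoring such as Prometheus; here they are plain arguments.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ClusterConfig:
    """Hypothetical shape of a per-cluster JSON config (name, capacity, teams)."""
    name: str
    cpu_cores: float          # assumed unit: allocatable CPU cores
    memory_gib: float         # assumed unit: allocatable memory in GiB
    teams: list[str] = field(default_factory=list)


def load_cluster_config(raw: str) -> ClusterConfig:
    """Parse the assumed JSON layout into a ClusterConfig."""
    data = json.loads(raw)
    capacity = data["capacity"]
    return ClusterConfig(
        name=data["name"],
        cpu_cores=float(capacity["cpu"]),
        memory_gib=float(capacity["memoryGiB"]),
        teams=list(data.get("teams", [])),
    )


def fits(cfg: ClusterConfig, used_cpu: float, used_mem_gib: float,
         extra_cpu: float, extra_mem_gib: float, max_util: float = 0.8) -> bool:
    """Return True if extra requests fit while staying under max_util of capacity.

    used_* stand in for values a monitoring system (e.g. Prometheus) would
    report, such as summed resource requests; the 0.8 ceiling is an assumption.
    """
    cpu_ok = used_cpu + extra_cpu <= cfg.cpu_cores * max_util
    mem_ok = used_mem_gib + extra_mem_gib <= cfg.memory_gib * max_util
    return cpu_ok and mem_ok


if __name__ == "__main__":
    raw = """
    {
      "name": "ethos-shared-01",
      "capacity": {"cpu": 64, "memoryGiB": 256},
      "teams": ["team-a", "team-b"]
    }
    """
    cfg = load_cluster_config(raw)
    print(cfg.name, cfg.teams)
    print(fits(cfg, used_cpu=40, used_mem_gib=150, extra_cpu=8, extra_mem_gib=32))
```

Running it prints the cluster name and teams, then `True`: 48 of the 51.2-core budget and 182 of the 204.8-GiB budget would be in use after adding the new workload.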
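The PDB advice ties into non-destructive cluster changes: node drains and rotations evict pods through the Kubernetes Eviction API, which refuses an eviction that would violate a PodDisruptionBudget. Below is a simplified model of that arithmetic, assuming integer `minAvailable`/`maxUnavailable` values only (no percentages); the function name is hypothetical and this is not the disruption controller's actual code.

```python
from typing import Optional


def disruptions_allowed(expected_pods: int, healthy_pods: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Simplified PodDisruptionBudget arithmetic (integer values only).

    A PDB sets exactly one of minAvailable / maxUnavailable. The number of
    voluntary evictions currently allowed is the number of healthy pods minus
    the number that must stay healthy, floored at zero.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, healthy_pods - desired_healthy)


if __name__ == "__main__":
    # 3 replicas in each case.
    print(disruptions_allowed(3, 3, min_available=2))    # 1: one pod at a time can be evicted
    print(disruptions_allowed(3, 3, min_available=3))    # 0: voluntary evictions are blocked
    print(disruptions_allowed(3, 2, max_unavailable=1))  # 0: one pod is already down
```

The second case shows why a "good" PDB matters: setting `minAvailable` equal to the replica count leaves zero allowed disruptions, so a node rotation cannot evict those pods voluntarily.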
Conclusion
- Choosing between single-tenant and multi-tenant clusters means balancing cost, customization, setup effort, and security