Comparing sidecarless service meshes from Cilium and Istio
Watch the talk on YouTube.
Speaker: Global Field CTO at Solo.io, with a hint of service mesh background.
History
- Linkerd 1.x was the first modern service mesh, essentially an opt-in service proxy
- Challenges: JVM footprint (size), latency, …
Why not node-proxy?
- Per-node resource consumption is unpredictable
- Per-node proxy must ensure fairness
- Blast radius is always the entire node
- Per-node proxy is a fresh attack vector
Why sidecar?
- Transparent (ish)
- Part of app lifecycle (up/down)
- Single tenant
- No noisy neighbor
Sidecar drawbacks
- Race conditions
- Security of certs/keys
- Difficult sizing
- Apps need to be proxy aware
- Can be circumvented
- Challenging upgrades (infra and app live side by side)
Our lord and savior
- Potential solution: eBPF
- Problem: Not quite the perfect solution
- Result: We still need an L7 proxy (but some L4 functionality can be implemented in the kernel)
Why sidecarless
- Full transparency
- Optimized networking
- Lower resource allocation
- No race conditions
- No manual pod injection
- No credentials in the app
Architecture
- Control Plane
- Data Plane
- mTLS
- Observability
- Traffic Control
Cilium
Basics
- A CNI that uses eBPF for L3/L4
- Rich observability (e.g. via Hubble)
- kube-proxy replacement
- Ingress (via the Gateway API)
- Mutual authentication
- Its own CiliumNetworkPolicy CRD, which extends Kubernetes NetworkPolicy up to L7
- Envoy is configured through Cilium for L7 (see the policy sketch below)
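A minimal sketch of how the last two points combine, with a hypothetical `demo` namespace, labels, and port: the L4 part of this CiliumNetworkPolicy is enforced in eBPF, while the `http` rule makes Cilium redirect matching traffic through its Envoy proxy.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-backend
  namespace: demo
spec:
  # L4: enforced directly in eBPF
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          # L7: this part is handled by the Envoy proxy managed by Cilium
          rules:
            http:
              - method: "GET"
                path: "/api/.*"
```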
Control Plane
- A cilium-agent on each node reacts to scheduled workloads by programming the local data plane
- Configured via the Gateway API and CiliumNetworkPolicy resources (see the Gateway sketch below the diagram)
```mermaid
flowchart TD
  subgraph kubeserver
    kubeapi
  end
  subgraph node1
    kubeapi <--> control1
    control1 --> data1
  end
  subgraph node2
    kubeapi <--> control2
    control2 --> data2
  end
  subgraph node3
    kubeapi <--> control3
    control3 --> data3
  end
```
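For example, north-south traffic is described with standard Gateway API resources, which each cilium-agent translates into local Envoy and eBPF configuration. A sketch assuming Cilium's `cilium` GatewayClass; names, namespace, and ports are hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: demo-gateway
  namespace: demo
spec:
  gatewayClassName: cilium        # handled by Cilium's Gateway API controller
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo-route
  namespace: demo
spec:
  parentRefs:
    - name: demo-gateway
  rules:
    - backendRefs:
        - name: backend           # hypothetical Service
          port: 8080
```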
Data plane
- Configured by control plane
- Handles L4 in eBPF (routing, load balancing, network policy)
- Handles L7 via Envoy
- In-kernel WireGuard for optional transparent encryption (see the Helm values sketch below)
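Transparent encryption is typically switched on through the Cilium Helm chart; a sketch of the relevant values, assuming the documented `encryption` options:

```yaml
# Excerpt of Helm values for the Cilium chart
encryption:
  enabled: true
  type: wireguard   # node-to-node traffic encrypted by in-kernel WireGuard
```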
mTLS
- Network policies get applied at the eBPF layer (check whether identity A may talk to identity B)
- When mTLS (mutual authentication) is enabled, an additional auth check happens first -> if there is no cached auth result yet, the decision is handed to the agents (see the policy sketch after this list)
- The agents perform the mTLS handshake with each other and cache the result -> from then on eBPF can say yes
- Problem: the caches can lead to identity confusion
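Mutual authentication is requested per policy rule. A sketch assuming Cilium's mutual authentication feature (Cilium 1.14+); the namespace and selectors are hypothetical:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-require-mutual-auth
  namespace: demo
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      # triggers the agent-to-agent mTLS handshake described above;
      # eBPF only allows the traffic once the cached result says yes
      authentication:
        mode: "required"
```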
Istio
Basics
- L4/L7 service mesh without its own CNI (it relies on an existing CNI for pod networking)
- Based on Envoy
- mTLS
- Classically deployed via sidecars; nowadays also sidecarless via ambient mode
Ambient mode
- Separates L4 and L7 -> can run on top of Cilium as the CNI (see the namespace label sketch after this list)
- mTLS
- Gateway API
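Ambient mode is opted into per namespace; a sketch with a hypothetical `demo` namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    # all pods in this namespace join the ambient mesh (L4 handled by ztunnel)
    istio.io/dataplane-mode: ambient
```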
Control plane
```mermaid
flowchart TD
  kubeapi --> xDS
  xDS --> dataplane1
  xDS --> dataplane2
  subgraph node1
    dataplane1
  end
  subgraph node2
    dataplane2
  end
```
- Central xDS control plane (istiod)
- A per-node data plane that receives updates from the control plane
Data Plane
- L4 runs via the ztunnel DaemonSet, which handles mTLS
- Traffic from the ztunnel is handed over to the CNI
- The L7 proxy (the waypoint) lives somewhere™ in the cluster, and traffic is routed through it as an "extra hop" (see the sketch after this list)
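What deploying a waypoint usually looks like: a Gateway of class `istio-waypoint`, roughly what istioctl's waypoint commands create; the namespace is hypothetical. Workloads or the whole namespace are then pointed at it, in recent Istio releases via the `istio.io/use-waypoint` label.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: demo
  labels:
    istio.io/waypoint-for: service   # this waypoint serves service-addressed traffic
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008        # HBONE port
      protocol: HBONE
```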
mTLS
- The ztunnel creates an HBONE (HTTP-Based Overlay Network Environment) tunnel secured with mTLS
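Requiring mTLS for all workloads is typically expressed with a PeerAuthentication resource in the root namespace; in ambient mode the ztunnel enforces it. A sketch assuming the default `istio-system` root namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace -> applies mesh-wide
spec:
  mtls:
    mode: STRICT             # plaintext connections are rejected
```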