Comparing sidecarless service meshes from Cilium and Istio

Watch talk on YouTube

Speaker: Global Field CTO at Solo.io, with a hint of service mesh background.

History

  • Linkerd 1.x was the first modern service mesh and basically an opt-in service proxy
  • Challenges: JVM (size), latencies, …

Why not node-proxy?

  • Per-node resource consumption is unpredictable
  • Per-node proxy must ensure fairness
  • Blast radius is always the entire node
  • Per-node proxy is a fresh attack vector

Why sidecar?

  • Transparent (ish)
  • Part of app lifecycle (up/down)
  • Single tenant
  • No noisy neighbor

Sidecar drawbacks

  • Race conditions
  • Security of certs/keys
  • Difficult sizing
  • Apps need to be proxy aware
  • Can be circumvented
  • Challenging upgrades (infra and app live side by side)

Our lord and savior

  • Potential solution: eBPF
  • Problem: Not quite the perfect solution
  • Result: We still need an L7 proxy (but some L4 functionality can be implemented in the kernel)

Why sidecarless

  • Full transparency
  • Optimized networking
  • Lower resource allocation
  • No race conditions
  • No manual pod injection
  • No credentials in the app

Architecture

  • Control Plane
  • Data Plane
  • mTLS
  • Observability
  • Traffic Control

Cilium

Basics

  • CNI with eBPF on L3/4
  • A lot of nice observability
  • kube-proxy replacement
  • Ingress (via Gateway API)
  • Mutual Authentication
  • Specialized CiliumNetworkPolicy
  • Configure Envoy through Cilium

Control Plane

  • A Cilium agent on each node reacts to scheduled workloads by programming the local data plane
  • API via Gateway API and CiliumNetworkPolicy

flowchart TD
    subgraph kubeserver
        kubeapi
    end
    subgraph node1
        kubeapi<-->control1
        control1-->data1
    end
    subgraph node2
        kubeapi<-->control2
        control2-->data2
    end
    subgraph node3
        kubeapi<-->control3
        control3-->data3
    end
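
To make the API bullet above a bit more concrete, here is a minimal Python sketch that builds a CiliumNetworkPolicy manifest as a plain dictionary and prints it as JSON. The policy name, the `app` labels, and the port are hypothetical examples. The `authentication.mode: required` field is included on the assumption, based on the Cilium documentation, that this is how mutual authentication is requested per ingress rule.

```python
import json


def build_policy(name: str, server_app: str, client_app: str, port: int) -> dict:
    """Build a CiliumNetworkPolicy that lets client_app reach server_app on one TCP port."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            # Pods the policy applies to (the destination side).
            "endpointSelector": {"matchLabels": {"app": server_app}},
            "ingress": [
                {
                    # Only endpoints carrying this label may connect.
                    "fromEndpoints": [{"matchLabels": {"app": client_app}}],
                    # Assumption: requests mutual authentication on top of the identity check.
                    "authentication": {"mode": "required"},
                    "toPorts": [
                        {"ports": [{"port": str(port), "protocol": "TCP"}]}
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names: "backend" and "frontend" are example labels only.
    policy = build_policy("allow-frontend", "backend", "frontend", 8080)
    print(json.dumps(policy, indent=2))
```

The JSON output can be piped into `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.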

Data plane

  • Configured by control plane
  • Handles L3/L4 in eBPF
  • Handles L7 in Envoy
  • In-Kernel WireGuard for optional transparent encryption

mTLS

  • Network policies are applied at the eBPF layer (check whether ID 1 may talk to ID 2)
  • When mTLS is enabled, there is an auth check in advance -> if it fails, the agents take over
  • The agents talk to each other for mTLS auth and save the result to a cache -> now eBPF can say yes
  • Problem: the caches can lead to ID confusion
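
The flow above can be sketched as a toy model in Python. This is not Cilium code: the `Node` class, its fields, and the return strings are made up for illustration, the handshake always succeeds, and what the real datapath does with the first packet during the handshake is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one node: a policy table plus an auth cache."""
    allowed: set = field(default_factory=set)     # (src_id, dst_id) pairs the policy permits
    auth_cache: set = field(default_factory=set)  # pairs that already passed the handshake
    handshakes: int = 0                           # how often the slow path ran

    def agent_handshake(self, src_id: int, dst_id: int) -> bool:
        """Slow path: stands in for the agent-to-agent mTLS handshake."""
        self.handshakes += 1
        return True  # assumption: certificates are always valid in this sketch

    def on_packet(self, src_id: int, dst_id: int) -> str:
        pair = (src_id, dst_id)
        if pair not in self.allowed:
            return "drop: policy"
        if pair in self.auth_cache:
            return "forward"
        # Cache miss: hand over to the agent, then remember the result.
        if self.agent_handshake(src_id, dst_id):
            self.auth_cache.add(pair)
            return "forward after handshake"
        return "drop: auth"


if __name__ == "__main__":
    node = Node(allowed={(1, 2)})
    print(node.on_packet(1, 2))  # forward after handshake
    print(node.on_packet(1, 2))  # forward
    print(node.on_packet(3, 2))  # drop: policy
```

Note that the cache in this sketch is keyed only by the numeric identity pair, so a stale entry would keep saying yes if an identity were ever reassigned to a different workload. That is one possible reading of the "ID confusion" concern; the notes do not spell out the exact mechanism.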

Istio

Basics

  • L4/L7 service mesh without its own CNI
  • Based on Envoy
  • mTLS
  • Classically via sidecar, nowadays also sidecarless (ambient mode)

Ambient mode

  • Separates L4 and L7 -> can run on top of Cilium
  • mTLS
  • Gateway API

Control plane

flowchart TD
    kubeapi-->xDS

    xDS-->dataplane1
    xDS-->dataplane2

    subgraph node1
        dataplane1
    end

    subgraph node2
        dataplane2
    end

  • Central xDS control plane
  • Per-node data plane that reads updates from the control plane
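
A minimal sketch of that topology, assuming nothing about the real xDS wire protocol: one central object holds the state derived from the Kubernetes API and pushes it to every subscribed per-node data plane. The class names, method names, and the example key and address are all hypothetical.

```python
class DataPlane:
    """Toy per-node proxy that just stores the last config it was sent."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.config = {}

    def apply(self, config: dict) -> None:
        # Keep a private copy so later control-plane changes arrive only via push.
        self.config = dict(config)


class ControlPlane:
    """Toy central control plane: one source of truth, pushed to every subscriber."""

    def __init__(self) -> None:
        self.subscribers = []
        self.config = {}

    def subscribe(self, dp: DataPlane) -> None:
        self.subscribers.append(dp)
        dp.apply(self.config)  # new subscribers get the current state right away

    def on_cluster_change(self, key: str, value: str) -> None:
        """Simulates reacting to a change seen in the Kubernetes API."""
        self.config[key] = value
        for dp in self.subscribers:
            dp.apply(self.config)


if __name__ == "__main__":
    cp = ControlPlane()
    node1, node2 = DataPlane("node1"), DataPlane("node2")
    cp.subscribe(node1)
    cp.subscribe(node2)
    cp.on_cluster_change("svc/reviews", "10.0.0.7")
    print(node1.config == node2.config)  # True
```

Compared with the Cilium diagram, the difference in this model is only where the watching happens: here a single component talks to the Kubernetes API, while in the Cilium setup every node's agent does.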

Data Plane

  • L4 runs via the ztunnel DaemonSet, which handles mTLS
  • The ztunnel traffic gets handed over to the CNI
  • L7 Proxy lives somewhere™ and traffic gets routed through it as an “extra hop” aka waypoint
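
The resulting request path can be sketched as a small Python function. This is a simplified model, not Istio code: the `hops` function, the hop labels, and the waypoint name are hypothetical, and the only thing it shows is that the waypoint is an optional extra hop between the two ztunnels.

```python
from typing import List, Optional


def hops(src_node: str, dst_node: str, waypoint: Optional[str] = None) -> List[str]:
    """Return the sequence of hops a request takes in this simplified model."""
    path = ["client pod", f"ztunnel@{src_node}"]
    if waypoint is not None:
        # L7 handling is needed: take the extra hop through the waypoint proxy.
        path.append(f"waypoint:{waypoint}")
    path += [f"ztunnel@{dst_node}", "server pod"]
    return path


if __name__ == "__main__":
    # L4 only: ztunnel to ztunnel.
    print(" -> ".join(hops("node1", "node2")))
    # With a (hypothetically named) waypoint for L7 features.
    print(" -> ".join(hops("node1", "node2", waypoint="reviews-waypoint")))
```

Traffic that only needs L4 features never pays for the extra hop, which is the point of separating L4 and L7 in ambient mode.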

mTLS

  • The ztunnel creates an HBONE (HTTP-Based Overlay Network Environment) tunnel with mTLS