Chapter 3

Day 3

Spent most of the morning with a headache, so the talk notes only start at noon.

Subsections of Day 3

Stop leaking Kubernetes service information via DNS

Watch talk on YouTube

A talk by Google and Ivanti.

Background

  • RBAC is there to limit information access and control
  • RBAC can be used to avoid interference in shared envs
  • RBAC, however, does not really apply to DNS

DNS in Kubernetes

  • DNS Info is always public -> No auth
  • Services are exposed to all clients

Isolation and Clusters

Just don’t share

  • Especially for smaller, high-growth companies with infinite VC money
  • Just give everyone their own cluster -> Problem solved
  • Smaller companies (<1000) typically use many small clusters

Shared Clusters

  • Becomes important when cost is a concern and engineers don’t have any platform knowledge
  • A dedicated kube team can optimize both hardware and deliver updates fast -> Increased productivity by utilizing specialists
  • Problem: Noisy neighbors, e.g. via leaky DNS

Leaks (demo)

Base scenario

  • Cluster with a bunch of deployments and services
  • Creating a simple pod results in binding to default RBAC -> No access to anything
  • Querying DNS info (aka services) still leaks everything (namespaces, services)

Leak mechanics

  • Leaks are based on the <service>.<namespace>.svc.cluster.local pattern
  • You can also just reverse lookup the entire service CIDR
  • SRV records get created for each service, including the service ports (see the sketch after this list)
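
A minimal sketch of how such probing could look from inside an unprivileged pod, using the dnspython package; the kubernetes.default service is queried only as an example, and 10.96.0.0/24 is a placeholder for the real service CIDR:

import ipaddress
import socket

import dns.resolver  # pip install dnspython

# Forward lookup: enumerate <service>.<namespace>.svc.cluster.local names.
for answer in dns.resolver.resolve("kubernetes.default.svc.cluster.local", "A"):
    print("service IP:", answer.address)

# SRV records additionally reveal the ports a service exposes.
for srv in dns.resolver.resolve("_https._tcp.kubernetes.default.svc.cluster.local", "SRV"):
    print("port:", srv.port, "target:", srv.target)

# Reverse lookup: sweep the (placeholder) service CIDR and collect PTR names.
for ip in ipaddress.ip_network("10.96.0.0/24").hosts():
    try:
        name, _, _ = socket.gethostbyaddr(str(ip))
        print(ip, "->", name)  # e.g. some-svc.other-team.svc.cluster.local
    except socket.herror:
        pass  # no service behind this IP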

Fix the leak

CoreDNS Firewall Plugin

  • External plugin provided by the CoreDNS team
  • Built-in expression engine with support for external policy engines

flowchart LR
    req-->metadata
    metadata-->firewall
    firewall-->kube
    kube-->|Adds namespace/clientnamespace metadata|firewall
    firewall-->|send nxdomain|metadata
    metadata-->res

Demo

  • Firewall rule that only allows queries from the same namespace, kube-system or default
  • Every other cross-namespace request gets blocked
  • The same service lookups from before now return NXDOMAIN

Why is this a plugin and not default?

  • Requires pods verified mode -> Puts the watch on pods and only returns a query result if the pod actually exists
  • Puts a watch on all pods -> higher API load and CoreDNS memory usage
  • Potential race conditions with initial lookups in larger clusters -> Alternative is to fail open (not really secure)

Per tenant DNS

  • Just run a CoreDNS instance for each tenant
  • Use a mutating webhook to inject the right DNS config into each pod (sketched after this list)
  • Pro: No more pods verified -> i.e. no more constant watch
  • Limitation: Platform services still need a central CoreDNS
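
A rough sketch of the JSON patch such a mutating webhook could apply to each tenant pod; the tenant CoreDNS IP is a made-up placeholder, and a real webhook would also have to handle pods that already define their own dnsConfig:

import base64
import json

TENANT_DNS_IP = "10.96.100.53"  # hypothetical per-tenant CoreDNS service IP

def dns_patch(namespace: str) -> list:
    # JSONPatch that points a pod at its tenant's CoreDNS instance.
    return [
        {"op": "add", "path": "/spec/dnsPolicy", "value": "None"},
        {
            "op": "add",
            "path": "/spec/dnsConfig",
            "value": {
                "nameservers": [TENANT_DNS_IP],
                "searches": [
                    f"{namespace}.svc.cluster.local",
                    "svc.cluster.local",
                    "cluster.local",
                ],
            },
        },
    ]

def admission_response(request_uid: str, namespace: str) -> dict:
    # Minimal AdmissionReview response carrying the base64-encoded patch.
    patch = base64.b64encode(json.dumps(dns_patch(namespace)).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request_uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": patch,
        },
    }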

Why is this so hard! Conveying the business value of open source

Watch talk on YouTube

Bob, a Program Manager at Google and Kubernetes steering committee member with plenty of contributor and maintainer experience, argues that the value of open source should be rated even higher than its pure business value.

Baseline

  • A large chunk of CNCF contributors and maintainers (95%) are company affiliated
  • Most (50%) of the people contributed in professional and personal time (and 30% only on work time)
  • Explaining business value can be very complex
  • Base question: What does this contribute to the business?

Data enablement

  • Problem: Insufficient data (data collection is often an afterthought)
  • Example used: a random selection of CNCF projects
    • 50% of issues are labeled consistently
    • 17% of projects label PRs
    • 58% of projects use milestones
  • Labels provide: Context, Prioritization, Scope, State
  • Milestones enable: filtering without relying on date ranges
  • Sample queries (see the sketch after this list):
    • How many features were part of milestone XY?
    • How many bugs have been fixed in this version?
    • What have I/my team worked on over time?
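
With consistent labels and milestones, the sample questions above turn into plain GitHub search queries; the repo, milestone and label names below are placeholders:

import requests  # pip install requests

QUERIES = {
    "features in milestone v1.2": 'repo:example-org/example is:issue milestone:"v1.2" label:kind/feature',
    "bugs fixed in v1.2": 'repo:example-org/example is:issue is:closed milestone:"v1.2" label:kind/bug',
    "my merged PRs": "repo:example-org/example is:pr is:merged author:some-user",
}

for name, query in QUERIES.items():
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query},
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    resp.raise_for_status()
    print(f"{name}: {resp.json()['total_count']}")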

Triage

  • Many projects don’t triage because
    • Auth (there is no role that can only edit labels + milestones)
    • Thought of as overhead
    • Project is too small
  • Tools:
    • Actions/pipelines to auto-label, copy and sync labels
    • Prow: The label system for Kubernetes projects
  • People with high project, but low code knowledge can triage -> Make them feel recognized

Conclusions

  • Consistent labels & milestones are critical for state analysis
  • Data is the evidence needed in messaging for leadership
  • Recruiting triage-specific people and using automations streamlines the process

Communication

Personas

  • OSS enthusiast: Knows the ecosystem and project with a knack for discussions and deep dives
  • Maintainer: An enthusiast that is tired, under pressure and most of the time a one-man show that would prefer doing technical stuff
  • CXO: Focus on resources, health, ROI
  • Product manager: Wants the best, most user-friendly project
  • Leads: Want employees to meet KPIs, with a slightly better tech understanding
  • End user: How can tools/features help me?

Growth limits

  • Main questions:
    • What is this project/feature
    • Where is the roadmap
    • What parts of the project are at risk?
  • Problem: Wording

Ways of surfacing information

  • Regular project reports/blog posts
  • Roadmap on website
  • Project boards -> GitHub’s feature for this is apparently pretty nice

Questions by leadership

  • What are we getting out of it? (How fast are bugs getting fixed?)
  • What is the criticality of the project?
  • How much time is spent on maintenance?

Conclusion

  • There is significant unrealized value in open source

Towards Great Documentation: Behind a CNCF-Led Docs Audit

Watch talk on YouTube

A talk about the backstage documentation audit and what makes a good documentation.

Opening

  • 2012: the year of the Maya calendar and the mainstream success of memes
  • The classic meme RTFM -> Classic manuals were pretty long
  • 2024: Manuals have become documentation (hopefully with better contents)

What gets us to good documentation

What is documentation

  • Docs (the raw descriptions, quick-start and how-to)
  • Website (the first impression - what does this do, why would I need it)
  • README (the GitHub way of website + docs)
  • CONTRIBUTING (Is this a one-man show)
  • Issues
  • Meta docs (how do we orchestrate things)

Project documentation

  • Who needs this documentation?
    • New users -> Optimize for minimum context
    • Experienced users
    • User roles (Admins, end users, …) -> Separate into different pages (Get started based on your role)
  • What do we need to enable with this documentation?
    • Prove value fast -> Why this project?
    • Educate on fundamental aspects
    • Showcase features/use cases
    • Hands-on enablement -> Tutorials, guides, step-by-step

Contributor documentation

  • Communication channels have to be clearly marked
  • Documented scheduled contributor meetings
  • Getting started guides for new contributors
  • Project governance
    • Who is going to own it?
    • What will happen to my PR?
    • Who maintains features?

Website

  • Single source for all pages (one repo that includes landing, docs, API and so on) -> Easier to contribute
  • Usability (especially on mobile)
  • Social proof and case studies -> Develop trust
  • SEO (actually get found) and analytics (detect how documentation is used and where people leave)
  • Plan website maintenance

What is great documentation

  • Project docs help users according to their needs -> Low question-to-answer latency
  • Contributor docs enable contributions predictably -> Don’t leave “when will this be reviewed/merged” questions open
  • The website proves why anyone should invest time in the project
  • All documentation is connected and up to date

General best practices

  • Insular pages: One page per topic, preferably short
  • Include API reference
  • Searchable
  • Plan for versioning early on (the right framework is important)
  • Plan for localization

Examples

  • OpenTelemetry: Split by role (dev, ops)
  • Prometheus:
    • New user content in intro (concept) and getting started (practice)
    • The hierarchy includes concepts, key features and guides/tutorials

Q&A

  • There is a CNCF technical writers meeting on the last Wednesday of every month (CNCF Slack -> #techdocs)

Container Image Workflows at Scale with Buildpacks

Watch talk on YouTube

A talk by Broadcom and Bloomberg (both related to buildpacks.io), and a very full talk at that.

Baseline

  • Cloud Native Buildpacks provides the spec for buildpacks, with a couple of different implementations
  • Pack CLI with builders (collections of buildpacks - for example Paketo or Heroku)
  • Output images follow OCI -> Just run them on Docker/Podman/Kubernetes
  • Built images are production application images (small attack surface, SBOM, non-root, reproducible) - see the sketch after this list
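
A minimal sketch of that flow, driven from Python purely for illustration; the image name is a placeholder and the Paketo builder tag may differ in practice:

import subprocess

image = "registry.example.com/team/my-app:latest"  # placeholder app image
builder = "paketobuildpacks/builder-jammy-base"    # one of the Paketo builders

# Build an OCI image straight from source, no Dockerfile required.
subprocess.run(["pack", "build", image, "--builder", builder, "--path", "."], check=True)

# Download the SBOM that the build produced for the image.
subprocess.run(["pack", "sbom", "download", image], check=True)

# The result is a plain OCI image, so any container runtime can start it.
subprocess.run(["docker", "run", "--rm", "-p", "8080:8080", image], check=True)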

Scaling

Builds

  • Use in CI (Jenkins, GitHub Actions, Tekton, …)
  • kpack: Kubernetes operator -> Builds on new changes

Multiarch support

flowchart LR
    subgraph OCIImageIndex
        lamd(linux/amd64)
        larm(linux/arm64)
    end
    larm-->imageARM
    lamd-->imageAMD
    subgraph imageARM
        layer1.1
        layer2.1
        layer3.1
    end
    subgraph imageAMD
        layer1.2
        layer2.2
        layer3.2
    end

  • Goal: Just a simple docker pull that auto-detects the right architecture (see the sketch after this list)
  • Needed: Pack, Lifecycle, Buildpacks, Build images, builders, registry
  • Current state: There is an RFC to handle image index creation with changes to Buildpack creation
    • New folder structure for binaries
    • Update config files to include targets
  • The user impact is minimal, because the builder abstracts everything away
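
To make the image-index idea concrete, here is a quick way to look at the per-architecture entries behind a single tag; alpine is used only because it already ships as a multi-arch image:

import json
import subprocess

image = "alpine:latest"  # any existing multi-arch image works as an example

# "docker manifest inspect" prints the image index / manifest list as JSON.
raw = subprocess.run(
    ["docker", "manifest", "inspect", image],
    check=True, capture_output=True, text=True,
).stdout

for manifest in json.loads(raw).get("manifests", []):
    platform = manifest["platform"]
    print(f'{platform["os"]}/{platform["architecture"]} -> {manifest["digest"]}')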

Maturity

  • kpack is slsa.dev v3 compliant (party hard)
  • 5 years of production
  • Scales up to Tanzu/Heroku/GCP levels
  • Multiarch is being worked on

Networking

Who have I talked to today, and are there any follow-ups or learnings?

VMware

  • Dinner