Chapter 3

Day 3

Spent most of the morning with a headache, so the talk notes only start at noon.

Subsections of Day 3

Stop leaking Kubernetes service information via DNS

Watch talk on YouTube

A talk by Google and Ivanti.

Background

  • RBAC is there to limit information access and control
  • RBAC can be used to avoid interference in shared envs
  • RBAC, however, does not really apply to DNS

DNS in Kubernetes

  • DNS Info is always public -> No auth
  • Services are exposed to all clients

Isolation and Clusters

Just don’t share

  • Especially for smaller, high-growth companies with infinite VC money
  • Just give everyone their own cluster -> Problem solved
  • Smaller companies (<1000) typically use many small clusters

Shared Clusters

  • Becomes important when cost is a concern and engineers don’t have any platform knowledge
  • A dedicated kube team can optimize both hardware and deliver updates fast -> Increased productivity by utilizing specialists
  • Problem: Noisy neighbors, e.g. via leaky DNS

Leaks (demo)

Base scenario

  • Cluster with a bunch of deployments and services
  • Creating a simple pod results in binding to default RBAC -> No access to anything
  • Querying DNS info (aka services) still leaks everything (namespaces, services)

Leak mechanics

  • Leaks are based on the <service>.<namespace>.svc.cluster.local pattern
  • You can also just reverse lookup the entire service CIDR
  • SRV records get created for each service, including the service ports (see the sketch after this list)
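
A minimal sketch of how such probing could look from inside an unprivileged pod, using the dnspython package; the kubernetes.default service is queried only as an example, and 10.96.0.0/24 is a placeholder for the real service CIDR:

import ipaddress
import socket

import dns.resolver  # pip install dnspython

# Forward lookup: enumerate <service>.<namespace>.svc.cluster.local names.
for answer in dns.resolver.resolve("kubernetes.default.svc.cluster.local", "A"):
    print("service IP:", answer.address)

# SRV records additionally reveal the ports a service exposes.
for srv in dns.resolver.resolve("_https._tcp.kubernetes.default.svc.cluster.local", "SRV"):
    print("port:", srv.port, "target:", srv.target)

# Reverse lookup: sweep the (placeholder) service CIDR and collect PTR names.
for ip in ipaddress.ip_network("10.96.0.0/24").hosts():
    try:
        name, _, _ = socket.gethostbyaddr(str(ip))
        print(ip, "->", name)  # e.g. some-svc.other-team.svc.cluster.local
    except socket.herror:
        pass  # no service behind this IP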

Fix the leak

CoreDNS Firewall Plugin

  • External plugin provided by the CoreDNS team
  • Built-in expression engine with support for external policy engines

flowchart LR
    req-->metadata
    metadata-->firewall
    firewall-->kube
    kube-->|Adds namespace/clientnamespace metadata|firewall
    firewall-->|send nxdomain|metadata
    metadata-->res

Demo

  • Firewall rule that only allows queries from the same namespace, kube-system or default
  • Every other cross-namespace request gets blocked
  • The same service lookups from before now return NXDOMAIN

Why is this a plugin and not default?

  • Requires pods verified mode -> Puts the watch on pods and only returns a query result if the pod actually exists
  • Puts a watch on all pods -> higher API load and CoreDNS memory usage
  • Potential race conditions with initial lookups in larger clusters -> Alternative is to fail open (not really secure)

Per tenant DNS

  • Just run a CoreDNS instance for each tenant
  • Use a mutating webhook to inject the right DNS config into each pod (sketched after this list)
  • Pro: No more pods verified -> i.e. no more constant watch
  • Limitation: Platform services still need a central CoreDNS
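
A rough sketch of the JSON patch such a mutating webhook could apply to each tenant pod; the tenant CoreDNS IP is a made-up placeholder, and a real webhook would also have to handle pods that already define their own dnsConfig:

import base64
import json

TENANT_DNS_IP = "10.96.100.53"  # hypothetical per-tenant CoreDNS service IP

def dns_patch(namespace: str) -> list:
    # JSONPatch that points a pod at its tenant's CoreDNS instance.
    return [
        {"op": "add", "path": "/spec/dnsPolicy", "value": "None"},
        {
            "op": "add",
            "path": "/spec/dnsConfig",
            "value": {
                "nameservers": [TENANT_DNS_IP],
                "searches": [
                    f"{namespace}.svc.cluster.local",
                    "svc.cluster.local",
                    "cluster.local",
                ],
            },
        },
    ]

def admission_response(request_uid: str, namespace: str) -> dict:
    # Minimal AdmissionReview response carrying the base64-encoded patch.
    patch = base64.b64encode(json.dumps(dns_patch(namespace)).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request_uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": patch,
        },
    }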

Why is this so hard! Conveying the business value of open source

Watch talk on YouTube

Bob, a Program Manager at Google and Kubernetes steering committee member with plenty of contributor and maintainer experience, argues that the value of open source should be rated even higher than its pure business value.

Baseline

  • A large chunk of CNCF contributors and maintainers (95%) are company affiliated
  • Most (50%) of the people contributed in professional and personal time (and 30% only on work time)
  • Explaining business value can be very complex
  • Base question: What does this contribute to the business?

Data enablement

  • Problem: Insufficient data (data collection is often an afterthought)
  • Example used: a random selection of CNCF projects
    • 50% of issues are labeled consistently
    • 17% of projects label PRs
    • 58% of projects use milestones
  • Labels provide: Context, Prioritization, Scope, State
  • Milestones enable: filtering without relying on date ranges
  • Sample queries (see the sketch after this list):
    • How many features were part of milestone XY?
    • How many bugs have been fixed in this version?
    • What have I/my team worked on over time?
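
With consistent labels and milestones, the sample questions above turn into plain GitHub search queries; the repo, milestone and label names below are placeholders:

import requests  # pip install requests

QUERIES = {
    "features in milestone v1.2": 'repo:example-org/example is:issue milestone:"v1.2" label:kind/feature',
    "bugs fixed in v1.2": 'repo:example-org/example is:issue is:closed milestone:"v1.2" label:kind/bug',
    "my merged PRs": "repo:example-org/example is:pr is:merged author:some-user",
}

for name, query in QUERIES.items():
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query},
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    resp.raise_for_status()
    print(f"{name}: {resp.json()['total_count']}")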

Triage

  • Many projects don’t triage because
    • Auth (there is no role that can only edit labels + milestones)
    • Thought of as overhead
    • Project is too small
  • Tools:
    • Actions/pipelines to auto-label, copy and sync labels
    • Prow: The label system for Kubernetes projects
  • People with high project, but low code knowledge can triage -> Make them feel recognized

Conclusions

  • Consistent labels & milestones are critical for state analysis
  • Data is the evidence needed in messaging for leadership
  • Recruiting triage-specific people and using automations streamlines the process

Communication

Personas

  • OSS enthusiast: Knows the ecosystem and project with a knack for discussions and deep dives
  • Maintainer: An enthusiast that is tired, under pressure and most of the time a one-man show that would prefer doing technical stuff
  • CXO: Focus on resources, health, ROI
  • Product manager: Wants the best, most user-friendly project
  • Leads: Want employees to meet KPIs, with a slightly better tech understanding
  • End user: How can tools/features help me?

Growth limits

  • Main questions:
    • What is this project/feature
    • Where is the roadmap
    • What parts of the project are at risk?
  • Problem: Wording

Ways of surfacing information

  • Regular project reports/blog posts
  • Roadmap on website
  • Project boards -> GitHub’s feature for this is apparently pretty nice

Questions by leadership

  • What are we getting out of it? (How fast are bugs getting fixed?)
  • What is the criticality of the project?
  • How much time is spent on maintenance?

Conclusion

  • There is significant unrealized value in open source

Towards Great Documentation: Behind a CNCF-Led Docs Audit

Watch talk on YouTube

A talk about the backstage documentation audit and what makes a good documentation.

Opening

  • 2012: the year of the Maya calendar and the mainstream success of memes
  • The classic meme RTFM -> Classic manuals were pretty long
  • 2024: Manuals have become documentation (hopefully with better contents)

What gets us to good documentation

What is documentation

  • Docs (the raw descriptions, quick-start and how-to)
  • Website (the first impression - what does this do, why would I need it)
  • README (the GitHub way of website + docs)
  • CONTRIBUTING (Is this a one-man show)
  • Issues
  • Meta docs (how do we orchestrate things)

Project documentation

  • Who needs this documentation?
    • New users -> Optimize for minimum context
    • Experienced users
    • User roles (Admins, end users, …) -> Separate into different pages (Get started based on your role)
  • What do we need to enable with this documentation?
    • Prove value fast -> Why this project?
    • Educate on fundamental aspects
    • Showcase features/use cases
    • Hands-on enablement -> Tutorials, guides, step-by-step

Contributor documentation

  • Communication channels have to be clearly marked
  • Documented scheduled contributor meetings
  • Getting started guides for new contributors
  • Project governance
    • Who is going to own it?
    • What will happen to my PR?
    • Who maintains features?

Website

  • Single source for all pages (one repo that includes landing, docs, API and so on) -> Easier to contribute
  • Usability (especially on mobile)
  • Social proof and case studies -> Develop trust
  • SEO (actually get found) and analytics (detect how documentation is used and where people leave)
  • Plan website maintenance

What is great documentation

  • Project docs help users according to their needs -> Low question-to-answer latency
  • Contributor docs enable contributions predictably -> Don’t leave “when will this be reviewed/merged” questions open
  • The website proves why anyone should invest time in the project
  • All documentation is connected and up to date

General best practices

  • Insular pages: One page per topic, preferably short
  • Include API reference
  • Searchable
  • Plan for versioning early on (the right framework is important)
  • Plan for localization

Examples

  • OpenTelemetry: Split by role (dev, ops)
  • Prometheus:
    • New user content in intro (concept) and getting started (practice)
    • The hierarchy includes concepts, key features and guides/tutorials

Q&A

  • There is a CNCF technical writers meeting on the last Wednesday of every month (CNCF Slack -> #techdocs)

Container Image Workflows at Scale with Buildpacks

Watch talk on YouTube

A talk by Broadcom and Bloomberg (both related to buildpacks.io), and a very full talk at that.

Baseline

  • Cloud Native Buildpacks provides the spec for buildpacks, with a couple of different implementations
  • Pack CLI with builders (collections of buildpacks - for example Paketo or Heroku)
  • Output images follow OCI -> Just run them on Docker/Podman/Kubernetes
  • Built images are production application images (small attack surface, SBOM, non-root, reproducible) - see the sketch after this list
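
A minimal sketch of that flow, driven from Python purely for illustration; the image name is a placeholder and the Paketo builder tag may differ in practice:

import subprocess

image = "registry.example.com/team/my-app:latest"  # placeholder app image
builder = "paketobuildpacks/builder-jammy-base"    # one of the Paketo builders

# Build an OCI image straight from source, no Dockerfile required.
subprocess.run(["pack", "build", image, "--builder", builder, "--path", "."], check=True)

# Download the SBOM that the build produced for the image.
subprocess.run(["pack", "sbom", "download", image], check=True)

# The result is a plain OCI image, so any container runtime can start it.
subprocess.run(["docker", "run", "--rm", "-p", "8080:8080", image], check=True)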

Scaling

Builds

  • Use in CI (Jenkins, GitHub Actions, Tekton, …)
  • kpack: Kubernetes operator -> Builds on new changes

Multiarch support

flowchart LR
    subgraph OCIImageIndex
        lamd(linux/amd64)
        larm(linux/arm64)
    end
    larm-->imageARM
    lamd-->imageAMD
    subgraph imageARM
        layer1.1
        layer2.1
        layer3.1
    end
    subgraph imageAMD
        layer1.2
        layer2.2
        layer3.2
    end

  • Goal: Just a simple docker pull that auto-detects the right architecture (see the sketch after this list)
  • Needed: Pack, Lifecycle, Buildpacks, Build images, builders, registry
  • Current state: There is an RFC to handle image index creation with changes to Buildpack creation
    • New folder structure for binaries
    • Update config files to include targets
  • The user impact is minimal, because the builder abstracts everything away
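
To make the image-index idea concrete, here is a quick way to look at the per-architecture entries behind a single tag; alpine is used only because it already ships as a multi-arch image:

import json
import subprocess

image = "alpine:latest"  # any existing multi-arch image works as an example

# "docker manifest inspect" prints the image index / manifest list as JSON.
raw = subprocess.run(
    ["docker", "manifest", "inspect", image],
    check=True, capture_output=True, text=True,
).stdout

for manifest in json.loads(raw).get("manifests", []):
    platform = manifest["platform"]
    print(f'{platform["os"]}/{platform["architecture"]} -> {manifest["digest"]}')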

Maturity

  • kpack is slsa.dev v3 compliant (party hard)
  • 5 years of production
  • Scales up to Tanzu/Heroku/GCP levels
  • Multiarch is being worked on

Networking

Who have I talked to today, and are there any follow-ups or learnings?

VMware

  • Dinner