The Hitchhiker's Guide to Kubernetes Platforms: Don’t Panic, Just Launch!

This talk looks at bootstrapping platforms with KServe, in the context of AI workflows.

Scenario

  • Deploy AI workloads, sometimes consisting of several different parts
  • Models get stored in a model registry

Baseline

  • Consistent APIs throughout the platform
  • Not the kube API directly because:
    • Data scientists are easily overwhelmed by the kube API
    • Not only Kubernetes (also monitoring tools, feedback tools, etc.)
    • Better debugging experience for specific workloads

The debugging API

  • Specific API with enhanced statuses and consistent UX across Code and UI
  • Example Endpoints: Pods, Deployments, InferenceServices
  • Provides a status summary -> consistent health info across all related resources
    • Example: Deployments have progress/availability, Pods have phases, Containers have readiness -> how should each signal be interpreted?
    • Evaluation: Progressing, available count vs. readiness, ReplicaFailure, Pod phase, Container readiness
  • The rules themselves may be quite complex, but the resulting status is simple because users do not have to evaluate the rules themselves
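
To illustrate how such a summary could work, here is a minimal sketch that collapses the signals listed above into one status. The class names, the status labels ("Healthy", "Degraded", ...) and the order of the rules are assumptions for illustration, not the rules used by the platform in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContainerState:
    name: str
    ready: bool


@dataclass
class PodState:
    name: str
    phase: str  # Kubernetes pod phase: Pending, Running, Succeeded, Failed, Unknown
    containers: List[ContainerState] = field(default_factory=list)


@dataclass
class DeploymentState:
    desired_replicas: int
    available_replicas: int
    progressing: bool      # Deployment condition "Progressing" is True
    replica_failure: bool  # Deployment condition "ReplicaFailure" is True
    pods: List[PodState] = field(default_factory=list)


def summarize(dep: DeploymentState) -> str:
    """Collapse several signals into one (hypothetical) summary status."""
    # Hard failures win over everything else.
    if dep.replica_failure or any(p.phase == "Failed" for p in dep.pods):
        return "Failed"
    all_running = all(p.phase == "Running" for p in dep.pods)
    all_ready = all(c.ready for p in dep.pods for c in p.containers)
    # Healthy only if the counts, the phases and the readiness all agree.
    if dep.available_replicas >= dep.desired_replicas and all_running and all_ready:
        return "Healthy"
    # Some replicas serve traffic, but not all of them.
    if dep.available_replicas > 0:
        return "Degraded"
    # Nothing is available yet, but a rollout is still in progress.
    if dep.progressing:
        return "Progressing"
    return "Unavailable"
```

The user only sees the returned label; the ordering of the checks is where the complexity lives.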

Debugging Metrics

  • Dashboards (Utilization, throughput, latency)
  • Events
  • Logs

Deployment API

  • Launchpad: Just select your model and version -> The DB (dock) stores all manifests (Spaceship)
  • Manifests relate to models from a model registry
  • Multi-tenancy is implemented using k8s namespaces
  • Kine is used to replace/extend etcd with the relational dock DB -> the namespace<->manifest relation is stored here and RBAC can be applied to it
  • Launchpad: Select Namespace and check resource (fuel) availability/utilization
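
The namespace<->manifest relation can be pictured as two relational tables. The sketch below uses SQLite from the Python standard library; the table names, column names and helper functions are hypothetical and do not reflect Kine's actual schema or the schema of the dock DB from the talk.

```python
import sqlite3
from typing import List, Tuple


def open_dock(path: str = ":memory:") -> sqlite3.Connection:
    """Open a (hypothetical) dock database and create its tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS namespaces (name TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS manifests (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            namespace TEXT NOT NULL REFERENCES namespaces(name),
            model     TEXT NOT NULL,
            version   TEXT NOT NULL,
            body      TEXT NOT NULL,
            UNIQUE (namespace, model, version)
        );
        """
    )
    return conn


def store_manifest(conn: sqlite3.Connection, namespace: str,
                   model: str, version: str, body: str) -> None:
    """Store one manifest under a namespace (the tenant boundary)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO namespaces (name) VALUES (?)",
                     (namespace,))
        conn.execute(
            "INSERT INTO manifests (namespace, model, version, body) "
            "VALUES (?, ?, ?, ?)",
            (namespace, model, version, body),
        )


def manifests_for(conn: sqlite3.Connection, namespace: str) -> List[Tuple[str, str]]:
    """List (model, version) pairs visible inside one namespace."""
    rows = conn.execute(
        "SELECT model, version FROM manifests WHERE namespace = ? ORDER BY id",
        (namespace,),
    )
    return rows.fetchall()
```

Because every manifest row carries its namespace, an access check per namespace is enough to separate tenants in this sketch.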

Cluster maintenance

  • Deployments can be launched to multiple clusters (even two clusters at once) -> HA through identical clusters
  • The exact same manifests get deployed to two clusters
  • Cluster desired state is stored externally to enable effortless upgrades, rescaling, etc.
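
One consequence of keeping the desired state outside the clusters is that drift between "identical" clusters becomes easy to detect. The following is a sketch under assumptions: manifests are plain dictionaries, each cluster can report what it currently runs, and the function names are made up for illustration.

```python
import hashlib
import json
from typing import Dict, List


def manifest_digest(manifest: dict) -> str:
    """Hash a manifest in a canonical form so key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(desired: dict, observed: Dict[str, dict]) -> List[str]:
    """Return the names of clusters whose manifest differs from the desired one."""
    want = manifest_digest(desired)
    return sorted(
        name for name, manifest in observed.items()
        if manifest_digest(manifest) != want
    )
```

A cluster that shows up in the result could then be re-synced from the external store, or rebuilt from scratch during an upgrade.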

Versioning API

  • Basically the dock DB
  • CRDs are the representations of the inference manifests
  • Rollbacks, promotion and history are managed via the CRs
  • Why not GitOps: Internal Diffs, deployment overrides, customized features
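
The history, promotion and rollback behaviour can be sketched as an append-only list of revisions. This is an illustration only: the class and method names are hypothetical, and modelling a rollback as a new revision that re-applies an older model version is an assumed design choice, not something the talk confirms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Revision:
    number: int
    model_version: str


@dataclass
class ReleaseHistory:
    revisions: List[Revision] = field(default_factory=list)

    def promote(self, model_version: str) -> Revision:
        """Record a new revision and make it the current one."""
        rev = Revision(len(self.revisions) + 1, model_version)
        self.revisions.append(rev)
        return rev

    def current(self) -> Revision:
        if not self.revisions:
            raise LookupError("no revision has been promoted yet")
        return self.revisions[-1]

    def rollback(self, to_number: int) -> Revision:
        """Re-apply an older revision as a new one, keeping the full history."""
        target = next((r for r in self.revisions if r.number == to_number), None)
        if target is None:
            raise LookupError(f"unknown revision {to_number}")
        return self.promote(target.model_version)
```

Since nothing is ever deleted, the list doubles as the audit trail of what was deployed and in which order.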

UX

  • User-driven API design
  • Customized tools
  • Everything gets 1:1 replicated for HA
  • Large onboarding guide