Skip to content

Appendix D — Further reading

A topic → resource map: for each Part of this guide, the most relevant chapter(s) from the reference library plus 1–3 curated official-docs links. The book→chapter mapping mirrors the guide's internal citation map exactly; nothing here is invented. This guide is standalone — these are for going deeper on what a chapter already taught, not prerequisites.

This is a reference (no nine-section anatomy). Each chapter of the guide also ends with its own specific citation + official URL; this appendix is the consolidated, by-Part view.

The library

Only these six books are cited anywhere in the guide:

Short Book
L Lukša, Kubernetes in Action, 2nd Edition (Manning)
P Poulton, The Kubernetes Book
KP Ibryam & Huß, Kubernetes Patterns, 2nd Edition (O'Reilly)
R Rosso, Lander, Brand & Harris, Production Kubernetes (O'Reilly)
A Argo CD Up & Running (O'Reilly)
D Davis, Bootstrapping Microservices (Manning)

A note on Lukša 2e: the MEAP "brief contents" ends at ch.17. Where the guide needs material beyond that (securing the API server / RBAC, securing Pods, GitOps, extending Kubernetes), the citation is at topic granularity ("Lukša 2e, securing-the-API-server material") and a sibling book (Production Kubernetes / Kubernetes Patterns) is the primary — exactly as recorded below.


Part 00 — Foundations

Why Kubernetes, containers/images, architecture, control plane, node components, the declarative API model, local cluster setup.

Guide chapter Primary Secondary
01 — Why Kubernetes P ch.1; L ch.1 KP ch.1 (Introduction)
02 — Containers and images P ch.3; L ch.2 (Understanding containers) KP ch.30 (Image Builder)
03 — Architecture overview P ch.2; L ch.3 (Deploying your first application) R ch.1 (A Path to Production)
04 — Control plane deep dive L ch.3 + securing-the-API-server material; R ch.1 official docs (below)
05 — Node components L ch.2, ch.3 R ch.3 (Container Runtime)
06 — The declarative API model L ch.4 (Introducing Kubernetes API objects) KP ch.3 (Declarative Deployment)
07 — Local cluster setup P ch.3; L ch.3 official docs (below)

Official docs: - Kubernetes components & architecture — https://kubernetes.io/docs/concepts/overview/components/ - Working with objects (spec/status, labels, selectors) — https://kubernetes.io/docs/concepts/overview/working-with-objects/ - Install tools; kind https://kind.sigs.k8s.io/docs/user/quick-start/, k3d https://k3d.io/, kubectl https://kubernetes.io/docs/tasks/tools/


Part 01 — Core Workloads

Pods, health/lifecycle, resources/QoS, ReplicaSets/Deployments, StatefulSets, DaemonSets, Jobs/CronJobs, deployment strategies.

Guide chapter Primary Secondary
01 — Pods L ch.5 (Running workloads in Pods) KP ch.15/16/17/18 (Init/Sidecar/Adapter/Ambassador)
02 — Health and lifecycle L ch.6 (Managing the Pod lifecycle) KP ch.4 (Health Probe), ch.5 (Managed Lifecycle)
03 — Resources and QoS L (resource-management material); official docs KP ch.2 (Predictable Demands)
04 — ReplicaSets and Deployments L ch.13 (ReplicaSets), ch.14 (Deployments) KP ch.3 (Declarative Deployment)
05 — StatefulSets L ch.15 (Deploying stateful workloads with StatefulSets) KP ch.12 (Stateful Service)
06 — DaemonSets L ch.16 (Deploying node agents and daemons with DaemonSets) KP ch.9 (Daemon Service)
07 — Jobs and CronJobs L ch.17 (Running finite workloads with Jobs and CronJobs) KP ch.7 (Batch Job), ch.8 (Periodic Job)
08 — Deployment strategies KP ch.3 (Declarative Deployment) R ch.14 (Application Considerations); L ch.14

Official docs: - Workloads overview — https://kubernetes.io/docs/concepts/workloads/ - Configure liveness/readiness/startup probes — https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ - Resource management for Pods/Containers — https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/


Part 02 — Networking

The networking model, Services, DNS/discovery, Ingress, Gateway API, network policies.

Guide chapter Primary Secondary
01 — The networking model R ch.5 (Pod Networking) L ch.11; official docs (CNI)
02 — Services L ch.11 (Exposing Pods with Services) R ch.6 (Service Routing); KP ch.13 (Service Discovery)
03 — DNS and service discovery L ch.11 KP ch.13 (Service Discovery); official docs (CoreDNS)
04 — Ingress L ch.12 (Exposing Services with Ingress) R ch.6 (Service Routing)
05 — Gateway API official docs (Gateway API — primary) R ch.6 (Service Routing)
06 — Network policies KP ch.24 (Network Segmentation) R ch.5 (Pod Networking); L ch.11

Official docs: - Services, Ingress & networking — https://kubernetes.io/docs/concepts/services-networking/ - Gateway API — https://gateway-api.sigs.k8s.io/ - Network Policies — https://kubernetes.io/docs/concepts/services-networking/network-policies/


Part 03 — Config and Storage

ConfigMaps, Secrets, volumes, persistent storage, stateful data patterns.

Guide chapter Primary Secondary
01 — ConfigMaps L ch.9 (Configuration via ConfigMaps, Secrets, and the Downward API) KP ch.20 (Configuration Resource), ch.19 (EnvVar Configuration)
02 — Secrets L ch.9 R ch.7 (Secret Management); KP ch.25 (Secure Configuration)
03 — Volumes L ch.7 (Attaching storage volumes to Pods) KP ch.20 (Configuration Resource)
04 — Persistent storage L ch.8 (Persisting data in PersistentVolumes) R ch.4 (Container Storage)
05 — Stateful data patterns KP ch.12 (Stateful Service) R ch.4 (Container Storage); R ch.16 (Platform Abstractions)

Official docs: - ConfigMaps — https://kubernetes.io/docs/concepts/configuration/configmap/ · Secrets — https://kubernetes.io/docs/concepts/configuration/secret/ - Storage (volumes, PV/PVC, StorageClass, CSI) — https://kubernetes.io/docs/concepts/storage/ - Volume snapshots — https://kubernetes.io/docs/concepts/storage/volume-snapshots/


Part 04 — Scheduling

The scheduler & nodes, affinity/taints/topology, priority & preemption.

Guide chapter Primary Secondary
01 — The scheduler and nodes KP ch.6 (Automated Placement) L (scheduling material); official docs (kube-scheduler)
02 — Affinity, taints, topology KP ch.6 (Automated Placement) official docs (assign Pods to nodes, topology spread)
03 — Priority and preemption KP ch.6 (Automated Placement) official docs (Pod priority & preemption)

Official docs: - Scheduling, preemption & eviction — https://kubernetes.io/docs/concepts/scheduling-eviction/ - Assigning Pods to nodes (affinity, taints/tolerations, topology spread) — https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ - Pod priority & preemption — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/


Part 05 — Security

AuthN/AuthZ/RBAC, pod security, supply chain, secrets & cluster hardening.

Guide chapter Primary Secondary
01 — Authn, authz, RBAC KP ch.26 (Access Control) R ch.10 (Identity); L securing-the-API-server material
02 — Pod security KP ch.23 (Process Containment) R ch.8 (Admission Control); official docs (Pod Security Standards)
03 — Supply chain R ch.15 (Software Supply Chain) KP ch.30 (Image Builder); official docs (Kyverno/Cosign)
04 — Secrets and cluster hardening R ch.7 (Secret Management) + R ch.8 (Admission Control) KP ch.25 (Secure Configuration); CIS Kubernetes Benchmark

Official docs: - Authentication & authorization (RBAC) — https://kubernetes.io/docs/reference/access-authn-authz/rbac/ - Pod Security Standards & Admission — https://kubernetes.io/docs/concepts/security/pod-security-standards/ - Cloud Native security & supply chain — https://kubernetes.io/docs/concepts/security/ · Sigstore/Cosign https://docs.sigstore.dev/ · Kyverno https://kyverno.io/docs/


Part 06 — Production Readiness

Metrics, logging, tracing, autoscaling, reliability & disruptions, capacity & cost.

Guide chapter Primary Secondary
01 — Observability: metrics R ch.9 (Observability) official docs (Prometheus)
02 — Logging R ch.9 (Observability) official docs (logging architecture)
03 — Tracing R ch.9 (Observability) official docs (OpenTelemetry)
04 — Autoscaling KP ch.29 (Elastic Scale) R ch.13 (Autoscaling); official docs (HPA/KEDA)
05 — Reliability and disruptions R ch.14 (Application Considerations) KP ch.10 (Singleton Service); official docs (PDB)
06 — Capacity and cost R ch.13 (Autoscaling) + R ch.12 (Multitenancy) KP ch.2 (Predictable Demands); OpenCost docs

Official docs: - Metrics & the metrics-server — https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/ · Prometheus https://prometheus.io/docs/ · OpenTelemetry https://opentelemetry.io/docs/ - Logging architecture — https://kubernetes.io/docs/concepts/cluster-administration/logging/ - HPA — https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ · PodDisruptionBudget — https://kubernetes.io/docs/concepts/workloads/pods/disruptions/ · KEDA https://keda.sh/docs/


Part 07 — Delivery

Helm, Kustomize, CI/CD, GitOps with Argo CD, progressive delivery.

Guide chapter Primary Secondary
01 — Packaging with Helm R ch.11 (Building Platform Services) official docs (Helm)
02 — Packaging with Kustomize R ch.11 (Building Platform Services) official docs (Kustomize)
03 — CI/CD pipeline D (CI/CD pipeline shape) R ch.15 (Software Supply Chain)
04 — GitOps with Argo CD A ch.1–10 (the whole book — primary) R ch.11 (Building Platform Services)
05 — Progressive delivery R ch.14 (Application Considerations) A ch.5 (Synchronizing Applications); official docs (Argo Rollouts)

Argo CD Up & Running chapter anchors: ch.1 Introduction · ch.2 Installing · ch.3 Core Concepts (App/Project/sync) · ch.4 Managing Applications · ch.5 Synchronizing Applications · ch.6 Access Control/RBAC & Projects · ch.9 declarative install · ch.10 Applications at Scale (App-of-Apps, ApplicationSet) · ch.12 Integrating CI · ch.13 Operationalizing. The guide cites Argo CD at book + topic granularity, which is unambiguous.

Official docs: - Helm — https://helm.sh/docs/ (charts, hooks, best practices) - Kustomize — https://kubectl.docs.kubernetes.io/references/kustomize/ · https://kustomize.io - Argo CD — https://argo-cd.readthedocs.io/ · Argo Rollouts — https://argo-rollouts.readthedocs.io/


Part 08 — Day-2 Operations

Cluster lifecycle, backup & DR, troubleshooting, multi-tenancy, operators & CRDs.

Guide chapter Primary Secondary
01 — Cluster lifecycle R ch.2 (Deployment Models) official docs (version skew, upgrades)
02 — Backup and DR R ch.4 (Container Storage) + R ch.2 (Deployment Models) official docs (etcd backup, Velero)
03 — Troubleshooting playbook R ch.9 (Observability) — co-primary, the operations/method perspective the chapter actually uses (observe→isolate→hypothesize→test→fix; alert→runbook) — with L (debugging material across ch.5/6/11) for Pod-status/Events/probes mechanics official docs (debug Pods/Services)
04 — Multi-tenancy and namespaces R ch.12 (Multitenancy) KP ch.26 (Access Control)
05 — Operators and CRDs KP ch.27 (Controller) + ch.28 (Operator) R ch.11 (Building Platform Services); official docs (CRD/operator)

Official docs: - Cluster administration & upgrades, version skew — https://kubernetes.io/docs/setup/release/version-skew-policy/ - Backing up etcd — https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/ · Velero https://velero.io/docs/ - Debug running Pods (ephemeral containers / kubectl debug) — https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/ - Extend Kubernetes / CRDs & operators — https://kubernetes.io/docs/concepts/extend-kubernetes/


Part 09 — Capstone

The whole system, end to end.

Guide chapter Primary Secondary
01 — Bookstore end-to-end R ch.1 (A Path to Production) A (whole book); a recap of all parts

Official docs: - Production environment checklist — https://kubernetes.io/docs/setup/production-environment/ - Configuration & cluster-administration best practices — https://kubernetes.io/docs/concepts/configuration/overview/


Part 10 — Cloud & Managed Kubernetes

The shared-responsibility model, IaC for managed clusters, cloud identity for workloads, cloud CNI / load balancing / storage, and node autoscaling / cost / multi-cloud.

Guide chapter Primary Secondary
01 — The managed Kubernetes model R ch.2 (Deployment Models) KP ch.1; provider docs (below)
02 — Provisioning and IaC R ch.2 (Deployment Models) provider CLI docs; Terraform docs
03 — Cloud identity for workloads R ch.10 (Identity) KP ch.26 (Access Control); provider pod-identity docs
04 — Cloud networking and load balancing R ch.5 (Pod Networking) + R ch.6 (Service Routing) provider CNI docs; Cilium docs
05 — Cloud storage and data R ch.4 (Container Storage) provider CSI docs
06 — Node autoscaling, cost & multi-cloud R ch.13 (Autoscaling) + R ch.2 (Deployment Models) KP ch.29 (Elastic Scale); Karpenter docs; OpenCost docs

Official docs: - EKS — https://docs.aws.amazon.com/eks/ · GKE — https://cloud.google.com/kubernetes-engine/docs · AKS — https://learn.microsoft.com/azure/aks/ - IRSA — https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html · EKS Pod Identity — https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html · GKE Workload Identity — https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity · Azure AD Workload Identity — https://azure.github.io/azure-workload-identity/docs/ - AWS VPC CNI — https://github.com/aws/amazon-vpc-cni-k8s · AWS Load Balancer Controller — https://kubernetes-sigs.github.io/aws-load-balancer-controller/ · GKE Dataplane V2 — https://cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2 · Azure CNI Overlay — https://learn.microsoft.com/azure/aks/azure-cni-overlay - EBS CSI — https://github.com/kubernetes-sigs/aws-ebs-csi-driver · GCE PD CSI — https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver · Azure Disk CSI — https://github.com/kubernetes-sigs/azuredisk-csi-driver - Karpenter — https://karpenter.sh/docs/ · Cluster Autoscaler — https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/ · OpenCost — https://www.opencost.io/docs/ - Terraform — https://developer.hashicorp.com/terraform/docs · eksctlhttps://eksctl.io/ · Crossplane — https://docs.crossplane.io/

Standout articles: - Karpenter consolidation deep-dive — https://aws.amazon.com/blogs/containers/optimizing-your-kubernetes-compute-costs-with-karpenter-consolidation/ - "Kubernetes Networking on AWS" (the canonical VPC-CNI / IP-density walkthrough) — https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/ - Pod-identity for the three clouds, compared — https://kubernetes.io/blog/2022/12/22/pod-security-admission-stable/ (PSA context) + the provider docs above.


Part 11 — Advanced Production Patterns

Admission webhooks, operator development (build, not consume), APF, service mesh, secrets at scale, multi-cluster fleets, chaos engineering, HA control plane / etcd ops, performance & scalability, and platform engineering.

Guide chapter Primary Secondary
01 — Admission webhooks R ch.8 (Admission Control) KP ch.25 (Secure Configuration); official docs (webhooks, VAP)
02 — Operator development KP ch.27 (Controller) + KP ch.28 (Operator) R ch.11 (Building Platform Services); Kubebuilder Book
03 — API Priority and Fairness official docs (APF) R ch.1 (Path to Production) (control-plane context)
04 — Service mesh R ch.6 (Service Routing) + R ch.10 (Identity) KP ch.13 (Service Discovery); Istio / Linkerd / SPIFFE docs
05 — Secrets at scale R ch.7 (Secret Management) KP ch.25 (Secure Configuration); ESO + Vault docs
06 — Multi-cluster and fleet R ch.2 (Deployment Models) + R ch.12 (Multitenancy) A ch.10 (Applications at Scale: App-of-Apps, ApplicationSet)
07 — Chaos engineering R ch.14 (Application Considerations) KP ch.10 (Singleton Service); Chaos Mesh docs
08 — HA control plane and etcd R ch.2 (Deployment Models) official docs (etcd, kubeadm HA)
09 — Performance and scalability R ch.9 (Observability) KP ch.6 (Automated Placement); Cilium / kube-proxy docs
10 — Platform engineering R ch.11 (Building Platform Services) KP ch.28 (Operator); Crossplane / Backstage docs

Official docs: - Admission control — https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/ · Dynamic admission webhooks — https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/ · ValidatingAdmissionPolicy — https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/ - Kubebuilder Book — https://book.kubebuilder.io/ · controller-runtime — https://pkg.go.dev/sigs.k8s.io/controller-runtime · Operator SDK — https://sdk.operatorframework.io/docs/ - API Priority and Fairness — https://kubernetes.io/docs/concepts/cluster-administration/flow-control/ - Istio — https://istio.io/latest/docs/ · Istio Ambient — https://istio.io/latest/docs/ambient/ · Linkerd — https://linkerd.io/2/overview/ · SPIFFE/SPIRE — https://spiffe.io/docs/ - External Secrets Operator — https://external-secrets.io/latest/ · Vault on Kubernetes — https://developer.hashicorp.com/vault/docs/platform/k8s · CSI Secrets Store driver — https://secrets-store-csi-driver.sigs.k8s.io/ - Argo CD ApplicationSet — https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/ · Karmada — https://karmada.io/docs/ · Cluster API — https://cluster-api.sigs.k8s.io/ - Chaos Mesh — https://chaos-mesh.org/docs/ · Litmus — https://docs.litmuschaos.io/ · Principles of Chaos Engineering — https://principlesofchaos.org/ - etcd operations — https://etcd.io/docs/latest/op-guide/ · etcd backup/restore — https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/ - Cilium — https://docs.cilium.io/ · kube-proxy IPVS — https://kubernetes.io/docs/reference/networking/virtual-ips/ · API server scalability — https://kubernetes.io/docs/setup/best-practices/cluster-large/ - Crossplane — https://docs.crossplane.io/ · Backstage — https://backstage.io/docs/overview/what-is-backstage · Team Topologies — https://teamtopologies.com/

Standout articles: - "The Operator Pattern" original CoreOS post — https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ (the official write-up) - Istio ambient announcement / architecture overview — https://istio.io/latest/blog/2022/introducing-ambient-mesh/ - Manuel Pais on Internal Developer Platforms / Team Topologies — https://teamtopologies.com/key-concepts-content/platform-as-a-product - "Principles of Chaos Engineering" — https://principlesofchaos.org/ - Kelsey Hightower, "Kubernetes The Hard Way" (for control-plane internals you can map onto HA) — https://github.com/kelseyhightower/kubernetes-the-hard-way


Part 12 — Kubernetes for Machine Learning

ML workload taxonomy, GPUs and accelerators, batch / gang scheduling, distributed training, notebooks, model serving (KServe), pipelines (Argo Workflows / Kubeflow Pipelines), ML platform / cost / MLOps capstone.

Guide chapter Primary Secondary
01 — Why ML on Kubernetes KP ch.7 (Batch Job) + ch.29 (Elastic Scale) R ch.11 (Building Platform Services); official docs (workloads)
02 — GPUs and accelerators official docs (device plugins, scheduling GPUs) NVIDIA GPU Operator docs; KP ch.6 (Automated Placement)
03 — Batch and gang scheduling KP ch.7 (Batch Job) + KP ch.6 (Automated Placement) Kueue docs; JobSet docs; Volcano docs
04 — Distributed training KP ch.7 (Batch Job) Kubeflow Training Operator docs; KubeRay / Ray Train docs
05 — Notebooks and interactive ML R ch.4 (Container Storage) (PVC-backed dev envs) JupyterHub z2jh docs; Kubeflow Notebooks
06 — Model serving and inference R ch.14 (Application Considerations) + KP ch.29 (Elastic Scale) KServe docs; Seldon Core docs; Triton docs
07 — ML pipelines and workflows A (workflow patterns from Argo) Argo Workflows / Argo Events docs; Kubeflow Pipelines docs
08 — ML platform, cost, and MLOps capstone R ch.11 (Building Platform Services) + R ch.12 (Multitenancy) KP ch.28 (Operator); MLflow docs; OpenCost docs

Official docs: - Scheduling GPUs / device plugins — https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/ · https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ · https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/ - NVIDIA GPU Operator — https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/ · Node Feature Discovery — https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/ · DCGM Exporter — https://github.com/NVIDIA/dcgm-exporter - Kueue — https://kueue.sigs.k8s.io/docs/ · JobSet — https://jobset.sigs.k8s.io/docs/ · Volcano — https://volcano.sh/en/docs/ - Kubeflow — https://www.kubeflow.org/docs/ · Kubeflow Training Operator — https://www.kubeflow.org/docs/components/training/ · Katib — https://www.kubeflow.org/docs/components/katib/ · Kubeflow Notebooks — https://www.kubeflow.org/docs/components/notebooks/ · Kubeflow Pipelines — https://www.kubeflow.org/docs/components/pipelines/ - Ray — https://docs.ray.io/ · KubeRay — https://docs.ray.io/en/latest/cluster/kubernetes/index.html - JupyterHub Zero-to-JupyterHub — https://z2jh.jupyter.org/en/stable/ · KubeSpawner — https://jupyterhub-kubespawner.readthedocs.io/ - KServe — https://kserve.github.io/website/ · Seldon Core — https://docs.seldon.io/projects/seldon-core/en/latest/ · NVIDIA Triton — https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/ - Argo Workflows — https://argo-workflows.readthedocs.io/ · Argo Events — https://argoproj.github.io/argo-events/ · Tekton Pipelines — https://tekton.dev/docs/pipelines/ - MLflow — https://mlflow.org/docs/latest/ · OpenCost — https://www.opencost.io/docs/ · Alibi-Detect — https://docs.seldon.io/projects/alibi-detect/en/stable/ · Evidently — https://docs.evidentlyai.com/

Standout articles: - "MLOps: Continuous delivery and automation pipelines in machine learning" (Google Cloud — the L0/L1/L2 maturity model) — https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning - KServe canary rollouts — https://kserve.github.io/website/latest/modelserving/v1beta1/rollout/canary/ - Kubeflow Pipelines artifact + metadata model — https://www.kubeflow.org/docs/components/pipelines/concepts/metadata/ - Argo Workflows artifacts (input / output, S3 / GCS / PVC) — https://argo-workflows.readthedocs.io/en/latest/walk-through/artifacts/ - NVIDIA, "Best Practices for GPU-accelerated Kubernetes" — https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html


Part 13 — Grand Capstone: Bookstore Platform v2

The production e-commerce platform: tenancy + multi-region, Keycloak + IRSA + mesh JWT, CDC-driven search, Kafka outbox payments, edge WAF + rate limiting, the closed ML loop, three-pillar OTel observability, OpenCost FinOps, Backstage as the developer portal, and the day-2 runbook / on-call / DR / chaos discipline.

Guide chapter Primary Secondary
01 — Bookstore 2.0: from toy to platform R ch.1 (A Path to Production) + R ch.11 (Building Platform Services) KP ch.28 (Operator); Google SRE Book ch.1 (Introduction)
02 — Tenancy and Crossplane onboarding R ch.12 (Multitenancy) + R ch.11 (Building Platform Services) KP ch.28 (Operator); Crossplane docs
03 — Multi-region active-active R ch.2 (Deployment Models) A ch.10 (Applications at Scale); CloudNativePG docs; Google SRE Book ch.21 (Handling Overload)
04 — Real auth: Keycloak OIDC + IRSA + Istio JWT R ch.10 (Identity) KP ch.26 (Access Control); Keycloak docs; Istio security docs
05 — Search and product discovery KP ch.13 (Service Discovery) + KP ch.12 (Stateful Service) Debezium docs; Strimzi docs; Meilisearch docs
06 — Payments and event sourcing KP ch.12 (Stateful Service) + D (microservices event flow) Strimzi docs; Debezium outbox-pattern article (below); Stripe webhooks docs
07 — Edge: Istio Gateway + Coraza WAF + rate limiting R ch.6 (Service Routing) + R ch.8 (Admission Control) (policy posture) Istio Gateway API docs; Coraza docs; OWASP CRS docs
08 — Real ML loop R ch.14 (Application Considerations) (canary) + KP ch.29 (Elastic Scale) KServe canary docs; MLflow Registry docs; Alibi-Detect docs
09 — Observability: OTel + Tempo + Loki + Prometheus + Grafana R ch.9 (Observability) Google SRE Book ch.6 (Monitoring Distributed Systems); OpenTelemetry / Tempo / Loki docs
10 — Cost: OpenCost per-tenant FinOps R ch.12 (Multitenancy) + R ch.13 (Autoscaling) KP ch.2 (Predictable Demands); OpenCost docs; FinOps Foundation framework
11 — Backstage developer portal R ch.11 (Building Platform Services) KP ch.28 (Operator); Backstage docs; Team Topologies
12 — Day-2: runbook + on-call + DR + chaos Google SRE Book ch.11 (Being On-Call) + ch.14 (Managing Incidents) + ch.15 (Postmortem Culture) R ch.14 (Application Considerations); Chaos Mesh docs

Official docs: - Keycloak — https://www.keycloak.org/documentation · Keycloak realms / clients — https://www.keycloak.org/docs/latest/server_admin/ · Keycloak Operator (Kubernetes) — https://www.keycloak.org/operator/installation - Crossplane — https://docs.crossplane.io/ · Compositions / XRDs — https://docs.crossplane.io/latest/concepts/compositions/ · Crossplane v2 (composition functions) — https://docs.crossplane.io/latest/concepts/composition-functions/ - CloudNativePG — https://cloudnative-pg.io/docs/ · ReplicaCluster (cross-region) — https://cloudnative-pg.io/documentation/current/replica_cluster/ - Strimzi — https://strimzi.io/docs/operators/latest/overview · KafkaConnect / KafkaConnectorhttps://strimzi.io/docs/operators/latest/configuring#assembly-deployment-configuration-kafka-connect-str - Debezium — https://debezium.io/documentation/reference/stable/ · Postgres connector — https://debezium.io/documentation/reference/stable/connectors/postgresql.html - Meilisearch — https://www.meilisearch.com/docs · Meilisearch on Kubernetes — https://github.com/meilisearch/meilisearch-kubernetes - Istio Gateway API — https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/ · RequestAuthentication / AuthorizationPolicyhttps://istio.io/latest/docs/tasks/security/authentication/authn-policy/ · https://istio.io/latest/docs/tasks/security/authorization/ - Coraza WAF — https://coraza.io/docs/ · Istio + Coraza Wasm plugin — https://github.com/corazawaf/coraza-proxy-wasm - OWASP Core Rule Set — https://coreruleset.org/docs/ · OWASP ModSecurity CRS GitHub — https://github.com/coreruleset/coreruleset - MLflow — https://mlflow.org/docs/latest/ · Model Registry — https://mlflow.org/docs/latest/model-registry.html - KServe — https://kserve.github.io/website/ · KServe canary rollouts — https://kserve.github.io/website/latest/modelserving/v1beta1/rollout/canary/ - Alibi-Detect — https://docs.seldon.io/projects/alibi-detect/en/stable/ · Evidently — https://docs.evidentlyai.com/ - OpenTelemetry — https://opentelemetry.io/docs/ · OTel Collector — https://opentelemetry.io/docs/collector/ · OTLP — https://opentelemetry.io/docs/specs/otlp/ - Grafana Tempo — https://grafana.com/docs/tempo/latest/ · Grafana Loki — https://grafana.com/docs/loki/latest/ · Grafana variables — https://grafana.com/docs/grafana/latest/dashboards/variables/ - Prometheus Alertmanager (inhibition / routing) — https://prometheus.io/docs/alerting/latest/alertmanager/ - OpenCost — https://www.opencost.io/docs/ · OpenCost on Kubernetes — https://www.opencost.io/docs/installation/install - FinOps Foundation framework — https://www.finops.org/framework/ · FinOps Foundation maturity model — https://www.finops.org/framework/maturity-model/ - Backstage — https://backstage.io/docs/overview/what-is-backstage · Software Catalog — https://backstage.io/docs/features/software-catalog/ · Scaffolder — https://backstage.io/docs/features/software-templates/ · TechDocs — https://backstage.io/docs/features/techdocs/techdocs-overview - Chaos Mesh — https://chaos-mesh.org/docs/ · Litmus — https://docs.litmuschaos.io/ - Stripe sandbox + webhooks — https://stripe.com/docs/webhooks · webhook signature verification — https://stripe.com/docs/webhooks/signatures

Standout articles: - Google SRE Book — ch.11 "Being On-Call", ch.14 "Managing Incidents", ch.15 "Postmortem Culture: Learning from Failure" — https://sre.google/sre-book/being-on-call/ · https://sre.google/sre-book/managing-incidents/ · https://sre.google/sre-book/postmortem-culture/ - FinOps Foundation — the FinOps framework + maturity model — https://www.finops.org/framework/ - Spotify's Backstage adoption story — https://backstage.spotify.com/blog/ (and the canonical "Spotify Engineering Culture" backstage posts at https://engineering.atspotify.com/category/backstage/) - Debezium outbox-pattern canonical post — https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/ - Google Cloud Architecture Center — multi-region active-active patterns — https://cloud.google.com/architecture/disaster-recovery · https://cloud.google.com/architecture/multi-regional-active-active-design - Istio ambient + Coraza WAF — https://istio.io/latest/blog/ (browse for the WAF + ambient announcements)


Part 14 — EKS in Production: A-Z

Terraform state hygiene, EKS version lifecycle, add-on discipline, storage, log cost, cost guardrails, infrastructure CI/CD + drift, VPC endpoints, Graviton, GitOps bootstrap, multi-region cloud reality, supply chain in CI, runtime defense, Velero, Cilium/eBPF, developer experience, and the cross- region DR + AWS account baseline + 90-day production-readiness runbook capstone.

Guide chapter Primary Secondary
01 — Production-grade Terraform state R ch.1 (A Path to Production); HashiCorp Terraform S3 backend docs (below) KP ch.28 (Operator); the canonical "Terraform 1.10 use_lockfile" blog (below)
02 — EKS cluster lifecycle R ch.15 (Cluster Operations); AWS EKS Kubernetes Release Calendar (below) L ch.3; KP ch.3 (Declarative Deployment)
03 — EKS add-on management discipline R ch.5 (Pod Networking) + R ch.4 (Container Storage); AWS EKS Add-ons docs (below) L ch.11; Karpenter-shaped material in R ch.13 (Autoscaling)
04 — Storage classes & EBS in production R ch.4 (Container Storage); AWS EBS gp2 → gp3 migration blog (below) L ch.8; KP ch.12 (Stateful Service)
05 — Logging & metrics cost discipline R ch.9 (Observability); AWS CloudWatch pricing (below) KP ch.2 (Predictable Demands)
06 — Cost guardrails R ch.13 (Autoscaling); FinOps Foundation framework + AWS Budgets docs (below) KP ch.2 (Predictable Demands); OpenCost docs
07 — Infrastructure CI/CD + drift detection R ch.1 (A Path to Production); GitHub Actions OIDC docs + Atlantis docs (below) D (CI shape); driftctl docs
08 — VPC endpoints & egress economics R ch.5 (Pod Networking); AWS VPC endpoints docs (below) official AWS PrivateLink docs
09 — ARM/Graviton on EKS R ch.13 (Autoscaling); AWS Graviton docs + Docker buildx multi-platform docs (below) KP ch.2 (Predictable Demands)
10 — GitOps bootstrap on a fresh EKS cluster A ch.10 (Applications at Scale); Argo CD App-of-Apps blog (below) A ch.7 (Sync, Diff, Hooks, Waves)
11 — Multi-region active-active: cloud reality R ch.2 (Deployment Models); AWS Route 53 LBR + Global Accelerator + CloudNativePG ReplicaCluster docs (below) Google SRE Book ch.21 (Handling Overload); Google Cloud multi-region active-active blog
12 — Supply chain security in production R ch.8 (Admission Control); Sigstore + cosign + syft + SLSA framework docs (below) KP ch.25 (Secure Configuration); AWS ECR enhanced scanning docs; Kyverno verifyImages docs
13 — Runtime defense & container security R ch.8 (Admission Control); Falco + Tetragon + AWS GuardDuty for EKS docs (below) KP ch.26 (Access Control); Google SRE Book ch.20 (Load Balancing) (alert routing analogues)
14 — Backup and restore with Velero R ch.15 (Cluster Operations); Velero docs (below) L ch.8 (Persistent Volumes); CloudNativePG backup docs
15 — Cilium / eBPF on EKS R ch.5 (Pod Networking); Cilium + Hubble docs (below) KP ch.24 (Network Segmentation); L ch.11
16 — Developer experience for Kubernetes teams R ch.11 (Building Platform Services); Telepresence + Mirrord + Skaffold + Tilt + Devcontainer docs (below) KP ch.30 (Image Builder); Spotify Backstage adoption case study (below)
17 — Cross-region DR + AWS account baseline + 90-day production-readiness runbook Google SRE Book ch.32 "The Evolving SRE Engagement Model"; AWS Config conformance pack + IAM Access Analyzer docs (below) R ch.1 (A Path to Production); R ch.15 (Cluster Operations)

Official docs: - HashiCorp Terraform S3 backend (state + use_lockfile) — https://developer.hashicorp.com/terraform/language/settings/backends/s3 - AWS EKS Kubernetes Release Calendar — https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html · EKS standard + extended support — https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions-extended.html - AWS EKS Add-ons — https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html · resolve_conflicts_on_update / _on_create reference — https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateAddon.html - AWS EBS gp2 → gp3 migration — https://aws.amazon.com/blogs/storage/migrate-your-amazon-ebs-volumes-from-gp2-to-gp3-and-save-up-to-20-on-costs/ · gp3 docs — https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html#gp3-ebs-volume-type - AWS CloudWatch pricing — https://aws.amazon.com/cloudwatch/pricing/ · CloudWatch Logs retention — https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SettingLogRetention.html - AWS Budgets — https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html · Budgets actions — https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-controls.html - FinOps Foundation framework — https://www.finops.org/framework/ · FinOps maturity model — https://www.finops.org/framework/maturity-model/ - infracost — https://www.infracost.io/docs/ · infracost GitHub Action — https://github.com/infracost/actions - GitHub Actions OIDC for AWS — https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services · aws-actions/configure-aws-credentialshttps://github.com/aws-actions/configure-aws-credentials - Atlantis (Terraform CI) — https://www.runatlantis.io/docs/ · Atlantis on GitHub — https://github.com/runatlantis/atlantis - driftctl — https://docs.driftctl.com/ · driftctl on GitHub — https://github.com/snyk/driftctl - AWS VPC endpoints — https://docs.aws.amazon.com/vpc/latest/privatelink/concepts.html · Gateway vs Interface endpoints — https://docs.aws.amazon.com/vpc/latest/privatelink/vpce-gateway.html · Interface endpoint pricing — https://aws.amazon.com/privatelink/pricing/ - AWS Graviton (arm64 EC2) — https://aws.amazon.com/ec2/graviton/ · Graviton Ready software — https://aws.amazon.com/ec2/graviton/getting-started/ - Docker buildx multi-platform — https://docs.docker.com/build/building/multi-platform/ · docker buildx --platform reference — https://docs.docker.com/reference/cli/docker/buildx/build/ - Argo CD App-of-Apps pattern — https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/ · App-of-Apps canonical blog — https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/ - AWS Route 53 latency-based routing — https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-latency · AWS Global Accelerator — https://docs.aws.amazon.com/global-accelerator/latest/dg/what-is-global-accelerator.html - CloudNativePG cross-region ReplicaClusterhttps://cloudnative-pg.io/documentation/current/replica_cluster/ - Sigstore — https://docs.sigstore.dev/ · cosign — https://docs.sigstore.dev/cosign/overview/ · cosign keyless — https://docs.sigstore.dev/cosign/signing/overview/ - syft (SBOM) — https://github.com/anchore/syft · grype (CVE scanner) — https://github.com/anchore/grype - SLSA framework — https://slsa.dev/spec/v1.0/ · SLSA build levels — https://slsa.dev/spec/v1.0/levels - AWS ECR scanning (basic + enhanced) — https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html · enhanced scanning with Inspector — https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning-enhanced.html - Kyverno verifyImageshttps://kyverno.io/docs/writing-policies/verify-images/ · Kyverno policy library — https://kyverno.io/policies/ - Falco — https://falco.org/docs/ · Falco modern eBPF driver — https://falco.org/docs/setup/driver/modern-ebpf/ - Tetragon — https://tetragon.io/docs/overview/ · Tetragon TracingPolicy CRD — https://tetragon.io/docs/concepts/tracing-policy/ - AWS GuardDuty for EKS (Audit + Runtime) — https://docs.aws.amazon.com/guardduty/latest/ug/kubernetes-protection.html · Runtime Monitoring — https://docs.aws.amazon.com/guardduty/latest/ug/runtime-monitoring.html - Velero — https://velero.io/docs/main/ · Velero on AWS (BSL + VSL + Kopia) — https://velero.io/docs/main/csi/ · Velero schedule — https://velero.io/docs/main/api-types/schedule/ - Cilium — https://docs.cilium.io/en/stable/ · Cilium on EKS — https://docs.cilium.io/en/stable/installation/k8s-install-aws-eks/ · Hubble observability — https://docs.cilium.io/en/stable/observability/hubble/ - Telepresence — https://www.telepresence.io/docs/latest/quick-start/ · personal intercepts — https://www.telepresence.io/docs/latest/concepts/intercepts/ - Mirrord — https://mirrord.dev/docs/overview/introduction/ · Mirrord mirror + steal modes — https://mirrord.dev/docs/reference/configuration/#feature-network-incoming - Skaffold — https://skaffold.dev/docs/ · Skaffold sync mode — https://skaffold.dev/docs/filesync/ - Tilt — https://docs.tilt.dev/ · Tiltfile reference — https://docs.tilt.dev/api.html - Devcontainer spec — https://containers.dev/ · devcontainer.json reference — https://containers.dev/implementors/json_reference/ - AWS Config conformance packs — https://docs.aws.amazon.com/config/latest/developerguide/conformance-packs.html - IAM Access Analyzer — https://docs.aws.amazon.com/IAM/latest/UserGuide/what-is-access-analyzer.html

Standout articles: - Google SRE Book — ch.32 "The Evolving SRE Engagement Model" — https://sre.google/sre-book/evolving-sre-engagement-model/ - "Terraform 1.10: native S3 state locking with use_lockfile" — HashiCorp release notes — https://github.com/hashicorp/terraform/releases/tag/v1.10.0 · S3 backend doc note — https://developer.hashicorp.com/terraform/language/settings/backends/s3#s3-bucket-permissions - Spotify's Backstage adoption story (developer-experience case study) — https://backstage.spotify.com/blog/ · https://engineering.atspotify.com/category/backstage/ - "From kubectl to Atlantis: GitOps for Terraform" — Atlantis blog — https://www.runatlantis.io/blog/ - "Cilium on EKS in production" — Isovalent blog — https://isovalent.com/blog/


Part 15 — Day-to-Day Production Operations

The application-side production loop: PR-to-production lifecycle, application CI/CD, image signing + provenance, multi-environment promotion, production Vault + ESO secrets, progressive delivery, rollback layer matrix, feature flags + dark launches, hotfix + breakglass, incident response + on-call, day-to-day ops cadence, and the first-90-days capstone for a team taking over production.

Guide chapter Primary Secondary
01 — The PR-to-production lifecycle D (microservices delivery shape); R ch.1 (A Path to Production) A ch.10 (Applications at Scale); Google SRE Book ch.7 (The Evolution of Automation)
02 — Application CI/CD pipelines D (CI/CD shape); GitHub Actions OIDC docs (below) R ch.1 (A Path to Production); Atlantis docs
03 — Image signing and provenance in CI R ch.8 (Admission Control); Sigstore + cosign docs + SLSA framework (below) KP ch.25 (Secure Configuration); Kyverno verifyImages docs
04 — Multi-environment promotion A ch.7 (Sync, Diff, Hooks, Waves) + A ch.8 (Apps + ApplicationSets); Argo CD ApplicationSet docs (below) R ch.1 (A Path to Production)
05 — Production secrets: Vault + ESO + rotation R ch.7 (Secret Management); HashiCorp Vault + External Secrets Operator docs (below) KP ch.25 (Secure Configuration); KP ch.20 (Configuration Resource)
06 — Progressive delivery in production A ch.10 (Applications at Scale); Argo Rollouts AnalysisTemplate + canary docs (below) R ch.14 (Application Considerations); Google SRE Book ch.27 (Reliable Product Launches at Scale)
07 — Rollback playbook R ch.15 (Cluster Operations); Velero restore + Postgres PITR (CNPG) + AWS S3 versioning docs (below) KP ch.3 (Declarative Deployment); A ch.7 (Sync, Diff, Hooks, Waves)
08 — Feature flags and dark launches R ch.14 (Application Considerations); OpenFeature + Flagsmith / LaunchDarkly / Unleash docs (below) D (microservices feature toggle shape); Charity Majors "test in production" canon (below)
09 — Hotfix workflow and breakglass Google SRE Book ch.14 (Managing Incidents); AWS CloudTrail + IAM breakglass docs (below) R ch.15 (Cluster Operations); R ch.7 (Secret Management) (post-incident rotation)
10 — Incident response & on-call Google SRE Book ch.11 (Being On-Call) + ch.14 (Managing Incidents) + ch.15 (Postmortem Culture); PagerDuty + Incident.io / FireHydrant / Rootly docs (below) Camille Fournier, The Manager's Path (on-call as a team practice) (below)
11 — Day-to-day production operations Google SRE Book ch.16 (Tracking Outages) + ch.17 (Testing for Reliability); AWS Well-Architected Operational Excellence pillar (below) KP ch.2 (Predictable Demands); R ch.13 (Autoscaling)
12 — Capstone: the first 90 days running production Google SRE Book ch.1 (Introduction) + ch.32 (The Evolving SRE Engagement Model); Camille Fournier, The Manager's Path (below) R ch.1 (A Path to Production)

Official docs: - HashiCorp Vault — https://developer.hashicorp.com/vault/docs · Vault Kubernetes auth method — https://developer.hashicorp.com/vault/docs/auth/kubernetes · Vault dynamic database secrets — https://developer.hashicorp.com/vault/docs/secrets/databases - External Secrets Operator — https://external-secrets.io/latest/ · ClusterSecretStore / SecretStorehttps://external-secrets.io/latest/api/clustersecretstore/ · ExternalSecret CRD — https://external-secrets.io/latest/api/externalsecret/ - Argo CD ApplicationSethttps://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/ · Cluster generator — https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/Generators-Cluster/ - Argo Rollouts — https://argoproj.github.io/argo-rollouts/ · AnalysisTemplatehttps://argoproj.github.io/argo-rollouts/features/analysis/ · canary strategy — https://argoproj.github.io/argo-rollouts/features/canary/ · blue-green strategy — https://argoproj.github.io/argo-rollouts/features/bluegreen/ - OpenFeature — https://openfeature.dev/docs/reference/intro · OpenFeature spec — https://openfeature.dev/specification/ - Flagsmith — https://docs.flagsmith.com/ · LaunchDarkly — https://docs.launchdarkly.com/ · Unleash — https://docs.getunleash.io/ - PagerDuty (Alertmanager integration) — https://support.pagerduty.com/main/docs/services-integrations · PagerDuty Events API — https://developer.pagerduty.com/docs/events-api-v2/overview/ - Incident.io — https://incident.io/docs · FireHydrant — https://docs.firehydrant.com/ · Rootly — https://rootly.com/docs - Atlassian Statuspage — https://support.atlassian.com/statuspage/ · Statuspage incident communication — https://www.atlassian.com/incident-management/handbook/incident-communication - Velero restore — https://velero.io/docs/main/restore-reference/ · selective restore — https://velero.io/docs/main/resource-filtering/ - Postgres point-in-time recovery (CloudNativePG) — https://cloudnative-pg.io/documentation/current/recovery/ · backup config — https://cloudnative-pg.io/documentation/current/backup/ - AWS S3 versioning — https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html · S3 Batch Operations restore — https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops.html - AWS CloudTrail (audit-log immutability) — https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-log-file-integrity-validation.html · CloudTrail log file integrity validation — https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-log-file-integrity-validation-enabling.html - AWS Well-Architected Framework — Operational Excellence pillar — https://docs.aws.amazon.com/wellarchitected/latest/operational-excellence-pillar/welcome.html - Prometheus Alertmanager inhibition + routing — https://prometheus.io/docs/alerting/latest/alertmanager/#inhibition

Standout articles: - Google SRE Book — ch.11 "Being On-Call", ch.14 "Managing Incidents", ch.15 "Postmortem Culture: Learning from Failure" — https://sre.google/sre-book/being-on-call/ · https://sre.google/sre-book/managing-incidents/ · https://sre.google/sre-book/postmortem-culture/ - Charity Majors — "Test in production" canon — https://charity.wtf/2017/07/03/test-in-production-yo/ · "I test in prod" — https://increment.com/testing/i-test-in-production/ - Camille Fournier, The Manager's Path (O'Reilly) — the canonical reference on on-call as a team practice, tech-lead transitions, and incident postmortem culture — https://www.oreilly.com/library/view/the-managers-path/9781491973882/ - "Build, deploy, run: a Charity Majors framework for production excellence" — https://charity.wtf/category/production/ - "The 24-hour postmortem" — Incident.io — https://incident.io/blog/post-mortems-and-learning


How to go deeper (a path through the library)

  1. Get the model first. Read Lukša 2e ch.1–9 alongside Parts 00–03 of this guide — it is the deepest "how it works" within MEAP scope and tracks the guide's foundations/workloads/config arc almost chapter-for-chapter. Poulton is the gentler on-ramp if Part 00 feels fast.
  2. Add the "why" lens. Read Kubernetes Patterns 2e by pattern as each one appears (Health Probe → Part 01 ch.02; Automated Placement → Part 04; Process Containment / Access Control → Part 05; Controller/Operator → Part 08 ch.05). It explains the design rationale the guide applies.
  3. Go to production. Read Production Kubernetes for Parts 05–08 — security hardening, observability, delivery, day-2, multitenancy, supply chain. It is the guide's primary source for the production arc.
  4. Delivery & GitOps. Read Argo CD Up & Running end-to-end for Part 07 ch.04 (it is effectively the sole reference there) and Bootstrapping Microservices for the CI/CD pipeline shape (Part 07 ch.03).
  5. Then breadth — the CNCF landscape. The cloud-native ecosystem is far wider than this guide (service mesh, eBPF, policy, cost, FinOps, platform engineering). Use the CNCF Cloud Native Landscape (https://landscape.cncf.io/) and the CNCF site (https://www.cncf.io/) to see where each tool the guide uses (Prometheus, OpenTelemetry, Argo, Helm, Kustomize, Kyverno, KEDA, Cilium, CloudNativePG, …) sits, and what graduated/incubating alternatives exist.
  6. Certify (optional). Appendix E — Learning paths maps this guide's chapters to the CKAD / CKA / CKS exam domains and gives ordered study tracks.

See also: Appendix E — Learning paths for ordered study routes through the guide, and each chapter's own "Further reading" section for the precise per-chapter citation and official URL.