Appendix D — Further reading

A topic → resource map: for each Part of this guide, the most relevant chapter(s) from the reference library plus 1–3 curated official-docs links. The book→chapter mapping mirrors the guide's internal citation map exactly; nothing here is invented. This guide is standalone — these are for going deeper on what a chapter already taught, not prerequisites.

This appendix is a reference, so it does not follow the nine-section anatomy of the guide's chapters. Each chapter of the guide also ends with its own specific citation and official URL; this appendix is the consolidated, by-Part view.

The library

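These six books form the reference library and carry the short codes used in every table below. Parts 13 to 15 additionally cite the Google SRE Book by name.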

Short	Book
L	Lukša, Kubernetes in Action, 2nd Edition (Manning)
P	Poulton, The Kubernetes Book
KP	Ibryam & Huß, Kubernetes Patterns, 2nd Edition (O'Reilly)
R	Rosso, Lander, Brand & Harris, Production Kubernetes (O'Reilly)
A	Argo CD Up & Running (O'Reilly)
D	Davis, Bootstrapping Microservices (Manning)

A note on Lukša 2e: the MEAP "brief contents" ends at ch.17. Where the guide needs material beyond that (securing the API server / RBAC, securing Pods, GitOps, extending Kubernetes), the citation is at topic granularity ("Lukša 2e, securing-the-API-server material") and a sibling book (Production Kubernetes / Kubernetes Patterns) is the primary — exactly as recorded below.
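The short codes in the library table can be expanded mechanically when reading the per-Part tables. The following is a minimal sketch, not part of the guide: the names `LIBRARY`, `CITATION`, and `expand_citation` are hypothetical, and the pattern assumes citations of the form "code, space, ch.N" as they appear in the tables.

```python
import re

# Short codes and titles copied from the library table above.
LIBRARY = {
    "L": "Lukša, Kubernetes in Action, 2nd Edition (Manning)",
    "P": "Poulton, The Kubernetes Book",
    "KP": "Ibryam & Huß, Kubernetes Patterns, 2nd Edition (O'Reilly)",
    "R": "Rosso, Lander, Brand & Harris, Production Kubernetes (O'Reilly)",
    "A": "Argo CD Up & Running (O'Reilly)",
    "D": "Davis, Bootstrapping Microservices (Manning)",
}

# Matches a short code followed by "ch.N", e.g. "KP ch.6" or "R ch.14".
# The word boundary keeps the "P" inside "KP" from matching on its own.
CITATION = re.compile(r"\b(KP|L|P|R|A|D) ch\.(\d+)")


def expand_citation(cell: str) -> list[tuple[str, int]]:
    """Return (book title, chapter number) pairs found in one table cell."""
    return [(LIBRARY[code], int(chapter)) for code, chapter in CITATION.findall(cell)]


if __name__ == "__main__":
    for title, chapter in expand_citation("R ch.13 (Autoscaling) + R ch.12 (Multitenancy)"):
        print(f"{title}, chapter {chapter}")
```

The sketch only picks up the first chapter number after each code, so compound forms such as "KP ch.15/16/17/18" or "L ch.2, ch.3" and topic-granularity citations such as "L (scheduling material)" still need to be read by hand.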

Part 00 — Foundations

Why Kubernetes, containers/images, architecture, control plane, node components, the declarative API model, local cluster setup.

Guide chapter	Primary	Secondary
01 — Why Kubernetes	P ch.1; L ch.1	KP ch.1 (Introduction)
02 — Containers and images	P ch.3; L ch.2 (Understanding containers)	KP ch.30 (Image Builder)
03 — Architecture overview	P ch.2; L ch.3 (Deploying your first application)	R ch.1 (A Path to Production)
04 — Control plane deep dive	L ch.3 + securing-the-API-server material; R ch.1	official docs (below)
05 — Node components	L ch.2, ch.3	R ch.3 (Container Runtime)
06 — The declarative API model	L ch.4 (Introducing Kubernetes API objects)	KP ch.3 (Declarative Deployment)
07 — Local cluster setup	P ch.3; L ch.3	official docs (below)

Official docs:
- Kubernetes components & architecture — https://kubernetes.io/docs/concepts/overview/components/
- Working with objects (spec/status, labels, selectors) — https://kubernetes.io/docs/concepts/overview/working-with-objects/
- Install tools: kind https://kind.sigs.k8s.io/docs/user/quick-start/, k3d https://k3d.io/, kubectl https://kubernetes.io/docs/tasks/tools/

Part 01 — Core Workloads

Pods, health/lifecycle, resources/QoS, ReplicaSets/Deployments, StatefulSets, DaemonSets, Jobs/CronJobs, deployment strategies.

Guide chapter	Primary	Secondary
01 — Pods	L ch.5 (Running workloads in Pods)	KP ch.15/16/17/18 (Init/Sidecar/Adapter/Ambassador)
02 — Health and lifecycle	L ch.6 (Managing the Pod lifecycle)	KP ch.4 (Health Probe), ch.5 (Managed Lifecycle)
03 — Resources and QoS	L (resource-management material); official docs	KP ch.2 (Predictable Demands)
04 — ReplicaSets and Deployments	L ch.13 (ReplicaSets), ch.14 (Deployments)	KP ch.3 (Declarative Deployment)
05 — StatefulSets	L ch.15 (Deploying stateful workloads with StatefulSets)	KP ch.12 (Stateful Service)
06 — DaemonSets	L ch.16 (Deploying node agents and daemons with DaemonSets)	KP ch.9 (Daemon Service)
07 — Jobs and CronJobs	L ch.17 (Running finite workloads with Jobs and CronJobs)	KP ch.7 (Batch Job), ch.8 (Periodic Job)
08 — Deployment strategies	KP ch.3 (Declarative Deployment)	R ch.14 (Application Considerations); L ch.14

Official docs:
- Workloads overview — https://kubernetes.io/docs/concepts/workloads/
- Configure liveness/readiness/startup probes — https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
- Resource management for Pods/Containers — https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Part 02 — Networking

The networking model, Services, DNS/discovery, Ingress, Gateway API, network policies.

Guide chapter	Primary	Secondary
01 — The networking model	R ch.5 (Pod Networking)	L ch.11; official docs (CNI)
02 — Services	L ch.11 (Exposing Pods with Services)	R ch.6 (Service Routing); KP ch.13 (Service Discovery)
03 — DNS and service discovery	L ch.11	KP ch.13 (Service Discovery); official docs (CoreDNS)
04 — Ingress	L ch.12 (Exposing Services with Ingress)	R ch.6 (Service Routing)
05 — Gateway API	official docs (Gateway API — primary)	R ch.6 (Service Routing)
06 — Network policies	KP ch.24 (Network Segmentation)	R ch.5 (Pod Networking); L ch.11

Official docs:
- Services, Ingress & networking — https://kubernetes.io/docs/concepts/services-networking/
- Gateway API — https://gateway-api.sigs.k8s.io/
- Network Policies — https://kubernetes.io/docs/concepts/services-networking/network-policies/

Part 03 — Config and Storage

ConfigMaps, Secrets, volumes, persistent storage, stateful data patterns.

Guide chapter	Primary	Secondary
01 — ConfigMaps	L ch.9 (Configuration via ConfigMaps, Secrets, and the Downward API)	KP ch.20 (Configuration Resource), ch.19 (EnvVar Configuration)
02 — Secrets	L ch.9	R ch.7 (Secret Management); KP ch.25 (Secure Configuration)
03 — Volumes	L ch.7 (Attaching storage volumes to Pods)	KP ch.20 (Configuration Resource)
04 — Persistent storage	L ch.8 (Persisting data in PersistentVolumes)	R ch.4 (Container Storage)
05 — Stateful data patterns	KP ch.12 (Stateful Service)	R ch.4 (Container Storage); R ch.16 (Platform Abstractions)

Official docs:
- ConfigMaps — https://kubernetes.io/docs/concepts/configuration/configmap/ · Secrets — https://kubernetes.io/docs/concepts/configuration/secret/
- Storage (volumes, PV/PVC, StorageClass, CSI) — https://kubernetes.io/docs/concepts/storage/
- Volume snapshots — https://kubernetes.io/docs/concepts/storage/volume-snapshots/

Part 04 — Scheduling

The scheduler & nodes, affinity/taints/topology, priority & preemption.

Guide chapter	Primary	Secondary
01 — The scheduler and nodes	KP ch.6 (Automated Placement)	L (scheduling material); official docs (kube-scheduler)
02 — Affinity, taints, topology	KP ch.6 (Automated Placement)	official docs (assign Pods to nodes, topology spread)
03 — Priority and preemption	KP ch.6 (Automated Placement)	official docs (Pod priority & preemption)

Official docs:
- Scheduling, preemption & eviction — https://kubernetes.io/docs/concepts/scheduling-eviction/
- Assigning Pods to nodes (affinity, taints/tolerations, topology spread) — https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
- Pod priority & preemption — https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/

Part 05 — Security

AuthN/AuthZ/RBAC, pod security, supply chain, secrets & cluster hardening.

Guide chapter	Primary	Secondary
01 — Authn, authz, RBAC	KP ch.26 (Access Control)	R ch.10 (Identity); L securing-the-API-server material
02 — Pod security	KP ch.23 (Process Containment)	R ch.8 (Admission Control); official docs (Pod Security Standards)
03 — Supply chain	R ch.15 (Software Supply Chain)	KP ch.30 (Image Builder); official docs (Kyverno/Cosign)
04 — Secrets and cluster hardening	R ch.7 (Secret Management) + R ch.8 (Admission Control)	KP ch.25 (Secure Configuration); CIS Kubernetes Benchmark

Official docs:
- Authentication & authorization (RBAC) — https://kubernetes.io/docs/reference/access-authn-authz/rbac/
- Pod Security Standards & Admission — https://kubernetes.io/docs/concepts/security/pod-security-standards/
- Cloud Native security & supply chain — https://kubernetes.io/docs/concepts/security/ · Sigstore/Cosign https://docs.sigstore.dev/ · Kyverno https://kyverno.io/docs/

Part 06 — Production Readiness

Metrics, logging, tracing, autoscaling, reliability & disruptions, capacity & cost.

Guide chapter	Primary	Secondary
01 — Observability: metrics	R ch.9 (Observability)	official docs (Prometheus)
02 — Logging	R ch.9 (Observability)	official docs (logging architecture)
03 — Tracing	R ch.9 (Observability)	official docs (OpenTelemetry)
04 — Autoscaling	KP ch.29 (Elastic Scale)	R ch.13 (Autoscaling); official docs (HPA/KEDA)
05 — Reliability and disruptions	R ch.14 (Application Considerations)	KP ch.10 (Singleton Service); official docs (PDB)
06 — Capacity and cost	R ch.13 (Autoscaling) + R ch.12 (Multitenancy)	KP ch.2 (Predictable Demands); OpenCost docs

Official docs:
- Metrics & the metrics-server — https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/ · Prometheus https://prometheus.io/docs/ · OpenTelemetry https://opentelemetry.io/docs/
- Logging architecture — https://kubernetes.io/docs/concepts/cluster-administration/logging/
- HPA — https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ · PodDisruptionBudget — https://kubernetes.io/docs/concepts/workloads/pods/disruptions/ · KEDA https://keda.sh/docs/

Part 07 — Delivery

Helm, Kustomize, CI/CD, GitOps with Argo CD, progressive delivery.

Guide chapter	Primary	Secondary
01 — Packaging with Helm	R ch.11 (Building Platform Services)	official docs (Helm)
02 — Packaging with Kustomize	R ch.11 (Building Platform Services)	official docs (Kustomize)
03 — CI/CD pipeline	D (CI/CD pipeline shape)	R ch.15 (Software Supply Chain)
04 — GitOps with Argo CD	A ch.1–10 (the whole book — primary)	R ch.11 (Building Platform Services)
05 — Progressive delivery	R ch.14 (Application Considerations)	A ch.5 (Synchronizing Applications); official docs (Argo Rollouts)

Argo CD Up & Running chapter anchors: ch.1 Introduction · ch.2 Installing · ch.3 Core Concepts (App/Project/sync) · ch.4 Managing Applications · ch.5 Synchronizing Applications · ch.6 Access Control/RBAC & Projects · ch.9 declarative install · ch.10 Applications at Scale (App-of-Apps, ApplicationSet) · ch.12 Integrating CI · ch.13 Operationalizing. The guide cites Argo CD at book + topic granularity, which is unambiguous.

Official docs:
- Helm — https://helm.sh/docs/ (charts, hooks, best practices)
- Kustomize — https://kubectl.docs.kubernetes.io/references/kustomize/ · https://kustomize.io
- Argo CD — https://argo-cd.readthedocs.io/ · Argo Rollouts — https://argo-rollouts.readthedocs.io/

Part 08 — Day-2 Operations

Cluster lifecycle, backup & DR, troubleshooting, multi-tenancy, operators & CRDs.

Guide chapter	Primary	Secondary
01 — Cluster lifecycle	R ch.2 (Deployment Models)	official docs (version skew, upgrades)
02 — Backup and DR	R ch.4 (Container Storage) + R ch.2 (Deployment Models)	official docs (etcd backup, Velero)
03 — Troubleshooting playbook	R ch.9 (Observability), co-primary for the operations method the chapter uses (observe→isolate→hypothesize→test→fix; alert→runbook) + L (debugging material across ch.5/6/11), co-primary for Pod-status/Events/probes mechanics	official docs (debug Pods/Services)
04 — Multi-tenancy and namespaces	R ch.12 (Multitenancy)	KP ch.26 (Access Control)
05 — Operators and CRDs	KP ch.27 (Controller) + ch.28 (Operator)	R ch.11 (Building Platform Services); official docs (CRD/operator)

Official docs:
- Cluster administration & upgrades, version skew — https://kubernetes.io/docs/setup/release/version-skew-policy/
- Backing up etcd — https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/ · Velero https://velero.io/docs/
- Debug running Pods (ephemeral containers / kubectl debug) — https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/
- Extend Kubernetes / CRDs & operators — https://kubernetes.io/docs/concepts/extend-kubernetes/

Part 09 — Capstone

The whole system, end to end.

Guide chapter	Primary	Secondary
01 — Bookstore end-to-end	R ch.1 (A Path to Production)	A (whole book); a recap of all parts

Official docs:
- Production environment checklist — https://kubernetes.io/docs/setup/production-environment/
- Configuration & cluster-administration best practices — https://kubernetes.io/docs/concepts/configuration/overview/

Part 10 — Cloud & Managed Kubernetes

The shared-responsibility model, IaC for managed clusters, cloud identity for workloads, cloud CNI / load balancing / storage, and node autoscaling / cost / multi-cloud.

Guide chapter	Primary	Secondary
01 — The managed Kubernetes model	R ch.2 (Deployment Models)	KP ch.1; provider docs (below)
02 — Provisioning and IaC	R ch.2 (Deployment Models)	provider CLI docs; Terraform docs
03 — Cloud identity for workloads	R ch.10 (Identity)	KP ch.26 (Access Control); provider pod-identity docs
04 — Cloud networking and load balancing	R ch.5 (Pod Networking) + R ch.6 (Service Routing)	provider CNI docs; Cilium docs
05 — Cloud storage and data	R ch.4 (Container Storage)	provider CSI docs
06 — Node autoscaling, cost & multi-cloud	R ch.13 (Autoscaling) + R ch.2 (Deployment Models)	KP ch.29 (Elastic Scale); Karpenter docs; OpenCost docs

Official docs:
- EKS — https://docs.aws.amazon.com/eks/ · GKE — https://cloud.google.com/kubernetes-engine/docs · AKS — https://learn.microsoft.com/azure/aks/
- IRSA — https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html · EKS Pod Identity — https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html · GKE Workload Identity — https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity · Azure AD Workload Identity — https://azure.github.io/azure-workload-identity/docs/
- AWS VPC CNI — https://github.com/aws/amazon-vpc-cni-k8s · AWS Load Balancer Controller — https://kubernetes-sigs.github.io/aws-load-balancer-controller/ · GKE Dataplane V2 — https://cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2 · Azure CNI Overlay — https://learn.microsoft.com/azure/aks/azure-cni-overlay
- EBS CSI — https://github.com/kubernetes-sigs/aws-ebs-csi-driver · GCE PD CSI — https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver · Azure Disk CSI — https://github.com/kubernetes-sigs/azuredisk-csi-driver
- Karpenter — https://karpenter.sh/docs/ · Cluster Autoscaler — https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/ · OpenCost — https://www.opencost.io/docs/
- Terraform — https://developer.hashicorp.com/terraform/docs · eksctl — https://eksctl.io/ · Crossplane — https://docs.crossplane.io/

Standout articles:
- Karpenter consolidation deep-dive — https://aws.amazon.com/blogs/containers/optimizing-your-kubernetes-compute-costs-with-karpenter-consolidation/
- "Kubernetes Networking on AWS" (the canonical VPC-CNI / IP-density walkthrough) — https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/
- Pod identity for the three clouds, compared: use the provider docs above, with https://kubernetes.io/blog/2022/12/22/pod-security-admission-stable/ for Pod Security Admission (PSA) context.

Part 11 — Advanced Production Patterns

Admission webhooks, operator development (build, not consume), APF, service mesh, secrets at scale, multi-cluster fleets, chaos engineering, HA control plane / etcd ops, performance & scalability, and platform engineering.

Guide chapter	Primary	Secondary
01 — Admission webhooks	R ch.8 (Admission Control)	KP ch.25 (Secure Configuration); official docs (webhooks, VAP)
02 — Operator development	KP ch.27 (Controller) + KP ch.28 (Operator)	R ch.11 (Building Platform Services); Kubebuilder Book
03 — API Priority and Fairness	official docs (APF)	R ch.1 (Path to Production) (control-plane context)
04 — Service mesh	R ch.6 (Service Routing) + R ch.10 (Identity)	KP ch.13 (Service Discovery); Istio / Linkerd / SPIFFE docs
05 — Secrets at scale	R ch.7 (Secret Management)	KP ch.25 (Secure Configuration); ESO + Vault docs
06 — Multi-cluster and fleet	R ch.2 (Deployment Models) + R ch.12 (Multitenancy)	A ch.10 (Applications at Scale: App-of-Apps, ApplicationSet)
07 — Chaos engineering	R ch.14 (Application Considerations)	KP ch.10 (Singleton Service); Chaos Mesh docs
08 — HA control plane and etcd	R ch.2 (Deployment Models)	official docs (etcd, kubeadm HA)
09 — Performance and scalability	R ch.9 (Observability)	KP ch.6 (Automated Placement); Cilium / kube-proxy docs
10 — Platform engineering	R ch.11 (Building Platform Services)	KP ch.28 (Operator); Crossplane / Backstage docs

Official docs:
- Admission control — https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/ · Dynamic admission webhooks — https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/ · ValidatingAdmissionPolicy — https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/
- Kubebuilder Book — https://book.kubebuilder.io/ · controller-runtime — https://pkg.go.dev/sigs.k8s.io/controller-runtime · Operator SDK — https://sdk.operatorframework.io/docs/
- API Priority and Fairness — https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
- Istio — https://istio.io/latest/docs/ · Istio Ambient — https://istio.io/latest/docs/ambient/ · Linkerd — https://linkerd.io/2/overview/ · SPIFFE/SPIRE — https://spiffe.io/docs/
- External Secrets Operator — https://external-secrets.io/latest/ · Vault on Kubernetes — https://developer.hashicorp.com/vault/docs/platform/k8s · CSI Secrets Store driver — https://secrets-store-csi-driver.sigs.k8s.io/
- Argo CD ApplicationSet — https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/ · Karmada — https://karmada.io/docs/ · Cluster API — https://cluster-api.sigs.k8s.io/
- Chaos Mesh — https://chaos-mesh.org/docs/ · Litmus — https://docs.litmuschaos.io/ · Principles of Chaos Engineering — https://principlesofchaos.org/
- etcd operations — https://etcd.io/docs/latest/op-guide/ · etcd backup/restore — https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/
- Cilium — https://docs.cilium.io/ · kube-proxy IPVS — https://kubernetes.io/docs/reference/networking/virtual-ips/ · API server scalability — https://kubernetes.io/docs/setup/best-practices/cluster-large/
- Crossplane — https://docs.crossplane.io/ · Backstage — https://backstage.io/docs/overview/what-is-backstage · Team Topologies — https://teamtopologies.com/

Standout articles:
- "The Operator Pattern" (originally a CoreOS post; the link is the official Kubernetes write-up) — https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
- Istio ambient announcement / architecture overview — https://istio.io/latest/blog/2022/introducing-ambient-mesh/
- Manuel Pais on Internal Developer Platforms / Team Topologies — https://teamtopologies.com/key-concepts-content/platform-as-a-product
- "Principles of Chaos Engineering" — https://principlesofchaos.org/
- Kelsey Hightower, "Kubernetes The Hard Way" (for control-plane internals you can map onto HA) — https://github.com/kelseyhightower/kubernetes-the-hard-way

Part 12 — Kubernetes for Machine Learning

ML workload taxonomy, GPUs and accelerators, batch / gang scheduling, distributed training, notebooks, model serving (KServe), pipelines (Argo Workflows / Kubeflow Pipelines), ML platform / cost / MLOps capstone.

Guide chapter	Primary	Secondary
01 — Why ML on Kubernetes	KP ch.7 (Batch Job) + ch.29 (Elastic Scale)	R ch.11 (Building Platform Services); official docs (workloads)
02 — GPUs and accelerators	official docs (device plugins, scheduling GPUs)	NVIDIA GPU Operator docs; KP ch.6 (Automated Placement)
03 — Batch and gang scheduling	KP ch.7 (Batch Job) + KP ch.6 (Automated Placement)	Kueue docs; JobSet docs; Volcano docs
04 — Distributed training	KP ch.7 (Batch Job)	Kubeflow Training Operator docs; KubeRay / Ray Train docs
05 — Notebooks and interactive ML	R ch.4 (Container Storage) (PVC-backed dev envs)	JupyterHub z2jh docs; Kubeflow Notebooks
06 — Model serving and inference	R ch.14 (Application Considerations) + KP ch.29 (Elastic Scale)	KServe docs; Seldon Core docs; Triton docs
07 — ML pipelines and workflows	A (workflow patterns from Argo)	Argo Workflows / Argo Events docs; Kubeflow Pipelines docs
08 — ML platform, cost, and MLOps capstone	R ch.11 (Building Platform Services) + R ch.12 (Multitenancy)	KP ch.28 (Operator); MLflow docs; OpenCost docs

Official docs:
- Scheduling GPUs / device plugins — https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/ · https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ · https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
- NVIDIA GPU Operator — https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/ · Node Feature Discovery — https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/ · DCGM Exporter — https://github.com/NVIDIA/dcgm-exporter
- Kueue — https://kueue.sigs.k8s.io/docs/ · JobSet — https://jobset.sigs.k8s.io/docs/ · Volcano — https://volcano.sh/en/docs/
- Kubeflow — https://www.kubeflow.org/docs/ · Kubeflow Training Operator — https://www.kubeflow.org/docs/components/training/ · Katib — https://www.kubeflow.org/docs/components/katib/ · Kubeflow Notebooks — https://www.kubeflow.org/docs/components/notebooks/ · Kubeflow Pipelines — https://www.kubeflow.org/docs/components/pipelines/
- Ray — https://docs.ray.io/ · KubeRay — https://docs.ray.io/en/latest/cluster/kubernetes/index.html
- JupyterHub Zero-to-JupyterHub — https://z2jh.jupyter.org/en/stable/ · KubeSpawner — https://jupyterhub-kubespawner.readthedocs.io/
- KServe — https://kserve.github.io/website/ · Seldon Core — https://docs.seldon.io/projects/seldon-core/en/latest/ · NVIDIA Triton — https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/
- Argo Workflows — https://argo-workflows.readthedocs.io/ · Argo Events — https://argoproj.github.io/argo-events/ · Tekton Pipelines — https://tekton.dev/docs/pipelines/
- MLflow — https://mlflow.org/docs/latest/ · OpenCost — https://www.opencost.io/docs/ · Alibi-Detect — https://docs.seldon.io/projects/alibi-detect/en/stable/ · Evidently — https://docs.evidentlyai.com/

Standout articles:
- "MLOps: Continuous delivery and automation pipelines in machine learning" (Google Cloud — the L0/L1/L2 maturity model) — https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
- KServe canary rollouts — https://kserve.github.io/website/latest/modelserving/v1beta1/rollout/canary/
- Kubeflow Pipelines artifact + metadata model — https://www.kubeflow.org/docs/components/pipelines/concepts/metadata/
- Argo Workflows artifacts (input / output, S3 / GCS / PVC) — https://argo-workflows.readthedocs.io/en/latest/walk-through/artifacts/
- NVIDIA, "Best Practices for GPU-accelerated Kubernetes" — https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html

Part 13 — Grand Capstone: Bookstore Platform v2

The production e-commerce platform: tenancy + multi-region, Keycloak + IRSA + mesh JWT, CDC-driven search, Kafka outbox payments, edge WAF + rate limiting, the closed ML loop, three-pillar OTel observability, OpenCost FinOps, Backstage as the developer portal, and the day-2 runbook / on-call / DR / chaos discipline.

Guide chapter	Primary	Secondary
01 — Bookstore 2.0: from toy to platform	R ch.1 (A Path to Production) + R ch.11 (Building Platform Services)	KP ch.28 (Operator); Google SRE Book ch.1 (Introduction)
02 — Tenancy and Crossplane onboarding	R ch.12 (Multitenancy) + R ch.11 (Building Platform Services)	KP ch.28 (Operator); Crossplane docs
03 — Multi-region active-active	R ch.2 (Deployment Models)	A ch.10 (Applications at Scale); CloudNativePG docs; Google SRE Book ch.21 (Handling Overload)
04 — Real auth: Keycloak OIDC + IRSA + Istio JWT	R ch.10 (Identity)	KP ch.26 (Access Control); Keycloak docs; Istio security docs
05 — Search and product discovery	KP ch.13 (Service Discovery) + KP ch.12 (Stateful Service)	Debezium docs; Strimzi docs; Meilisearch docs
06 — Payments and event sourcing	KP ch.12 (Stateful Service) + D (microservices event flow)	Strimzi docs; Debezium outbox-pattern article (below); Stripe webhooks docs
07 — Edge: Istio Gateway + Coraza WAF + rate limiting	R ch.6 (Service Routing) + R ch.8 (Admission Control) (policy posture)	Istio Gateway API docs; Coraza docs; OWASP CRS docs
08 — Real ML loop	R ch.14 (Application Considerations) (canary) + KP ch.29 (Elastic Scale)	KServe canary docs; MLflow Registry docs; Alibi-Detect docs
09 — Observability: OTel + Tempo + Loki + Prometheus + Grafana	R ch.9 (Observability)	Google SRE Book ch.6 (Monitoring Distributed Systems); OpenTelemetry / Tempo / Loki docs
10 — Cost: OpenCost per-tenant FinOps	R ch.12 (Multitenancy) + R ch.13 (Autoscaling)	KP ch.2 (Predictable Demands); OpenCost docs; FinOps Foundation framework
11 — Backstage developer portal	R ch.11 (Building Platform Services)	KP ch.28 (Operator); Backstage docs; Team Topologies
12 — Day-2: runbook + on-call + DR + chaos	Google SRE Book ch.11 (Being On-Call) + ch.14 (Managing Incidents) + ch.15 (Postmortem Culture)	R ch.14 (Application Considerations); Chaos Mesh docs

Official docs:
- Keycloak — https://www.keycloak.org/documentation · Keycloak realms / clients — https://www.keycloak.org/docs/latest/server_admin/ · Keycloak Operator (Kubernetes) — https://www.keycloak.org/operator/installation
- Crossplane — https://docs.crossplane.io/ · Compositions / XRDs — https://docs.crossplane.io/latest/concepts/compositions/ · Crossplane v2 (composition functions) — https://docs.crossplane.io/latest/concepts/composition-functions/
- CloudNativePG — https://cloudnative-pg.io/docs/ · ReplicaCluster (cross-region) — https://cloudnative-pg.io/documentation/current/replica_cluster/
- Strimzi — https://strimzi.io/docs/operators/latest/overview · KafkaConnect / KafkaConnector — https://strimzi.io/docs/operators/latest/configuring#assembly-deployment-configuration-kafka-connect-str
- Debezium — https://debezium.io/documentation/reference/stable/ · Postgres connector — https://debezium.io/documentation/reference/stable/connectors/postgresql.html
- Meilisearch — https://www.meilisearch.com/docs · Meilisearch on Kubernetes — https://github.com/meilisearch/meilisearch-kubernetes
- Istio Gateway API — https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/ · RequestAuthentication / AuthorizationPolicy — https://istio.io/latest/docs/tasks/security/authentication/authn-policy/ · https://istio.io/latest/docs/tasks/security/authorization/
- Coraza WAF — https://coraza.io/docs/ · Istio + Coraza Wasm plugin — https://github.com/corazawaf/coraza-proxy-wasm
- OWASP Core Rule Set — https://coreruleset.org/docs/ · OWASP ModSecurity CRS GitHub — https://github.com/coreruleset/coreruleset
- MLflow — https://mlflow.org/docs/latest/ · Model Registry — https://mlflow.org/docs/latest/model-registry.html
- KServe — https://kserve.github.io/website/ · KServe canary rollouts — https://kserve.github.io/website/latest/modelserving/v1beta1/rollout/canary/
- Alibi-Detect — https://docs.seldon.io/projects/alibi-detect/en/stable/ · Evidently — https://docs.evidentlyai.com/
- OpenTelemetry — https://opentelemetry.io/docs/ · OTel Collector — https://opentelemetry.io/docs/collector/ · OTLP — https://opentelemetry.io/docs/specs/otlp/
- Grafana Tempo — https://grafana.com/docs/tempo/latest/ · Grafana Loki — https://grafana.com/docs/loki/latest/ · Grafana variables — https://grafana.com/docs/grafana/latest/dashboards/variables/
- Prometheus Alertmanager (inhibition / routing) — https://prometheus.io/docs/alerting/latest/alertmanager/
- OpenCost — https://www.opencost.io/docs/ · OpenCost on Kubernetes — https://www.opencost.io/docs/installation/install
- FinOps Foundation framework — https://www.finops.org/framework/ · FinOps Foundation maturity model — https://www.finops.org/framework/maturity-model/
- Backstage — https://backstage.io/docs/overview/what-is-backstage · Software Catalog — https://backstage.io/docs/features/software-catalog/ · Scaffolder — https://backstage.io/docs/features/software-templates/ · TechDocs — https://backstage.io/docs/features/techdocs/techdocs-overview
- Chaos Mesh — https://chaos-mesh.org/docs/ · Litmus — https://docs.litmuschaos.io/
- Stripe sandbox + webhooks — https://stripe.com/docs/webhooks · webhook signature verification — https://stripe.com/docs/webhooks/signatures

Standout articles:
- Google SRE Book — ch.11 "Being On-Call", ch.14 "Managing Incidents", ch.15 "Postmortem Culture: Learning from Failure" — https://sre.google/sre-book/being-on-call/ · https://sre.google/sre-book/managing-incidents/ · https://sre.google/sre-book/postmortem-culture/
- FinOps Foundation — the FinOps framework + maturity model — https://www.finops.org/framework/
- Spotify's Backstage adoption story — https://backstage.spotify.com/blog/ (and the canonical "Spotify Engineering Culture" backstage posts at https://engineering.atspotify.com/category/backstage/)
- Debezium outbox-pattern canonical post — https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
- Google Cloud Architecture Center — multi-region active-active patterns — https://cloud.google.com/architecture/disaster-recovery · https://cloud.google.com/architecture/multi-regional-active-active-design
- Istio ambient + Coraza WAF — https://istio.io/latest/blog/ (browse for the WAF + ambient announcements)

Part 14 — EKS in Production: A-Z

Terraform state hygiene, EKS version lifecycle, add-on discipline, storage, log cost, cost guardrails, infrastructure CI/CD + drift, VPC endpoints, Graviton, GitOps bootstrap, multi-region cloud reality, supply chain in CI, runtime defense, Velero, Cilium/eBPF, developer experience, and the cross-region DR + AWS account baseline + 90-day production-readiness runbook capstone.

Guide chapter	Primary	Secondary
01 — Production-grade Terraform state	R ch.1 (A Path to Production); HashiCorp Terraform S3 backend docs (below)	KP ch.28 (Operator); the canonical "Terraform 1.10 `use_lockfile`" blog (below)
02 — EKS cluster lifecycle	R ch.15 (Cluster Operations); AWS EKS Kubernetes Release Calendar (below)	L ch.3; KP ch.3 (Declarative Deployment)
03 — EKS add-on management discipline	R ch.5 (Pod Networking) + R ch.4 (Container Storage); AWS EKS Add-ons docs (below)	L ch.11; Karpenter-shaped material in R ch.13 (Autoscaling)
04 — Storage classes & EBS in production	R ch.4 (Container Storage); AWS EBS gp2 → gp3 migration blog (below)	L ch.8; KP ch.12 (Stateful Service)
05 — Logging & metrics cost discipline	R ch.9 (Observability); AWS CloudWatch pricing (below)	KP ch.2 (Predictable Demands)
06 — Cost guardrails	R ch.13 (Autoscaling); FinOps Foundation framework + AWS Budgets docs (below)	KP ch.2 (Predictable Demands); OpenCost docs
07 — Infrastructure CI/CD + drift detection	R ch.1 (A Path to Production); GitHub Actions OIDC docs + Atlantis docs (below)	D (CI shape); driftctl docs
08 — VPC endpoints & egress economics	R ch.5 (Pod Networking); AWS VPC endpoints docs (below)	official AWS PrivateLink docs
09 — ARM/Graviton on EKS	R ch.13 (Autoscaling); AWS Graviton docs + Docker buildx multi-platform docs (below)	KP ch.2 (Predictable Demands)
10 — GitOps bootstrap on a fresh EKS cluster	A ch.10 (Applications at Scale); Argo CD App-of-Apps blog (below)	A ch.7 (Sync, Diff, Hooks, Waves)
11 — Multi-region active-active: cloud reality	R ch.2 (Deployment Models); AWS Route 53 LBR + Global Accelerator + CloudNativePG `ReplicaCluster` docs (below)	Google SRE Book ch.21 (Handling Overload); Google Cloud multi-region active-active blog
12 — Supply chain security in production	R ch.8 (Admission Control); Sigstore + cosign + syft + SLSA framework docs (below)	KP ch.25 (Secure Configuration); AWS ECR enhanced scanning docs; Kyverno verifyImages docs
13 — Runtime defense & container security	R ch.8 (Admission Control); Falco + Tetragon + AWS GuardDuty for EKS docs (below)	KP ch.26 (Access Control); Google SRE Book ch.20 (Load Balancing) (alert routing analogues)
14 — Backup and restore with Velero	R ch.15 (Cluster Operations); Velero docs (below)	L ch.8 (Persistent Volumes); CloudNativePG backup docs
15 — Cilium / eBPF on EKS	R ch.5 (Pod Networking); Cilium + Hubble docs (below)	KP ch.24 (Network Segmentation); L ch.11
16 — Developer experience for Kubernetes teams	R ch.11 (Building Platform Services); Telepresence + Mirrord + Skaffold + Tilt + Devcontainer docs (below)	KP ch.30 (Image Builder); Spotify Backstage adoption case study (below)
17 — Cross-region DR + AWS account baseline + 90-day production-readiness runbook	Google SRE Book ch.32 (The Evolving SRE Engagement Model); AWS Config conformance pack + IAM Access Analyzer docs (below)	R ch.1 (A Path to Production); R ch.15 (Cluster Operations)

Standout articles:
- Google SRE Book — ch.32 "The Evolving SRE Engagement Model" — https://sre.google/sre-book/evolving-sre-engagement-model/
- "Terraform 1.10: native S3 state locking with use_lockfile" — HashiCorp release notes — https://github.com/hashicorp/terraform/releases/tag/v1.10.0 · S3 backend doc note — https://developer.hashicorp.com/terraform/language/settings/backends/s3#s3-bucket-permissions
- Spotify's Backstage adoption story (developer-experience case study) — https://backstage.spotify.com/blog/ · https://engineering.atspotify.com/category/backstage/
- "From kubectl to Atlantis: GitOps for Terraform" — Atlantis blog — https://www.runatlantis.io/blog/
- "Cilium on EKS in production" — Isovalent blog — https://isovalent.com/blog/

Part 15 — Day-to-Day Production Operations¶

The application-side production loop: PR-to-production lifecycle, application CI/CD, image signing + provenance, multi-environment promotion, production Vault + ESO secrets, progressive delivery, rollback layer matrix, feature flags + dark launches, hotfix + breakglass, incident response + on-call, day-to-day ops cadence, and the first-90-days capstone for a team taking over production.

Guide chapter	Primary	Secondary
01 — The PR-to-production lifecycle	D (microservices delivery shape); R ch.1 (A Path to Production)	A ch.10 (Applications at Scale); Google SRE Book ch.7 (The Evolution of Automation)
02 — Application CI/CD pipelines	D (CI/CD shape); GitHub Actions OIDC docs (below)	R ch.1 (A Path to Production); Atlantis docs
03 — Image signing and provenance in CI	R ch.8 (Admission Control); Sigstore + cosign docs + SLSA framework (below)	KP ch.25 (Secure Configuration); Kyverno verifyImages docs
04 — Multi-environment promotion	A ch.7 (Sync, Diff, Hooks, Waves) + A ch.8 (Apps + ApplicationSets); Argo CD `ApplicationSet` docs (below)	R ch.1 (A Path to Production)
05 — Production secrets: Vault + ESO + rotation	R ch.7 (Secret Management); HashiCorp Vault + External Secrets Operator docs (below)	KP ch.25 (Secure Configuration); KP ch.20 (Configuration Resource)
06 — Progressive delivery in production	A ch.10 (Applications at Scale); Argo Rollouts `AnalysisTemplate` + canary docs (below)	R ch.14 (Application Considerations); Google SRE Book ch.27 (Reliable Product Launches at Scale)
07 — Rollback playbook	R ch.15 (Cluster Operations); Velero restore + Postgres PITR (CNPG) + AWS S3 versioning docs (below)	KP ch.3 (Declarative Deployment); A ch.7 (Sync, Diff, Hooks, Waves)
08 — Feature flags and dark launches	R ch.14 (Application Considerations); OpenFeature + Flagsmith / LaunchDarkly / Unleash docs (below)	D (microservices feature toggle shape); Charity Majors "test in production" canon (below)
09 — Hotfix workflow and breakglass	Google SRE Book ch.14 (Managing Incidents); AWS CloudTrail + IAM breakglass docs (below)	R ch.15 (Cluster Operations); R ch.7 (Secret Management) (post-incident rotation)
10 — Incident response & on-call	Google SRE Book ch.11 (Being On-Call) + ch.14 (Managing Incidents) + ch.15 (Postmortem Culture); PagerDuty + Incident.io / FireHydrant / Rootly docs (below)	Camille Fournier, The Manager's Path (on-call as a team practice) (below)
11 — Day-to-day production operations	Google SRE Book ch.16 (Tracking Outages) + ch.17 (Testing for Reliability); AWS Well-Architected Operational Excellence pillar (below)	KP ch.2 (Predictable Demands); R ch.13 (Autoscaling)
12 — Capstone: the first 90 days running production	Google SRE Book ch.1 (Introduction) + ch.32 (The Evolving SRE Engagement Model); Camille Fournier, The Manager's Path (below)	R ch.1 (A Path to Production)

Official docs:
- HashiCorp Vault — https://developer.hashicorp.com/vault/docs · Vault Kubernetes auth method — https://developer.hashicorp.com/vault/docs/auth/kubernetes · Vault dynamic database secrets — https://developer.hashicorp.com/vault/docs/secrets/databases
- External Secrets Operator — https://external-secrets.io/latest/ · ClusterSecretStore / SecretStore — https://external-secrets.io/latest/api/clustersecretstore/ · ExternalSecret CRD — https://external-secrets.io/latest/api/externalsecret/
- Argo CD ApplicationSet — https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/ · Cluster generator — https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/Generators-Cluster/
- Argo Rollouts — https://argoproj.github.io/argo-rollouts/ · AnalysisTemplate — https://argoproj.github.io/argo-rollouts/features/analysis/ · canary strategy — https://argoproj.github.io/argo-rollouts/features/canary/ · blue-green strategy — https://argoproj.github.io/argo-rollouts/features/bluegreen/
- OpenFeature — https://openfeature.dev/docs/reference/intro · OpenFeature spec — https://openfeature.dev/specification/
- Flagsmith — https://docs.flagsmith.com/ · LaunchDarkly — https://docs.launchdarkly.com/ · Unleash — https://docs.getunleash.io/
- PagerDuty (Alertmanager integration) — https://support.pagerduty.com/main/docs/services-integrations · PagerDuty Events API — https://developer.pagerduty.com/docs/events-api-v2/overview/
- Incident.io — https://incident.io/docs · FireHydrant — https://docs.firehydrant.com/ · Rootly — https://rootly.com/docs
- Atlassian Statuspage — https://support.atlassian.com/statuspage/ · Statuspage incident communication — https://www.atlassian.com/incident-management/handbook/incident-communication
- Velero restore — https://velero.io/docs/main/restore-reference/ · selective restore — https://velero.io/docs/main/resource-filtering/
- Postgres point-in-time recovery (CloudNativePG) — https://cloudnative-pg.io/documentation/current/recovery/ · backup config — https://cloudnative-pg.io/documentation/current/backup/
- AWS S3 versioning — https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html · S3 Batch Operations restore — https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops.html
- AWS CloudTrail (audit-log immutability) — https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-log-file-integrity-validation.html · CloudTrail log file integrity validation — https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-log-file-integrity-validation-enabling.html
- AWS Well-Architected Framework — Operational Excellence pillar — https://docs.aws.amazon.com/wellarchitected/latest/operational-excellence-pillar/welcome.html
- Prometheus Alertmanager inhibition + routing — https://prometheus.io/docs/alerting/latest/alertmanager/#inhibition

Standout articles:
- Google SRE Book — ch.11 "Being On-Call", ch.14 "Managing Incidents", ch.15 "Postmortem Culture: Learning from Failure" — https://sre.google/sre-book/being-on-call/ · https://sre.google/sre-book/managing-incidents/ · https://sre.google/sre-book/postmortem-culture/
- Charity Majors — "Test in production" canon — https://charity.wtf/2017/07/03/test-in-production-yo/ · "I test in prod" — https://increment.com/testing/i-test-in-production/
- Camille Fournier, The Manager's Path (O'Reilly) — the canonical reference on on-call as a team practice, tech-lead transitions, and incident postmortem culture — https://www.oreilly.com/library/view/the-managers-path/9781491973882/
- "Build, deploy, run: a Charity Majors framework for production excellence" — https://charity.wtf/category/production/
- "The 24-hour postmortem" — Incident.io — https://incident.io/blog/post-mortems-and-learning

How to go deeper (a path through the library)¶

Get the model first. Read Lukša 2e ch.1–9 alongside Parts 00–03 of this guide. Lukša gives the deepest "how it works" treatment within the MEAP scope and tracks the guide's foundations/workloads/config arc almost chapter-for-chapter. Poulton is the gentler on-ramp if Part 00 feels fast.
Add the "why" lens. Read Kubernetes Patterns 2e by pattern as each one appears (Health Probe → Part 01 ch.02; Automated Placement → Part 04; Process Containment / Access Control → Part 05; Controller/Operator → Part 08 ch.05). It explains the design rationale the guide applies.
Go to production. Read Production Kubernetes for Parts 05–08 — security hardening, observability, delivery, day-2, multitenancy, supply chain. It is the guide's primary source for the production arc.
Delivery & GitOps. Read Argo CD Up & Running end-to-end for Part 07 ch.04 (it is effectively the sole reference there) and Bootstrapping Microservices for the CI/CD pipeline shape (Part 07 ch.03).
Then breadth — the CNCF landscape. The cloud-native ecosystem is far wider than this guide (service mesh, eBPF, policy, cost, FinOps, platform engineering). Use the CNCF Cloud Native Landscape (https://landscape.cncf.io/) and the CNCF site (https://www.cncf.io/) to see where each tool the guide uses (Prometheus, OpenTelemetry, Argo, Helm, Kustomize, Kyverno, KEDA, Cilium, CloudNativePG, …) sits, and what graduated/incubating alternatives exist.
Certify (optional). Appendix E — Learning paths maps this guide's chapters to the CKAD / CKA / CKS exam domains and gives ordered study tracks.

See also: Appendix E — Learning paths for ordered study routes through the guide, and each chapter's own "Further reading" section for the precise per-chapter citation and official URL.