06 — The declarative API model
The principle the entire system runs on: objects with `spec` (desired) and `status` (observed), reconciled forever; how `apply` merges intent; and labels/selectors as the glue, all applied by writing the first Bookstore manifest.
Estimated time: ~15 min read · ~30 min hands-on
Prerequisites: Part 00 ch.03 — the components that reconcile · Part 00 ch.04 — how reconciliation actually runs
You'll know after this:
- distinguish spec (desired) from status (observed)
- explain what kubectl apply does and how server-side apply merges fields
- use labels and selectors as the universal glue between objects
- write a minimal valid Kubernetes manifest by hand
- predict what a controller will do when you change a spec
Why this exists
You now know the components (ch.03) and their
internals (ch.04,
ch.05). This chapter is the conceptual keystone:
every later chapter is "here is another object, its spec, and the controller
that reconciles it". If the declarative model is solid, the other 40+ chapters
are variations on a theme. If it's fuzzy, you'll keep asking the wrong question
("what command makes it do X?") instead of the right one ("what desired state
do I declare, and what reconciles it?"). It's also the foundation of GitOps
(Part 07): if the cluster is "whatever's
in Git, continuously enforced", you must first understand why declaring state
even works.
Mental model
You don't operate Kubernetes; you describe what should be true and let it keep it true. Each thing in the cluster is an object with two halves:
- `spec`: desired state, written by you (or a controller). "What I want."
- `status`: observed state, written by the controller/kubelet that owns it. "What is actually true."
A controller's entire job is a loop: read spec, observe reality, compute the
difference, act to shrink it, write status, repeat — forever. You changed
a number in a file; the system continuously bends reality to match. That's it.
Pods, Deployments, Services, your own CRDs: all the same shape.
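To make the loop concrete, here is a minimal Python sketch of a single pass. It assumes a toy object that holds only a replica count and a plain list of names standing in for the running Pods. `ToyReplicaObject` and `reconcile_once` are hypothetical; a real controller reads and writes through the API server instead of touching local data.
```python
import itertools
from dataclasses import dataclass

_names = itertools.count()  # hypothetical Pod-name generator


@dataclass
class ToyReplicaObject:
    spec_replicas: int        # desired state: written by you
    status_replicas: int = 0  # observed state: written by the controller


def reconcile_once(obj, running):
    """One pass of the controller loop over a toy object.

    `running` is a plain list of Pod names standing in for reality.
    Returns the gap that was found (0 means there was nothing to do).
    """
    desired = obj.spec_replicas            # 1. read spec
    actual = len(running)                  # 2. observe reality
    gap = desired - actual                 # 3. diff: the gap
    if gap > 0:                            # 4. act to shrink it
        running.extend(f"pod-{next(_names)}" for _ in range(gap))
    elif gap < 0:
        del running[desired:]
    obj.status_replicas = len(running)     # 5. write status
    return gap


obj, pods = ToyReplicaObject(spec_replicas=3), []
reconcile_once(obj, pods)   # creates 3 Pods
obj.spec_replicas = 1       # "you changed a number in a file"
reconcile_once(obj, pods)   # deletes 2 Pods
```
Calling `reconcile_once` again when nothing changed simply returns a gap of 0: the pass is safe to repeat as often as you like, which is what lets a controller run it forever.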
The reconciliation principle (level-triggered)
```mermaid
flowchart LR
    spec["spec<br/>(desired state,<br/>you declare)"]
    obs["OBSERVE<br/>read current<br/>actual state"]
    diff["DIFF<br/>desired − actual<br/>= the gap"]
    act["ACT<br/>create / update /<br/>delete to converge"]
    status["status<br/>(observed state,<br/>controller writes)"]
    spec --> diff
    obs --> diff
    diff -->|gap ≠ 0| act
    act --> obs
    diff -->|gap = 0| status
    status -. periodic resync re-runs the loop anyway .-> obs
```
The defining property: Kubernetes controllers are level-triggered, not edge-triggered.
- Edge-triggered = react to an event ("a Pod was deleted") and then forget. Miss the event (controller was down, network blip) → state is wrong forever.
- Level-triggered = repeatedly look at the current level ("I want 3, I see 2") and correct the gap, regardless of how it arose. Missed events don't matter — the next observation re-derives the truth. Periodic resyncs re-run the loop even when nothing changed, as a correctness backstop.
This single choice is why Kubernetes self-heals: "heal" is not a special path
— a crashed Pod simply makes observed < desired, and the next loop iteration
closes the gap exactly as it would for any other cause. It's also why declaring
state is safe: you assert the destination, not the route, so the system
recovers correctly from any deviation, not just the ones someone anticipated.
("Imperative vs. declarative" from ch.01 is this,
mechanized.)
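The difference shows up clearly in a toy simulation. The Python sketch below uses hypothetical function names and a hard-coded desired count of three Pods. It kills two Pods and lets the controller miss some of the resulting delete events. The edge-triggered handler only replaces what it was told about; the level-triggered one looks at the current count and closes whatever gap it finds.
```python
DESIRED = 3  # spec: three Pods wanted


def edge_triggered(actual, delivered_events):
    """Handle each delivered event once; events that never arrived are never handled."""
    for event in delivered_events:
        if event == "pod-deleted":
            actual += 1  # replace exactly the Pod we were told about
    return actual


def level_triggered(actual):
    """Ignore history: compare the current level with DESIRED and close the gap."""
    while actual != DESIRED:
        actual += 1 if actual < DESIRED else -1  # one create/delete per step
    return actual


def crash_two_pods(missed):
    """Two Pods die; the controller was down for the first `missed` delete events.

    Returns (Pods after edge-triggered handling, Pods after level-triggered handling).
    """
    actual = DESIRED - 2
    delivered = ["pod-deleted", "pod-deleted"][missed:]
    return edge_triggered(actual, delivered), level_triggered(actual)


print(crash_two_pods(missed=0))  # (3, 3)
print(crash_two_pods(missed=1))  # (2, 3)
```
With every event delivered, both approaches agree. Drop a single event and the edge-triggered controller is stuck one Pod short indefinitely, while the level-triggered one still converges.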
Anatomy of an object: GVK, metadata, spec, status
Almost every Kubernetes object has the same top-level skeleton (a few kinds, such as ConfigMap and Secret, carry `data` instead of `spec`/`status`):
```
Object
├── apiVersion: <GROUP>/<VERSION>   ── e.g. apps/v1, v1 (core group = "")
├── kind: <Kind>                    ── e.g. Pod, Deployment, Service
│     (apiVersion + kind == GVK: Group / Version / Kind; identifies the type)
├── metadata
│   ├── name              ── unique within (namespace, kind)
│   ├── namespace         ── tenancy/scope boundary (namespaced kinds only)
│   ├── labels            ── identifying key/values → SELECTED by other objects
│   ├── annotations       ── non-identifying metadata (tools, docs, config)
│   ├── uid               ── server-assigned unique id (tells apart objects that reuse a name)
│   ├── resourceVersion   ── changes on every write → optimistic concurrency
│   └── ownerReferences   ── parent object(s) (drives cascading delete / GC)
├── spec                  ── DESIRED state: you write this
└── status                ── OBSERVED state: the controller/kubelet writes this
```
- GVK (Group/Version/Kind) = `apiVersion` + `kind`. It names the type: the API server uses it to pick the schema, handler, and storage for a request, and controllers use it to choose which objects to watch. Groups and versions organize the API surface (`apps/v1`, `networking.k8s.io/v1`); the empty group is "core" (`apiVersion: v1` → Pod, Service, ConfigMap). The instance is then identified by `(group, kind, namespace, name)`.
- `metadata` carries identity and the cross-object glue. `labels` are how objects find each other (next section); `annotations` are free-form, non-selecting data; `ownerReferences` make deletion cascade (delete a ReplicaSet → its Pods are garbage-collected).
- `spec` vs `status` is the universal divide and the contract of the whole system: humans/controllers write `spec`; the owning controller writes `status`. You will almost never write `status`.
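A small Python sketch of how `apiVersion` and `kind` break down into a GVK, and of which parts identify an instance. The helper names are hypothetical, and the sketch treats every kind as namespaced, falling back to `default` when the manifest names no namespace (mirroring kubectl's fallback when your context sets none either).
```python
from typing import NamedTuple


class GVK(NamedTuple):
    group: str    # "" is the core group
    version: str
    kind: str


def gvk_of(obj: dict) -> GVK:
    """Split apiVersion into group and version; a bare version means the core group."""
    group, _, version = obj["apiVersion"].rpartition("/")
    return GVK(group, version, obj["kind"])


def instance_key(obj: dict) -> tuple:
    """(group, kind, namespace, name): the version is not part of an instance's identity.

    Assumes a namespaced kind; uses "default" when metadata has no namespace.
    """
    gvk = gvk_of(obj)
    meta = obj["metadata"]
    return (gvk.group, gvk.kind, meta.get("namespace", "default"), meta["name"])


pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "catalog"}}
deploy = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "catalog"}}
print(gvk_of(pod))  # GVK(group='', version='v1', kind='Pod')
```
The Pod and the Deployment share the name `catalog` yet are different objects, because identity includes the kind. Re-applying the same file yields the same key, which is why it updates rather than duplicates.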
kubectl explain <KIND>[.spec.<FIELD>] prints this schema for any kind,
straight from the API server — it is the authoritative, version-correct
reference and you should reach for it constantly.
resourceVersion & optimistic concurrency
Many controllers (and you) may try to update the same object concurrently. Kubernetes uses optimistic concurrency, not locks:
- Every object's `metadata.resourceVersion` reflects etcd's revision at its last write (clients should treat it as an opaque string).
- An update sends back the `resourceVersion` it read. The API server commits it only if that version is still current (compare-and-swap). If someone else wrote in between, your write is rejected with a `Conflict` (409) and you must re-read and retry.
Effect: concurrent writers can't silently clobber each other; the loser simply re-observes and re-reconciles (which, being level-triggered, is harmless and expected). This is the low-level mechanism that makes many independent controllers operating on shared state correct. You'll see it surface as the occasional "the object has been modified; please apply your changes to the latest version" — that's optimistic concurrency working, not a bug.
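A minimal Python sketch of the compare-and-swap rule and the re-read-and-retry loop. `ToyStore`, `Conflict`, and `update_with_retry` are hypothetical stand-ins: the store keeps one revision counter the way etcd does, but this is not how the API server is implemented. (For real Go controllers, client-go ships a comparable helper, `retry.RetryOnConflict`.)
```python
class Conflict(Exception):
    """Plays the role of the API server's 409 Conflict."""


class ToyStore:
    """Objects keyed by name, each stored with the resourceVersion of its last write.

    A single counter mimics etcd's global revision. Illustration only.
    """

    def __init__(self):
        self.revision = 0
        self.objects = {}  # name -> (resourceVersion, spec)

    def _write(self, name, spec):
        self.revision += 1
        self.objects[name] = (str(self.revision), dict(spec))
        return str(self.revision)

    def create(self, name, spec):
        return self._write(name, spec)

    def get(self, name):
        rv, spec = self.objects[name]
        return rv, dict(spec)  # hand out a copy, like a fresh GET

    def update(self, name, spec, resource_version):
        current_rv, _ = self.objects[name]
        if resource_version != current_rv:      # compare ...
            raise Conflict("the object has been modified; "
                           "please apply your changes to the latest version")
        return self._write(name, spec)          # ... and swap


def update_with_retry(store, name, mutate, attempts=5):
    """Re-read, re-apply the change, and retry on Conflict."""
    for _ in range(attempts):
        rv, spec = store.get(name)
        mutate(spec)
        try:
            return store.update(name, spec, rv)
        except Conflict:
            continue
    raise Conflict(f"{name}: still conflicting after {attempts} attempts")
```
Two writers that read the same `resourceVersion` cannot both win: the second `update` raises `Conflict`, and `update_with_retry` succeeds on its next attempt because it re-reads the latest version before changing it.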
kubectl apply: 3-way merge & server-side apply
There are three ways to change objects; the difference matters for production:
- Imperative commands (`kubectl create`, `kubectl scale`, `kubectl edit`): a one-off action. Fine for learning/debugging, but there is no record of intent, so it drifts. Used deliberately in ch.07 to learn.
- `kubectl apply` (declarative): you keep the desired state in files and re-apply them. `apply` doesn't blindly overwrite: it computes a merge so it only changes the fields you manage, leaving fields owned by controllers (e.g. a Deployment's replica count managed by an HPA) untouched.
- Server-Side Apply (SSA): the modern mechanism. Each field has a recorded manager (`metadata.managedFields`). The API server merges based on field ownership and reports a conflict when one manager tries to change a field another manager owns (unless the applier forces it with `--force-conflicts`). This is what makes "Git is the source of truth, but a controller also edits this object" safe, and it's foundational to GitOps tooling.
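The ownership rule can be sketched in a few lines of Python, under loud assumptions: objects are flat dicts of field paths, `owners` stands in for `managedFields`, and the manager names `git` and `hpa` are hypothetical. The HPA's write is modeled with `force=True` because ordinary (non-apply) updates never conflict under SSA; they simply take ownership of the fields they change.
```python
class ApplyConflict(Exception):
    """Plays the role of SSA's conflict error; carries the contested field paths."""


def toy_apply(live, owners, manager, fields, force=False):
    """Merge `fields` into `live` on behalf of `manager`.

    live:   flat dict, field path -> value (a simplification of a real object)
    owners: field path -> set of manager names (a toy managedFields)
    A field is contested when another manager owns it and the value would change.
    Same-value writes become shared ownership; force=True takes the field over.
    (Real SSA also drops fields a manager stops applying; not modeled here.)
    """
    contested = sorted(
        path for path, value in fields.items()
        if owners.get(path, set()) - {manager} and live.get(path) != value
    )
    if contested and not force:
        raise ApplyConflict(contested)
    for path, value in fields.items():
        if live.get(path) != value:
            owners[path] = set()  # the value changes: previous owners lose the field
        live[path] = value
        owners.setdefault(path, set()).add(manager)
    return live
```
After the HPA takes `spec.replicas`, re-applying a Git manifest that still sets `replicas: 3` raises a conflict naming that field; dropping `replicas` from the manifest lets Git keep managing everything else without a fight.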
Classic client-side apply does a 3-way merge between (1) your new
manifest, (2) the live object, and (3) the last-applied configuration it
stored as an annotation — so it can tell "the user removed this field" (delete
it) from "a controller added this field" (keep it). Mental picture:
```
last-applied (what you declared before)  ──┐
live object (cluster now)                ──┼──► 3-way merge ──► patch that:
your new manifest (what you declare now) ──┘                     • adds/updates your fields
                                                                 • removes fields you dropped
                                                                 • leaves controller-owned fields alone
```
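The same reasoning as a Python sketch, assuming flat dicts of field paths instead of nested objects (real kubectl works on nested structures and applies strategic-merge-patch rules to lists). `three_way_patch` and `apply_patch` are hypothetical names; `None` marks a deletion, much like `null` in a JSON merge patch.
```python
def three_way_patch(last_applied, live, new):
    """Compute a flat patch the way client-side apply reasons about it.

    Returns field path -> new value, where None means "delete this field".
    Fields that appear only in `live` (e.g. set by a controller) are never touched.
    """
    patch = {}
    for path, value in new.items():        # fields you declare now
        if live.get(path) != value:
            patch[path] = value            # add or update
    for path in last_applied:              # fields you declared before...
        if path not in new and path in live:
            patch[path] = None             # ...but dropped now: delete
    return patch


def apply_patch(live, patch):
    """Return a copy of `live` with the patch applied."""
    result = dict(live)
    for path, value in patch.items():
        if value is None:
            result.pop(path, None)
        else:
            result[path] = value
    return result
```
Fed a live object where someone hot-patched `image` with `kubectl edit`, plus your unchanged manifest as both `last_applied` and `new`, `three_way_patch` returns a patch that restores your value: the drift is reverted. A field only the HPA set, such as `replicas`, never appears in the patch.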
In production: keep all manifests in Git and `apply` them (ideally via a GitOps controller, Part 07). Avoid mixing ad-hoc imperative edits with `apply` on the same objects; that's how "works on the cluster but not in Git" drift starts. Prefer SSA semantics so field ownership is explicit.
etcd as the source of truth
Tie it together with ch.04: the spec you
declare and the status controllers observe both live in etcd, written
only through the API server. "Declaring desired state" concretely means
persisting an object in etcd via the API server; "reconciliation" means
controllers watching etcd (through the API server) and acting until status
matches spec. The cluster has exactly one authoritative copy of intent — which
is why backing up etcd is backing up the cluster, and why a single API server
pipeline can enforce all policy. The declarative model isn't a convention bolted
on; it's a direct consequence of "one consistent store + watchers".
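A toy Python model of "one consistent store + watchers", assuming a single in-memory event log. `ToyEtcd` and `replay` are hypothetical, and the sketch folds etcd and the API server into one object (in a real cluster only the API server talks to etcd). A controller lists once, remembers the revision, and afterwards only needs the events newer than it; replaying them onto the snapshot reproduces the current state.
```python
class ToyEtcd:
    """One global revision plus an append-only event log.

    Real etcd compacts old revisions; a watch from a compacted revision fails
    (the API server answers 410 Gone) and the client must list again.
    Compaction is not modeled here.
    """

    def __init__(self):
        self.revision = 0
        self.objects = {}
        self.events = []  # (revision, event type, name, value)

    def put(self, name, value):
        self.revision += 1
        kind = "MODIFIED" if name in self.objects else "ADDED"
        self.objects[name] = value
        self.events.append((self.revision, kind, name, value))

    def delete(self, name):
        self.revision += 1
        self.objects.pop(name)
        self.events.append((self.revision, "DELETED", name, None))

    def list(self):
        """Snapshot of the current state plus the revision it reflects."""
        return dict(self.objects), self.revision

    def watch(self, since):
        """Every event newer than `since`."""
        return [e for e in self.events if e[0] > since]


def replay(snapshot, events):
    """What a controller's cache looks like after list + watch."""
    state = dict(snapshot)
    for _, kind, name, value in events:
        if kind == "DELETED":
            state.pop(name, None)
        else:
            state[name] = value
    return state
```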
Labels, selectors, and annotations: the glue
Kubernetes objects are deliberately loosely coupled: they refer to each other by label selectors, not by name or pointer. This is how the system is composed.
- Labels: identifying key/value pairs in `metadata.labels` (`app=catalog`, `tier=backend`, `version=1.4`). Meant for selection and grouping.
- Selectors: queries over labels. A Service routes to "Pods where `app=catalog`"; a ReplicaSet owns "Pods where `app=catalog`"; an HPA targets a Deployment by name (`scaleTargetRef`) and finds the Pods whose metrics it reads through that Deployment's selector. The selected objects don't know they were selected: coupling is by matching, late-bound and dynamic. (Add a Pod with `app=catalog` and a matching Service starts sending it traffic as soon as the Pod is Ready.)
- Annotations: also key/value, in `metadata.annotations`, but non-identifying: free-form metadata for tools and humans (build SHA, change-cause, ingress/controller config, checksums). You cannot select on annotations. Rule of thumb: if something selects on it, it's a label; otherwise it's an annotation.
This selector-based wiring is why you can swap, scale, and roll workloads without re-pointing references — the references resolve by label match every time. Almost every "why isn't my Service hitting my Pods?" bug is a label/selector mismatch.
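Selector matching is simple enough to sketch directly. The Python below covers the equality form (what a Service's `spec.selector` and `matchLabels` use) and the set-based `matchExpressions` operators `In`, `NotIn`, `Exists`, and `DoesNotExist`, which Deployments, ReplicaSets, and NetworkPolicies also accept. The function names and the sample Pods are hypothetical.
```python
def matches_labels(selector, labels):
    """Equality-based selection: every key must be present with exactly the
    same, case-sensitive value. (An empty selector matches everything here.)"""
    return all(labels.get(key) == value for key, value in selector.items())


def matches_expression(expr, labels):
    """Set-based selection (matchExpressions) with the four standard operators."""
    key, op = expr["key"], expr["operator"]
    values = set(expr.get("values", []))
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values  # absent key matches
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator {op!r}")


def select(pods, selector, expressions=()):
    """Names of the Pods matching all equality terms and all expressions."""
    return [name for name, labels in pods.items()
            if matches_labels(selector, labels)
            and all(matches_expression(e, labels) for e in expressions)]


pods = {
    "catalog-1": {"app": "catalog", "component": "backend"},
    "catalog-2": {"app": "Catalog", "component": "backend"},  # capital C
    "orders-1": {"app": "orders", "component": "backend"},
}
```
The capital-C Pod is silently skipped by `{"app": "catalog"}`: no error, just no match. Note also that `NotIn` matches Pods that lack the key entirely, which mirrors Kubernetes' own selector semantics.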
Hands-on with the Bookstore: the first real manifest
Time to declare something. We write the first Bookstore manifest:
catalog as a single bare Pod. (A bare Pod is not how you'd run it in
production — it isn't self-healed by a controller; that's the whole point of
Deployments in Part 01.
Here it's the minimal object that demonstrates the declarative model and gives
you something to run in ch.07.)
This file is created at
examples/bookstore/raw-manifests/01-catalog-pod.yaml:
```yaml
apiVersion: v1                        # core API group, version v1
kind: Pod                             # GVK = (core, v1, Pod)
metadata:
  name: catalog                       # unique within (namespace, kind)
  labels:                             # identifying: what selectors will match later
    app: catalog                      # the canonical label other objects will select on
    component: backend
spec:                                 # DESIRED state (you write this)
  containers:
    - name: catalog                   # container name (unique within the Pod)
      image: bookstore/catalog:dev    # the image built in ch.02
      imagePullPolicy: IfNotPresent   # use the locally loaded image (ch.07 kind load)
      ports:
        - name: http
          containerPort: 8080         # the port the Go app listens on
      env:
        - name: PORT                  # the app reads PORT (defaults to 8080)
          value: "8080"
# (no status: here; the kubelet/controllers write status, never you)
```
Every field, and why it is exactly this:
- `apiVersion: v1` / `kind: Pod`: the GVK. `Pod` is in the core group, so `apiVersion` is just `v1` (no group prefix). Together they tell the API server which schema to validate against and which controller/kubelet path owns it.
- `metadata.name: catalog`: the object's identity within its namespace. Re-applying this same file updates this object (it's matched by name + kind); it does not create a second one. That is the essence of declarative.
- `metadata.labels`: `app: catalog` is the label the future Service, ReplicaSet, and NetworkPolicy will select on (an HPA reaches these Pods indirectly, through its target Deployment's selector). Setting it correctly now is what lets later chapters wire things to this Pod without editing this Pod. `component: backend` is an extra grouping dimension.
- `spec`: desired state, authored by you. There is no `status:` block: status is observed and written by the kubelet (ch.05): Pod phase, container states, the `Ready` condition. Setting it yourself has no effect; the API server ignores `status` on an ordinary write.
- `spec.containers[0].name: catalog`: names the container (used by `kubectl logs -c` and `kubectl exec -c`). Unique within the Pod.
- `image: bookstore/catalog:dev`: the exact image built in ch.02. (`:dev` is a mutable tag, fine for local learning; production pins by digest, per ch.02 production notes.)
- `imagePullPolicy: IfNotPresent`: critical for the local workflow: pull only if the image isn't already on the node. In ch.07 you `kind load` the image onto the node, so the kubelet must not try to fetch `bookstore/catalog:dev` from a registry (it doesn't exist in one). With a `:latest` tag (or no tag) and no explicit policy, the default would be `Always`, and the Pod would get stuck in `ImagePullBackOff`. This is the single most common first-Pod mistake, pre-empted here. (For a non-`latest` tag like `:dev` the default is already `IfNotPresent`; stating it explicitly keeps the intent visible.)
- `ports.containerPort: 8080`: informational. It documents that the container listens on 8080 (the app's default `PORT`). It does not "open" anything by itself, but it's good practice, and naming it (`http`) lets a Service or probe reference it by name later.
- `env.PORT="8080"`: the `catalog` source reads `PORT` (defaulting to 8080). It is set explicitly so the desired state is unambiguous and self-documenting: exactly the Predictable Demands idea of declaring what the workload needs rather than relying on implicit defaults.
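When `imagePullPolicy` is omitted, the API server fills it in from the image reference: a `:latest` tag, or no tag at all, yields `Always`; any other tag (or a digest without a tag) yields `IfNotPresent`. A Python sketch of that defaulting rule follows; `default_pull_policy` is a hypothetical helper with a deliberately simplified reference parser, not the real one.
```python
def default_pull_policy(image, explicit=None):
    """Return the imagePullPolicy a container ends up with.

    Simplified parser (hypothetical helper): split off a digest at "@", then
    look for a tag after the last ":" in the final path segment, so a
    registry port such as "localhost:5000/" is not mistaken for a tag.
    """
    if explicit:                              # an explicit policy always wins
        return explicit
    name, _, digest = image.partition("@")
    last_segment = name.rsplit("/", 1)[-1]
    _, has_tag, tag = last_segment.rpartition(":")
    if not has_tag:
        tag = "" if digest else "latest"      # no tag and no digest means :latest
    return "Always" if tag == "latest" else "IfNotPresent"


print(default_pull_policy("bookstore/catalog:dev"))   # IfNotPresent
print(default_pull_policy("bookstore/catalog"))       # Always
```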
This Pod is minimal on purpose: no replicas (a bare Pod has no controller restoring it — covered in Part 01), no probes yet (Part 01 ch.02), no resource requests yet (Part 01 ch.03). Each later chapter adds a field to a Bookstore manifest and explains it; this is the seed. You apply it to a real cluster in ch.07.
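Validate the manifest with a dry run so the file is known-good before you rely on it. Note that even `--dry-run=client` is not fully offline: kubectl still talks to the API server in your kubeconfig (to resolve the kind via discovery and to look up any live object for the merge), so run this once you have a cluster (ch.07). Until then, a cluster-free schema checker such as `kubeconform` can lint the file.
```shell
# from the repo root (full-guide/)
kubectl apply --dry-run=client -f \
  examples/bookstore/raw-manifests/01-catalog-pod.yaml
# → pod/catalog created (dry run)
```
(`--dry-run=client` computes the result in kubectl without persisting anything and without running admission; `--dry-run=server` additionally sends the object through the API server's admission/validation pipeline from ch.04: same gauntlet, no persistence.)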
How it works under the hood
- The `spec`/`status` split is enforced, not just convention. Many resources have a separate `/status` subresource with its own RBAC (e.g. `pods/status`); controllers update status without being able to mutate spec, and vice versa. The divide is real at the API level.
- Watches + level-triggering = robustness. Controllers watch from a `resourceVersion` and reconcile on each change and on a periodic resync. A controller can crash for an hour; on restart it lists current state and converges. Nothing was "lost", because correctness derives from the current level, not from a stream of events.
- `managedFields` makes co-ownership safe. Server-Side Apply records which manager owns each field. That's how an HPA can own `spec.replicas` while you own the rest of the Deployment via Git, with conflicts surfaced instead of silently lost: the technical basis of GitOps coexisting with controllers.
- Selectors are evaluated continuously. "Service → Pods with `app=catalog`" isn't resolved once; the EndpointSlice controller (ch.04) re-evaluates the selector as Pods come and go. Loose coupling is dynamic, which is what makes scaling and rollouts transparent to callers.
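A simplified Python sketch of how the two endpoints treat the same write. The function names are hypothetical and the objects are plain dicts; the rule they encode (the main resource ignores incoming `status`, the `/status` subresource ignores everything but `status`) is the documented behavior for resources that expose a status subresource.
```python
import copy

# In RBAC the two endpoints are separate resources, e.g. "pods" and "pods/status",
# so a role can be granted one without the other.


def write_main(stored, incoming):
    """Write to the main resource (e.g. PUT .../pods/catalog): incoming status is ignored."""
    result = copy.deepcopy(incoming)
    result["status"] = copy.deepcopy(stored.get("status", {}))
    return result


def write_status(stored, incoming):
    """Write to the /status subresource: everything except status is ignored."""
    result = copy.deepcopy(stored)
    result["status"] = copy.deepcopy(incoming.get("status", {}))
    return result
```
A user who sneaks `status.phase: Succeeded` into a manifest changes only the spec; a kubelet that sends a modified spec to `/status` changes only the status.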
Production notes
In production: manifests are code. They live in Git, are reviewed and CI-validated (`--dry-run=server`, `kubeconform`, policy checks), and are applied by a GitOps controller so the cluster is the repo, continuously reconciled (Part 07). This is just the declarative model extended one level: Git is the declared state, Argo CD/Flux is the controller.
In production: adopt consistent labels (the `app.kubernetes.io/*` recommended set: `name`, `instance`, `version`, `component`, `part-of`, `managed-by`). Selectors, dashboards, cost allocation, NetworkPolicies, and rollouts all hang off labels; inconsistent labels make the cluster unobservable and unsegmentable.
In production: never hand-edit live objects that a controller or GitOps owns (`kubectl edit` on a managed Deployment). It creates drift and SSA conflicts and will be reverted; change the source, not the instance.
In production: a bare Pod is an anti-pattern for real workloads. It is not rescheduled if its node dies and not replaced if it crashes permanently. Always front workloads with a controller (Deployment/StatefulSet/DaemonSet/Job). This chapter's bare Pod exists solely to teach the object model; it graduates to a Deployment in Part 01.
Quick Reference
```shell
kubectl explain <KIND>                            # schema of a kind (authoritative)
kubectl explain <KIND>.spec.<FIELD> --recursive   # drill into spec fields
kubectl apply --dry-run=client -f f.yaml          # client-side dry run (no persistence; still needs API access)
kubectl apply --dry-run=server -f f.yaml          # run apiserver admission/validation
kubectl apply -f f.yaml                           # declarative create/update (3-way merge)
kubectl apply --server-side -f f.yaml             # Server-Side Apply (field ownership)
kubectl get <KIND> <NAME> -o yaml                 # see spec AND controller-written status
kubectl get pods -l app=catalog                   # select by label (the glue in action)
kubectl get pods --show-labels                    # inspect labels
kubectl diff -f f.yaml                            # what apply *would* change
```
Universal object skeleton (almost every kind you'll write fits this; a few, such as ConfigMap and Secret, carry `data` instead of `spec`):
```yaml
apiVersion: <GROUP>/<VERSION>   # GVK (core group => just a version, e.g. v1)
kind: <Kind>
metadata:
  name: <NAME>                  # identity within (namespace, kind)
  labels: { app: <NAME> }       # identifying → what other objects select on
  annotations: { }              # non-identifying metadata (not selectable)
spec: { }                       # DESIRED state: you author this
# status is OBSERVED: written by the owning controller/kubelet, never by you
```
Declarative-model checklist:
- You write `spec` only; never set `status`
- Objects identified by GVK + name; re-applying updates, doesn't duplicate
- Consistent, intentional labels; cross-object wiring is by selector
- `imagePullPolicy` correct for how the image reaches the node
- Manifests in Git, validated in CI, applied declaratively (GitOps in prod)
- Real workloads run under a controller — bare Pods only for learning
Test your understanding
Try each before reading the answer. The act of trying is the exercise; the answer is the check.
- Explain why writing `status:` in your manifest is meaningless. Where does status come from, and how does the API enforce the split?
  Answer: `status` is written by the owning controller/kubelet to record observed reality, not declared by you. Most resources expose status as a separate `/status` subresource with distinct RBAC, so a controller may update status without mutating spec and vice versa. The divide is enforced at the API level, not by convention (see §How it works under the hood, spec/status enforcement).
- A teammate's Service has `selector: { app: catalog }` but no traffic reaches Pods labeled `app: Catalog` (note capitalization). Why does this silently fail, and what's the general lesson about late-bound coupling?
  Answer: Labels are case-sensitive strings and selectors match by exact equality. `Catalog` ≠ `catalog`, so the selector finds zero Pods and the EndpointSlice is empty. There's no error because objects are loosely coupled by selector; the Service doesn't know the Pods exist. Almost every "Service hits no Pods" bug is a label/selector mismatch (see §Labels, selectors, and annotations: the glue).
- An HPA owns `spec.replicas` of a Deployment while you keep the rest of the manifest in Git and re-apply it. Why doesn't your `apply` overwrite the HPA's replica count, and what would change if you switched to Server-Side Apply?
  Answer: Classic `apply` does a 3-way merge with the last-applied annotation, so it only changes fields *you* manage. As long as your manifest omits `replicas`, that field isn't in your last-applied and the HPA's value is left alone. Server-Side Apply makes this explicit via `managedFields`: the HPA is the recorded manager of `spec.replicas`, and if your manifest tried to set it you'd get a Conflict instead of a silent overwrite (see §`kubectl apply`: 3-way merge & server-side apply).
- You apply a manifest, then someone runs `kubectl edit` and changes a field. You re-apply your unchanged manifest. What happens to their edit, and why?
  Answer: If their edit changed a field that's in your last-applied (managed by you), `apply`'s 3-way merge reverts it to your manifest's value: the edit looks like drift from your declared state. If they added a field not in your last-applied (and not removed by you), `apply` leaves it alone. This is why ad-hoc `kubectl edit` on GitOps-managed objects causes drift bugs (see §Production notes, "never hand-edit live objects").
- Hands-on extension: take the `01-catalog-pod.yaml` from this chapter and run `kubectl apply --dry-run=server -f` against a kind cluster. Then change `imagePullPolicy` to `Always` and re-run. What does each invocation tell you that `--dry-run=client` would not?
  What you should see: `--dry-run=server` runs the full API server pipeline (authN, authZ, admission webhooks, schema and field validation) without persisting. Add `-o yaml` to see server-side defaulting fill in fields you didn't write (for example `restartPolicy: Always` and `dnsPolicy: ClusterFirst` on the Pod), and expect rejections from policy or admission webhooks if any are installed. `--dry-run=client` never reaches the admission chain, so it can't catch admission-time policy or rules enforced by webhooks (see §Hands-on with the Bookstore, dry-run discussion).
Further reading
- Lukša, Kubernetes in Action 2e, ch.4, "Introducing Kubernetes API objects": GVK, metadata/spec/status, the declarative model, labels and selectors, and how `apply` works.
- Ibryam & Huß, Kubernetes Patterns 2e: Declarative Deployment (ch.3) and Predictable Demands (ch.2), on why declaring desired state and explicit requirements is the cloud-native way (the principle behind this chapter's manifest).
- Official: https://kubernetes.io/docs/concepts/overview/working-with-objects/ (objects, spec/status, labels & selectors) and https://kubernetes.io/docs/reference/using-api/server-side-apply/ (Server-Side Apply / field management).