Bookstore ML — the "recommendations" thread (Part 12)¶
This tree is the worked example for Part 12 — Kubernetes for Machine
Learning. It is additive: it does not modify the Bookstore app
(../app/), the canonical manifests (../raw-manifests/, ../helm/,
../kustomize/), or any earlier examples/bookstore/* tree. Everything here is
new and lives in its own PSA-restricted namespace.
The use case¶
A recommendations model: item-to-item "customers who bought X also bought Y", derived from the Bookstore's own data.
- Dataset — the Bookstore's
catalogandordersdata (the real schema the app uses; see../app/catalog/main.goand../app/orders/main.go). Synthetic and generated, not shipped — seedataset/README.md. - Model — item co-occurrence / item-kNN. For each pair of books, count how often they appear in orders "together" (per customer/basket proxy), normalise (cosine / Jaccard over the co-occurrence matrix), and keep the top-K neighbours per book. This is a classic, tiny, deterministic recommender computed with NumPy/scikit-learn-class libraries — no deep learning, no GPU required.
- Why deliberately tiny — so the entire train → register → serve path runs CPU-only on kind, with zero GPU. GPUs are introduced (Part 12 ch.02) and gang/queue scheduling (ch.03) using this same model as the "now scale training up" example — honestly marked: those steps need a real GPU node pool, and a CPU fallback that genuinely runs is always provided.
The plan (the MLOps loop, mapped to this tree)¶
data ──► prep/train ──► evaluate ──► register ──► serve ──► monitor ──► retrain
│ │ │
└ catalog └ co-occurrence + top-K └ tiny HTTP API: GET
+ orders (NumPy; CPU; 2-worker "gang" /recommend?book_id=…
(synthetic) demo in ch.03) → top-K book ids
| Dir | Phase | Status |
|---|---|---|
dataset/ |
data shape (synthetic, generated) | this phase (X3a) — stub |
gpu/ |
a restricted GPU training Pod (the scale-up path) | this phase (X3a) — ch.02 |
batch/ |
Kueue queue/flavor + a JobSet "training" (CPU, gang) | this phase (X3a) — ch.03 |
train/ |
the real CPU training Job + model artifact | X3b (forward-ref) |
serve/ |
the recommendations serving Deployment/KServe | X3b (forward-ref) |
pipeline/ |
the whole loop as a pipeline | X3c (forward-ref) |
The namespace contract (read this before applying anything)¶
Everything here targets the bookstore-ml namespace, which is labelled
PSA enforce: restricted (plus audit + warn), exactly like the app's
bookstore namespace (Part 05 ch.02).
kubectl create namespace bookstore-ml
kubectl label namespace bookstore-ml \
app.kubernetes.io/part-of=bookstore-ml \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/enforce-version=latest \
pod-security.kubernetes.io/audit=restricted \
pod-security.kubernetes.io/audit-version=latest \
pod-security.kubernetes.io/warn=restricted \
pod-security.kubernetes.io/warn-version=latest --overwrite
Every Pod in this tree is restricted-compliant: runAsNonRoot: true, a
non-root runAsUser, allowPrivilegeEscalation: false,
capabilities.drop: ["ALL"], seccompProfile.type: RuntimeDefault, and only
restricted-allowed volume types (emptyDir, configMap, secret,
persistentVolumeClaim, projected, downwardAPI). This matters: many
GPU/ML/notebook base images default to root and would be rejected by
PSA — Part 12 ch.02 teaches the compliant securityContext that still works on
a CUDA image. ML pods are not exempt from Pod Security.
CRD-backed manifests here¶
gpu/ is a plain Job (built-in — dry-runs cleanly). batch/ contains
CRD-backed objects (Kueue ResourceFlavor/ClusterQueue/LocalQueue,
JobSet JobSet). Each such file's header carries the documented
CRD-intrinsic note (identical precedent to the guide's
raw-manifests/51-, 70-, 83-, the argocd/, operators/, chaos/ files):
a client dry-run prints no matches for kind "…" until the operator/CRDs are
installed; the schema is correct. Install Kueue/JobSet via pinned Helm
(own namespaces) as the chapter shows — never
kubectl apply -f .../releases/latest/download/<FILE>.yaml.
Operators and accelerators install elsewhere¶
The NVIDIA GPU Operator, Kueue, JobSet controller, and Volcano install into their own system namespaces via pinned Helm charts (or the project's pinned, documented installer) — see Part 12 ch.02 (GPU Operator) and ch.03 (Kueue/JobSet). They are infrastructure, not part of this tree.