# Bookstore ML — pipeline/ (Part 12 ch.07)
The worked example for Part 12 ch.07 — ML pipelines and workflows:
the recommendations train -> eval -> register -> promote loop modelled
as an Argo Workflows WorkflowTemplate plus a CronWorkflow for
nightly retraining and an Argo Events EventSource + Sensor pair
for event-driven retraining.
This tree is additive: it does not modify the Bookstore app
(../../app/), the canonical manifests (../../raw-manifests/,
../../helm/, ../../kustomize/), any earlier examples/bookstore/* tree,
or the earlier examples/bookstore/ml/{dataset,gpu,batch,train,serve,notebook}/
trees. Everything here is new, in the bookstore-ml PSA-restricted
namespace, and reuses the same images built by ../train/ and
../serve/ — so the pipeline orchestrates a real loop end-to-end.
## Files

| File | Kind | Built-in? | Purpose |
|---|---|---|---|
| `recommender-workflow.yaml` | WorkflowTemplate + RBAC | mixed (built-in SA/Role/Binding + CRD) | the reusable train -> eval -> register -> promote DAG |
| `recommender-cronworkflow.yaml` | CronWorkflow | CRD | nightly retraining (`0 2 * * *` UTC) |
| `recommender-eventsource.yaml` | EventSource | CRD | webhook-triggered retraining (Argo Events) |
| `recommender-sensor.yaml` | Sensor + RBAC | mixed (CRD + SA/Role/Binding) | turns an event into a Workflow from the template |
| `register-cm-template.yaml` | ConfigMap | built-in | the shape of the registry stamp the register step writes at runtime |
## CRD-backed manifests in this tree

All Argo-Workflows / Argo-Events objects (WorkflowTemplate, CronWorkflow,
EventSource, Sensor) carry the documented CRD-intrinsic header note
— identical precedent to the guide's raw-manifests/51-, 70-, 83-, the
argocd/, operators/, chaos/ files, and ml/batch/, ml/serve/,
ml/train/. A client dry-run without the operator installed prints
no matches for kind "..."; the schema is correct, and the chapter
walks the pinned-Helm install.
The built-in ConfigMap (register-cm-template.yaml) and the SA/Role/
RoleBinding triples inside recommender-workflow.yaml and
recommender-sensor.yaml dry-run cleanly anywhere.
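
The contents of `register-cm-template.yaml` are not reproduced in this README, so the following is only a rough sketch of what such a registry stamp could look like. It assumes the four fields the register step records (model_uri, score, registered_at, workflow) become string keys under `data`, and that the name follows the `recommender-model-registry-<WORKFLOW>` pattern. The function name `build_registry_stamp`, the example workflow name, and the example model URI are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_registry_stamp(workflow, model_uri, score, registered_at=None,
                         namespace="bookstore-ml"):
    """Return a ConfigMap manifest (as a dict) for one registered model.

    Assumption: the real template may use different key names; only the
    four field names and the name prefix come from this README.
    """
    if registered_at is None:
        registered_at = datetime.now(timezone.utc)
    # Normalise to UTC so the trailing "Z" in the timestamp is truthful.
    registered_at = registered_at.astimezone(timezone.utc)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"recommender-model-registry-{workflow}",
            "namespace": namespace,
        },
        # ConfigMap data values must be strings, so the score is formatted.
        "data": {
            "model_uri": model_uri,
            "score": f"{score:.4f}",
            "registered_at": registered_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "workflow": workflow,
        },
    }


# Hypothetical values, for illustration only.
stamp = build_registry_stamp(
    workflow="recommender-pipeline-abc12",
    model_uri="pvc://recommender-model/model.joblib",
    score=0.4213,
    registered_at=datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc),
)
print(json.dumps(stamp, indent=2))
```

Because `kubectl` accepts JSON manifests as well as YAML, the printed document could be piped to `kubectl apply --dry-run=client -f -` to check its shape without touching the cluster.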
## Install the operators (pinned; own namespaces)

```bash
# Argo Workflows (workflow controller + server + executor) into ns `argo`.
helm repo add argo https://argoproj.github.io/argo-helm
ARGO_WORKFLOWS_VERSION="0.42.0"   # bump deliberately; chart != app version.
helm install argo-workflows argo/argo-workflows \
  --version "$ARGO_WORKFLOWS_VERSION" \
  -n argo --create-namespace --wait

# Argo Events (controller + EventBus + EventSource/Sensor controllers).
ARGO_EVENTS_VERSION="2.4.7"
helm install argo-events argo/argo-events \
  --version "$ARGO_EVENTS_VERSION" \
  -n argo-events --create-namespace --wait

# Argo Events needs a default EventBus in the namespace where Sensors run.
kubectl apply -n argo-events -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata: { name: default }
spec:
  jetstream:
    # pinned NATS JetStream version (controller-managed image, not a Helm chart);
    # bump together with the Argo Events chart version
    version: "2.10.11"
EOF
```

Pinned `--version` flags only; never `releases/latest/download/<FILE>.yaml`.
## Apply

```bash
# 1) Prereq from earlier chapters: bookstore-ml namespace (ch.01) +
#    the recommender-model PVC + model.joblib (../train/, ch.04).
kubectl apply -f examples/bookstore/ml/train/recommender-train-job.yaml
kubectl wait --for=condition=complete job/recommender-train \
  -n bookstore-ml --timeout=300s

# (optional but expected for `promote`) the serving Deployment from ch.06:
kubectl apply -f examples/bookstore/ml/serve/recommender-deployment.yaml
kubectl apply -f examples/bookstore/ml/serve/recommender-service.yaml

# 2) The WorkflowTemplate + the per-namespace SA/Role/Binding.
kubectl apply -f examples/bookstore/ml/pipeline/recommender-workflow.yaml

# 3) Run the pipeline once (interactive; requires the `argo` CLI):
argo submit --from workflowtemplate/recommender-pipeline -n bookstore-ml
argo list -n bookstore-ml
argo logs -n bookstore-ml @latest
argo get  -n bookstore-ml @latest

# 4) Schedule nightly retraining (the CronWorkflow):
kubectl apply -f examples/bookstore/ml/pipeline/recommender-cronworkflow.yaml
kubectl get cronworkflow -n bookstore-ml

# 5) Event-driven retraining (the EventSource + Sensor, in `argo-events`):
kubectl apply -f examples/bookstore/ml/pipeline/recommender-eventsource.yaml
kubectl apply -f examples/bookstore/ml/pipeline/recommender-sensor.yaml

# Trigger it with a POST from inside the cluster (uses the EventSource Pod's
# Service, default port 12000):
kubectl run -n argo-events curl-once --rm -it --restart=Never \
  --image=curlimages/curl:8.10.1 --command -- \
  curl -X POST -H 'content-type: application/json' \
    -d '{"dataset_uri":"pvc://recommender-model/"}' \
    http://recommender-dataset-eventsource-svc:12000/recommender-dataset-ready
```
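
For callers that are not a shell (for example, a dataset job that wants to announce its own completion), the same webhook POST can be sketched with the Python standard library. This is a minimal illustration, not part of the pipeline files: the helper names `build_retrain_request` and `trigger_retrain` are hypothetical, and the URL only resolves from a Pod in the `argo-events` namespace, exactly like the curl command above.

```python
import json
import urllib.request

# Same Service, port, and endpoint as the curl example; in-cluster DNS only.
EVENTSOURCE_URL = (
    "http://recommender-dataset-eventsource-svc:12000/recommender-dataset-ready"
)


def build_retrain_request(dataset_uri, url=EVENTSOURCE_URL):
    """Build (but do not send) the JSON POST that the webhook expects."""
    body = json.dumps({"dataset_uri": dataset_uri}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_retrain(dataset_uri, url=EVENTSOURCE_URL, timeout=10.0):
    """Send the request and return the HTTP status code.

    Only meaningful from inside the cluster; raises urllib.error.URLError
    when the Service name cannot be resolved or reached.
    """
    request = build_retrain_request(dataset_uri, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the request has no side effects, so it is safe to inspect anywhere.
request = build_retrain_request("pvc://recommender-model/")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to check outside the cluster; only `trigger_retrain` needs network access.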
## What each step does (and what's the honest proxy)

| Step | Action | KIND-runnable proxy | Real prod shape |
|---|---|---|---|
| train | Runs `bookstore/recommender-train:dev` (`../train/Dockerfile`); writes `model.joblib` to PVC `recommender-model`. | Same Job, same image, same artifact as `../train/recommender-train-job.yaml`. | A Training Operator PyTorchJob / RayJob (`../train/recommender-pytorchjob.yaml`, `../train/recommender-rayjob.yaml`) on a GPU node pool. |
| eval | Loads `model.joblib`, computes average top-1 cosine similarity over all books, gates on `--min-score` (default 0.05). | A script step in the SAME train image (sklearn + joblib already baked). Writes `metrics.json` to the PVC + emits `score` as an Argo output parameter. | A real offline metric (NDCG@k, MRR, AUC, …) computed against a held-out set, gated on a domain-relevant SLO. |
| register | Stamps (model_uri, score, registered_at, workflow) into a ConfigMap `recommender-model-registry-<WORKFLOW>`. | `kubectl create configmap` from inside the workflow Pod, namespace-scoped RBAC, illustrated by `register-cm-template.yaml`. | An MLflow Model Registry entry, a KFP Model Registry record, or an OCI artifact pushed to a registry; see Part 12 ch.07 + ch.08. |
| promote | Annotates the `recommender` Deployment + rollout restarts it so the serving Pod re-loads the new `model.joblib` from the PVC. | `kubectl annotate` + `kubectl rollout restart deploy/recommender`. | A GitOps commit (Part 07 ch.04) that bumps the InferenceService `storageUri` to a new versioned URI; Argo CD reconciles; KServe shifts traffic via `canaryTrafficPercent` (Part 12 ch.06). |
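
The eval gate in the table can be illustrated with a small standard-library sketch. It is not the script that ships in the train image: it assumes each book is represented by a plain feature vector, and it reads "top-1" as each book's most similar *other* book, which is one plausible interpretation. The names `average_top1_similarity` and `evaluate` are hypothetical; only the 0.05 default threshold comes from the table.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_top1_similarity(vectors):
    """Mean, over all books, of the similarity to the nearest other book."""
    if len(vectors) < 2:
        raise ValueError("need at least two books to compare")
    total = 0.0
    for i, vector in enumerate(vectors):
        # Exclude the book itself, which would always score 1.0.
        total += max(
            cosine(vector, other)
            for j, other in enumerate(vectors)
            if j != i
        )
    return total / len(vectors)


def evaluate(vectors, min_score=0.05):
    """Return a metrics dict; `passed` is the gate decision."""
    score = average_top1_similarity(vectors)
    return {"score": score, "min_score": min_score, "passed": score >= min_score}


# Toy data: two identical books and one unrelated book.
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
metrics = evaluate(vectors)
print(json.dumps(metrics))
```

In a workflow step, a failed gate would typically translate into a non-zero exit code so that the register and promote steps never run; the `metrics` dict is the kind of content that would be written to `metrics.json`.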
## How the pieces fit together with the rest of ml/

```
../dataset/   schema (synthetic)
     │
     ▼
../train/     image bookstore/recommender-train:dev ──┐
     │        writes model.joblib to PVC              │
     ▼                                                │
★ pipeline/recommender-workflow.yaml ──> argo ────────┤  ← THIS DIR
★ pipeline/recommender-cronworkflow.yaml              │
★ pipeline/recommender-eventsource.yaml + sensor.yaml │
     │                                                │
     ▼                                                ▼
../serve/     image bookstore/recommender-serve:dev
              loads SAME model.joblib (Deployment OR KServe InferenceService)
```
## PSA, RBAC, and other invariants

- PSA-`restricted` on every workflow Pod: pod-level `securityContext` (runAsNonRoot, non-root UID 65532, seccomp RuntimeDefault) applied via the `WorkflowTemplate.spec.podSpecPatch`; container-level `securityContext` (allowPrivilegeEscalation: false, readOnlyRootFilesystem, drop ALL caps) on every step's container. The Argo Workflows controller lives in the operator's OWN namespace (`argo`); the workflow Pods live in `bookstore-ml`.
- RBAC scoped to one namespace: the `argo-workflow` SA can manage workflow Pods, ConfigMaps, and Deployments in `bookstore-ml` only; no cluster-wide rights, no secrets access.
- No machine-specific paths / users: image refs are `bookstore/recommender-train:dev` and `bookstore/recommender-serve:dev` (the same convention as `../train/` and `../serve/`); replace with a registry-pushed tag in prod.
## Honest "not built here"

See Part 12 ch.07/ch.08 for the deliberate scope boundaries (data versioning, feature stores, model explainability, drift detection, federated learning, multi-cluster training topologies, and the Kubeflow distribution itself).