# Bookstore ML — pipeline/ (Part 12 ch.07)

The worked example for Part 12 ch.07 — ML pipelines and workflows: the recommendations train -> eval -> register -> promote loop modelled as an Argo Workflows WorkflowTemplate plus a CronWorkflow for nightly retraining and an Argo Events EventSource + Sensor pair for event-driven retraining.

This tree is additive: it does not modify the Bookstore app (../../app/), the canonical manifests (../../raw-manifests/, ../../helm/, ../../kustomize/), any earlier examples/bookstore/* tree, or the earlier examples/bookstore/ml/{dataset,gpu,batch,train,serve,notebook}/ trees. Everything here is new, in the bookstore-ml PSA-restricted namespace, and reuses the same images built by ../train/ and ../serve/ — so the pipeline orchestrates a real loop end-to-end.

## Files

| File | Kind | Built-in? | Purpose |
|---|---|---|---|
| recommender-workflow.yaml | WorkflowTemplate + RBAC | mixed (built-in SA/Role/Binding + CRD) | the reusable train -> eval -> register -> promote DAG |
| recommender-cronworkflow.yaml | CronWorkflow | CRD | nightly retraining (`0 2 * * *` UTC) |
| recommender-eventsource.yaml | EventSource | CRD | webhook-triggered retraining (Argo Events) |
| recommender-sensor.yaml | Sensor + RBAC | mixed (CRD + SA/Role/Binding) | turns an event into a Workflow from the template |
| register-cm-template.yaml | ConfigMap | built-in | the shape of the registry stamp the register step writes at runtime |
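
The registry stamp in the last row can be pictured as a plain ConfigMap manifest. The following is a minimal sketch, not the contents of register-cm-template.yaml: it assumes the four fields named in the step table further down (model_uri, score, registered_at, workflow) are stored as string data keys, and the function name `build_registry_stamp`, the workflow name, and the model URI are hypothetical values chosen for illustration.

```python
import json
from datetime import datetime, timezone


def build_registry_stamp(workflow, model_uri, score, registered_at=None):
    """Return a ConfigMap manifest (as a dict) describing one registered model.

    Hypothetical helper: the key names follow the fields listed for the
    register step; the exact layout of the real template is an assumption.
    """
    if registered_at is None:
        registered_at = datetime.now(timezone.utc)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            # One ConfigMap per workflow run: recommender-model-registry-<WORKFLOW>
            "name": f"recommender-model-registry-{workflow}",
            "namespace": "bookstore-ml",
        },
        # ConfigMap data values must be strings, so the score is formatted.
        "data": {
            "model_uri": model_uri,
            "score": f"{score:.4f}",
            "registered_at": registered_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "workflow": workflow,
        },
    }


stamp = build_registry_stamp(
    workflow="recommender-pipeline-abc12",             # hypothetical run name
    model_uri="pvc://recommender-model/model.joblib",  # hypothetical URI
    score=0.4213,
    registered_at=datetime(2025, 1, 1, 2, 0, tzinfo=timezone.utc),
)
print(json.dumps(stamp, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -` for a quick experiment.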

## CRD-backed manifests in this tree

All Argo Workflows and Argo Events objects (WorkflowTemplate, CronWorkflow, EventSource, Sensor) carry the documented CRD-intrinsic header note, following the same precedent as the guide's raw-manifests/51-, 70-, 83-, the argocd/, operators/, chaos/ files, and ml/batch/, ml/serve/, ml/train/. A client dry-run without the operator installed prints `no matches for kind "..."`. That is expected, because the kind is unknown to the cluster until the CRDs are installed; the schema is correct, and the chapter walks through the pinned Helm install.

The built-in ConfigMap (register-cm-template.yaml) and the SA/Role/RoleBinding triples inside recommender-workflow.yaml and recommender-sensor.yaml dry-run cleanly on any cluster, with or without the operators installed.

## Install the operators (pinned; own namespaces)

```bash
# Argo Workflows (workflow controller + server + executor) into ns `argo`.
helm repo add argo https://argoproj.github.io/argo-helm
ARGO_WORKFLOWS_VERSION="0.42.0"   # bump deliberately; chart != app version.
helm install argo-workflows argo/argo-workflows \
  --version "$ARGO_WORKFLOWS_VERSION" \
  -n argo --create-namespace --wait

# Argo Events (controller + EventBus + EventSource/Sensor controllers).
ARGO_EVENTS_VERSION="2.4.7"
helm install argo-events argo/argo-events \
  --version "$ARGO_EVENTS_VERSION" \
  -n argo-events --create-namespace --wait

# Argo Events needs a default EventBus in the namespace where Sensors run.
kubectl apply -n argo-events -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata: { name: default }
spec:
  jetstream:
    # pinned NATS JetStream version (controller-managed image, not a Helm chart);
    # bump together with the Argo Events chart version
    version: "2.10.11"
EOF
```

Install with pinned `--version` flags only; never apply `releases/latest/download/<FILE>.yaml`.

## Apply

```bash
# 1) Prereq from earlier chapters: bookstore-ml namespace (ch.01) +
#    the recommender-model PVC + model.joblib (../train/, ch.04).
kubectl apply -f examples/bookstore/ml/train/recommender-train-job.yaml
kubectl wait --for=condition=complete job/recommender-train \
  -n bookstore-ml --timeout=300s
# (optional but expected for `promote`) the serving Deployment from ch.06:
kubectl apply -f examples/bookstore/ml/serve/recommender-deployment.yaml
kubectl apply -f examples/bookstore/ml/serve/recommender-service.yaml

# 2) The WorkflowTemplate + the per-namespace SA/Role/Binding.
kubectl apply -f examples/bookstore/ml/pipeline/recommender-workflow.yaml

# 3) Run the pipeline once (interactive; requires the `argo` CLI):
argo submit --from workflowtemplate/recommender-pipeline -n bookstore-ml
argo list -n bookstore-ml
argo logs -n bookstore-ml @latest
argo get  -n bookstore-ml @latest

# 4) Schedule nightly retraining (the CronWorkflow):
kubectl apply -f examples/bookstore/ml/pipeline/recommender-cronworkflow.yaml
kubectl get cronworkflow -n bookstore-ml

# 5) Event-driven retraining (the EventSource + Sensor, in `argo-events`):
kubectl apply -f examples/bookstore/ml/pipeline/recommender-eventsource.yaml
kubectl apply -f examples/bookstore/ml/pipeline/recommender-sensor.yaml
# Trigger it with a POST from inside the cluster (uses the EventSource Pod's
# Service, default port 12000):
kubectl run -n argo-events curl-once --rm -it --restart=Never \
  --image=curlimages/curl:8.10.1 --command -- \
  curl -X POST -H 'content-type: application/json' \
    -d '{"dataset_uri":"pvc://recommender-model/"}' \
    http://recommender-dataset-eventsource-svc:12000/recommender-dataset-ready
```
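
An upstream job that wants to fire the same webhook from code rather than from curl could build the request as follows. This is a minimal sketch using only the Python standard library; the helper name `build_retrain_request` is hypothetical, and the host, port, and path are simply the values from the curl command above. The short Service name only resolves from a Pod in the argo-events namespace.

```python
import json
import urllib.request


def build_retrain_request(dataset_uri,
                          host="recommender-dataset-eventsource-svc",
                          port=12000,
                          endpoint="/recommender-dataset-ready"):
    """Build (but do not send) the POST that the curl one-liner issues."""
    body = json.dumps({"dataset_uri": dataset_uri}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_retrain_request("pvc://recommender-model/")
print(req.get_method(), req.full_url)

# To actually send it (only works from inside the cluster):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without a running cluster.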

## What each step does (and what's the honest proxy)

| Step | Action | KIND-runnable proxy | Real prod shape |
|---|---|---|---|
| train | Runs bookstore/recommender-train:dev (../train/Dockerfile); writes model.joblib to PVC recommender-model. | Same Job, same image, same artifact as ../train/recommender-train-job.yaml. | A Training Operator PyTorchJob / RayJob (../train/recommender-pytorchjob.yaml, ../train/recommender-rayjob.yaml) on a GPU node pool. |
| eval | Loads model.joblib, computes average top-1 cosine similarity over all books, gates on --min-score (default 0.05). | A script step in the SAME train image (sklearn + joblib already baked). Writes metrics.json to the PVC + emits score as an Argo output parameter. | A real offline metric (NDCG@k, MRR, AUC, …) computed against a held-out set, gated on a domain-relevant SLO. |
| register | Stamps (model_uri, score, registered_at, workflow) into a ConfigMap `recommender-model-registry-<WORKFLOW>`. | kubectl create configmap from inside the workflow Pod, namespace-scoped RBAC, illustrated by register-cm-template.yaml. | An MLflow Model Registry entry, a KFP Model Registry record, or an OCI artifact pushed to a registry; see Part 12 ch.07 + ch.08. |
| promote | Annotates the recommender Deployment + rollout restarts it so the serving Pod re-loads the new model.joblib from the PVC. | kubectl annotate + kubectl rollout restart deploy/recommender. | A GitOps commit (Part 07 ch.04) that bumps the InferenceService storageUri to a new versioned URI; Argo CD reconciles; KServe shifts traffic via canaryTrafficPercent (Part 12 ch.06). |
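
The eval gate can be illustrated without the real model. The sketch below is one possible reading of "average top-1 cosine similarity", not the script shipped in the train image: for each book it takes the highest cosine similarity to any other book and averages those values, then compares the result to the minimum score. The function names and the toy vectors are hypothetical, and pure standard-library Python stands in for sklearn.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_top1_similarity(vectors):
    """Mean, over all items, of the best similarity to any *other* item."""
    if len(vectors) < 2:
        raise ValueError("need at least two item vectors")
    best = []
    for i, v in enumerate(vectors):
        best.append(max(cosine(v, w) for j, w in enumerate(vectors) if j != i))
    return sum(best) / len(best)


def passes_gate(score, min_score=0.05):
    """Gate decision; 0.05 mirrors the --min-score default in the table."""
    return score >= min_score


# Toy "book" vectors: two identical books and one unrelated book.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score = average_top1_similarity(toy_vectors)
print(f"score={score:.4f} pass={passes_gate(score)}")
```

In a workflow step, a failing gate would typically translate into a non-zero exit code so that the register and promote steps never run.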

## How the pieces fit together with the rest of ml/

```
 ../dataset/   schema (synthetic)
        │
        ▼
 ../train/    image bookstore/recommender-train:dev  ─────┐
        │    writes model.joblib to PVC                   │
        ▼                                                 │
   ★ pipeline/recommender-workflow.yaml ──> argo  ────────┤  ← THIS DIR
   ★ pipeline/recommender-cronworkflow.yaml               │
   ★ pipeline/recommender-eventsource.yaml + sensor.yaml  │
        │                                                 │
        ▼                                                 ▼
 ../serve/    image bookstore/recommender-serve:dev
              loads SAME model.joblib (Deployment OR KServe InferenceService)
```

## PSA, RBAC, and other invariants

- PSA-restricted on every workflow Pod: pod-level securityContext (runAsNonRoot, non-root UID 65532, seccomp RuntimeDefault) is applied via the WorkflowTemplate.spec.podSpecPatch, and container-level securityContext (allowPrivilegeEscalation: false, readOnlyRootFilesystem, drop ALL caps) is set on every step's container. The Argo Workflows controller lives in the operator's own namespace (argo); the workflow Pods live in bookstore-ml.
- RBAC scoped to one namespace: the argo-workflow SA can manage workflow Pods, ConfigMaps, and Deployments in bookstore-ml only, with no cluster-wide rights and no Secrets access.
- No machine-specific paths or users: image refs are bookstore/recommender-train:dev and bookstore/recommender-serve:dev (the same convention as ../train/ and ../serve/); replace with a registry-pushed tag in prod.
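
The securityContext invariants in the first bullet can be spot-checked mechanically. The following is a rough sketch, not the Pod Security admission controller: it only checks the fields this README names, against a Pod spec already loaded as a Python dict. The function name `check_invariants` and both example specs are hypothetical.

```python
def check_invariants(pod_spec):
    """Return a list of human-readable violations (empty list means OK)."""
    problems = []

    # Pod-level settings.
    pod_sc = pod_spec.get("securityContext", {})
    if pod_sc.get("runAsNonRoot") is not True:
        problems.append("pod: runAsNonRoot must be true")
    if pod_sc.get("seccompProfile", {}).get("type") != "RuntimeDefault":
        problems.append("pod: seccompProfile.type must be RuntimeDefault")

    # Container-level settings, checked for every container.
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if sc.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem must be true")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
    return problems


good_spec = {
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 65532,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [{
        "name": "train",
        "image": "bookstore/recommender-train:dev",
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
    }],
}
bad_spec = {"containers": [{"name": "train",
                            "image": "bookstore/recommender-train:dev"}]}

print(check_invariants(good_spec))
for problem in check_invariants(bad_spec):
    print(problem)
```

A check like this is only a convenience for reviewing manifests; the namespace's PSA label remains the actual enforcement point.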

## Honest "not built here"

See Part 12 ch.07/ch.08 for the deliberate scope boundaries (data versioning, feature stores, model explainability, drift detection, federated learning, multi-cluster training topologies, and the Kubeflow distribution itself).