Bookstore ML — serving (Part 12 ch.06)
This directory is the serving side of the recommendations thread. It
loads the model.joblib produced by ../train/ (the CPU
recommender-train Job) and exposes it as an HTTP API over the recommender
endpoint.
Files
| File | What it is | Runs on kind? |
|---|---|---|
| `predictor.py` | tiny FastAPI app: `/v1/models/recommender:predict` + `/recommend` + health | yes (no cluster) |
| `requirements.txt` | pinned Python deps (FastAPI/uvicorn + joblib + sklearn) | — |
| `Dockerfile` | multi-stage slim Python image, non-root UID 65532 | yes |
| `Makefile` | compile / build / train-and-run / test targets | yes |
| `recommender-deployment.yaml` | built-in Deployment of the predictor (PSA-restricted) | yes (kind-runnable) |
| `recommender-service.yaml` | ClusterIP Service in front of the Deployment | yes |
| `recommender-inferenceservice.yaml` | KServe InferenceService (CRD-backed; scale-to-zero, canary) | needs KServe + Knative |
All manifests target the `bookstore-ml` namespace (PSA `enforce: restricted`). `recommender-inferenceservice.yaml` carries the CRD-intrinsic header note: a client dry-run prints `no matches for kind …` until KServe is installed. That output is schema-correct, not a bug (same precedent as the rest of the guide's CRD-backed manifests).
The HTTP surface
| Method | Path | Body / Query | What it returns |
|---|---|---|---|
| GET | `/healthz` | — | `{"status":"ok"}` |
| GET | `/ready` | — | `{"status":"ready"}` once the model is loaded |
| GET | `/v1/models/recommender` | — | model metadata (kind/version/n_books/top_k) |
| POST | `/v1/models/recommender:predict` | `{"instances":[{"book_id":1,"k":3}]}` | top-K recommendations per instance |
| GET | `/recommend?book_id=1&k=3` | — | friendly equivalent used by catalog/storefront |
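Because `/ready` only reports ready once the model is loaded, a script that drives the API may want to wait on it before sending predictions. The following is a minimal sketch, not code from `predictor.py`: the helper names (`http_ok`, `wait_until_ready`) are hypothetical, and it assumes `/ready` answers with a non-200 status or refuses the connection until the model is loaded.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status (HTTPError).
        return False


def wait_until_ready(
    base_url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,
    delay_s: float = 1.0,
) -> bool:
    """Poll `<base_url>/ready` until the probe succeeds or attempts run out."""
    url = base_url.rstrip("/") + "/ready"
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # back off before the next try
    return False
```

With the server on `localhost:8080`, `wait_until_ready("http://localhost:8080")` would return `True` as soon as the readiness route succeeds. The `probe` parameter exists only so the loop can be exercised without a running server.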
The `:predict` envelope follows the v1 prediction protocol used by KServe's
built-in runtimes, so the same image works behind an InferenceService and
behind a plain Deployment.
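To make the envelope concrete, here is a small sketch of building a request and unpacking a response. It assumes the predictor returns results under a top-level `predictions` key, as the v1 protocol does; the shape of each individual prediction is not stated in this README, so the helper leaves it untouched. Both function names are hypothetical.

```python
import json
from typing import Any


def build_predict_request(book_ids: list[int], k: int = 3) -> dict[str, Any]:
    """Wrap one instance per book in the v1 `instances` envelope."""
    return {"instances": [{"book_id": book_id, "k": k} for book_id in book_ids]}


def parse_predict_response(body: str) -> list[Any]:
    """Return the `predictions` list from a v1-style response body.

    Assumption: the response mirrors the request, one prediction per instance.
    """
    payload = json.loads(body)
    predictions = payload.get("predictions")
    if not isinstance(predictions, list):
        raise ValueError("response has no 'predictions' list")
    return predictions
```

`build_predict_request([1], k=3)` produces the same body shown in the table above, so the helper can feed any HTTP client that posts to `/v1/models/recommender:predict`.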
Run it locally (no cluster)
```bash
# from this directory:
make compile        # python3 -m py_compile predictor.py
make build          # docker build -t bookstore/recommender-serve:dev .

# end-to-end: train -> joblib -> serve, just on docker
make train-and-run  # runs train image -> model.joblib on `bookstore-model`
                    # docker volume, then runs serve image on :8080
# proof: curl localhost:8080/v1/models/recommender:predict (same as kind step 4)
```
Run it on kind (the kind-runnable path)
```bash
# from the repo root (full-guide/):

# 1) build + load images
docker build -t bookstore/recommender-train:dev examples/bookstore/ml/train
docker build -t bookstore/recommender-serve:dev examples/bookstore/ml/serve
kind load docker-image bookstore/recommender-train:dev
kind load docker-image bookstore/recommender-serve:dev

# 2) train: produces model.joblib on the recommender-model PVC
kubectl apply -f examples/bookstore/ml/train/recommender-train-job.yaml
kubectl wait --for=condition=complete job/recommender-train -n bookstore-ml --timeout=300s

# 3) serve: plain Deployment + Service consumes the same PVC
kubectl apply -f examples/bookstore/ml/serve/recommender-deployment.yaml
kubectl apply -f examples/bookstore/ml/serve/recommender-service.yaml
kubectl rollout status deploy/recommender -n bookstore-ml --timeout=120s

# 4) proof: port-forward and POST a predict (the final proof step)
kubectl port-forward -n bookstore-ml svc/recommender 8080:8080 &
curl -s -X POST localhost:8080/v1/models/recommender:predict \
  -H 'content-type: application/json' \
  -d '{"instances":[{"book_id":1,"k":3}]}' | jq .
```
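The proof step can also be driven from Python, which is handy in a test script. This is a rough equivalent of the `curl` call using only the standard library; `build_request` and `predict` are hypothetical helper names, and the sketch assumes the port-forward from step 4 is still running on `localhost:8080`.

```python
import json
import urllib.request
from typing import Any


def build_request(base_url: str, instances: list[dict[str, Any]]) -> urllib.request.Request:
    """Build the POST for the :predict route with a v1 `instances` body."""
    url = base_url.rstrip("/") + "/v1/models/recommender:predict"
    data = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, instances: list[dict[str, Any]], timeout: float = 5.0) -> Any:
    """Send the request and decode the JSON reply (raises on HTTP or network errors)."""
    with urllib.request.urlopen(build_request(base_url, instances), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `predict("http://localhost:8080", [{"book_id": 1, "k": 3}])` sends the same body as the `curl` command and returns whatever JSON the predictor answers with.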
The KServe path (needs the operator)
`recommender-inferenceservice.yaml` is the CRD-backed equivalent. OPTION A
(custom predictor) runs the same image under KServe's serverless wrapper,
which adds request-driven autoscaling, scale-to-zero, and
InferenceService-level canary/A-B rollouts. OPTION B (KServe's built-in
sklearn ServingRuntime) is a commented stub in the same file. Install KServe
(plus Knative Serving and cert-manager) per Part 12 ch.06 Hands-on before
applying the manifest.
Integration with the Bookstore app (catalog / storefront)
The recommender's in-cluster address is `recommender.bookstore-ml.svc.cluster.local:8080`. The chapter (../../../12-kubernetes-for-machine-learning/06-model-serving-and-inference.md) describes the catalog/storefront integration via `kubectl exec`/`curl`; this README does not mutate the canonical app.
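For illustration only, a caller inside the cluster could reach the `/recommend` route roughly as follows. This is a hypothetical sketch, not code from the catalog or storefront: the function names, the 2-second timeout, and the choice to return `None` when the recommender is unreachable are all assumptions, and the response is returned as decoded JSON because its exact shape is not specified here.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Optional

# Service DNS name and port as given in this README.
RECOMMENDER_BASE_URL = "http://recommender.bookstore-ml.svc.cluster.local:8080"


def recommend_url(book_id: int, k: int = 3, base_url: str = RECOMMENDER_BASE_URL) -> str:
    """Build the GET URL for the friendly /recommend route."""
    query = urllib.parse.urlencode({"book_id": book_id, "k": k})
    return f"{base_url.rstrip('/')}/recommend?{query}"


def fetch_recommendations(
    book_id: int,
    k: int = 3,
    base_url: str = RECOMMENDER_BASE_URL,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 2.0,
) -> Optional[Any]:
    """Return decoded JSON from /recommend, or None if the call fails.

    Assumption: recommendations are optional for the caller, so any network,
    HTTP, or decoding error degrades to None instead of raising.
    """
    try:
        with opener(recommend_url(book_id, k, base_url), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

Returning `None` on failure keeps a page render independent of the recommender's availability; a caller that needs to distinguish failure modes would want to surface the exception instead.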