Bookstore ML — serving (Part 12 ch.06)

This directory is the serving side of the recommendations thread. It loads the model.joblib produced by ../train/ (the CPU recommender-train Job) and exposes it as an HTTP API over the recommender endpoint.

Files

| File | What it is | Runs on kind? |
| --- | --- | --- |
| predictor.py | tiny FastAPI app: /v1/models/recommender:predict + /recommend + health | yes (no cluster) |
| requirements.txt | pinned Python deps (FastAPI/uvicorn + joblib + sklearn) | |
| Dockerfile | multi-stage slim Python image, non-root UID 65532 | yes |
| Makefile | compile / build / train-and-run / test targets | yes |
| recommender-deployment.yaml | built-in Deployment of the predictor (PSA-restricted) | yes (kind-runnable) |
| recommender-service.yaml | ClusterIP Service in front of the Deployment | yes |
| recommender-inferenceservice.yaml | KServe InferenceService (CRD-backed; scale-to-zero, canary) | needs KServe + Knative |

All manifests target the bookstore-ml namespace (PSA enforce: restricted). recommender-inferenceservice.yaml carries the header note that applies to every CRD-backed manifest: until KServe is installed, a client-side dry run prints no matches for kind …. The manifest is schema-correct and this is not a bug; the rest of the guide's CRD-backed manifests behave the same way.
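
To make "PSA enforce: restricted" concrete, the sketch below checks one container's effective securityContext against four container-level rules of the Kubernetes restricted Pod Security Standard. It is a simplified illustration, not the admission controller's logic: the function name `restricted_violations` and the sample dict are hypothetical, the volume-type and host-namespace rules are omitted, and the actual securityContext in recommender-deployment.yaml may differ.

```python
from typing import Any


def restricted_violations(ctx: dict[str, Any]) -> list[str]:
    """Return the restricted-profile rules this securityContext breaks.

    `ctx` is the effective securityContext of one container, with any
    pod-level defaults already merged in. Only four rules are checked.
    """
    problems = []
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        problems.append("capabilities.drop must include ALL")
    if ctx.get("seccompProfile", {}).get("type") not in ("RuntimeDefault", "Localhost"):
        problems.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return problems


# Hypothetical context; UID 65532 matches the Dockerfile's non-root user.
compliant = {
    "runAsNonRoot": True,
    "runAsUser": 65532,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}
print(restricted_violations(compliant))  # []
print(len(restricted_violations({})))    # 4
```

A pod that fails any of these checks is rejected at admission time in a namespace that enforces the restricted profile, which is why the Deployment has to declare them explicitly.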

The HTTP surface

| Method | Path | Body / Query | What it returns |
| --- | --- | --- | --- |
| GET | /healthz | | {"status":"ok"} |
| GET | /ready | | {"status":"ready"} once the model is loaded |
| GET | /v1/models/recommender | | model metadata (kind/version/n_books/top_k) |
| POST | /v1/models/recommender:predict | {"instances":[{"book_id":1,"k":3}]} | top-K recommendations per instance |
| GET | /recommend | ?book_id=1&k=3 | friendly equivalent used by catalog/storefront |

The :predict envelope follows the v1 prediction protocol used by KServe's built-in runtimes, so the same image works behind either an InferenceService or a plain Deployment.
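
As an illustration of the :predict envelope, here is a minimal client sketch using only the Python standard library. The helper names `build_predict_payload` and `predict` are hypothetical and not part of predictor.py. The v1 protocol wraps results in a top-level "predictions" key, but the exact shape of each prediction returned by this service is not shown here, so the sketch returns the parsed JSON untouched.

```python
import json
import urllib.request


def build_predict_payload(book_ids, k=3):
    """Build a v1-protocol request body with one instance per book."""
    return {"instances": [{"book_id": b, "k": k} for b in book_ids]}


def predict(base_url, book_ids, k=3, timeout=5.0):
    """POST the payload to the :predict route and return the parsed JSON."""
    body = json.dumps(build_predict_payload(book_ids, k)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/models/recommender:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


print(json.dumps(build_predict_payload([1])))
# {"instances": [{"book_id": 1, "k": 3}]}
```

With the server listening locally, `predict("http://localhost:8080", [1])` sends the same body as the curl proof step.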

Run it locally (no cluster)

# from this directory:
make compile           # python3 -m py_compile predictor.py
make build             # docker build -t bookstore/recommender-serve:dev .
# end-to-end: train -> joblib -> serve, just on docker
make train-and-run     # runs train image -> model.joblib on `bookstore-model`
                       # docker volume, then runs serve image on :8080
# proof: curl localhost:8080/v1/models/recommender:predict (same as kind step 4)

Run it on kind (the kind-runnable path)

# from the repo root (full-guide/):
# 1) build + load images
docker build -t bookstore/recommender-train:dev examples/bookstore/ml/train
docker build -t bookstore/recommender-serve:dev examples/bookstore/ml/serve
kind load docker-image bookstore/recommender-train:dev
kind load docker-image bookstore/recommender-serve:dev
# 2) train: produces model.joblib on the recommender-model PVC
kubectl apply -f examples/bookstore/ml/train/recommender-train-job.yaml
kubectl wait --for=condition=complete job/recommender-train -n bookstore-ml --timeout=300s
# 3) serve: plain Deployment + Service consumes the same PVC
kubectl apply -f examples/bookstore/ml/serve/recommender-deployment.yaml
kubectl apply -f examples/bookstore/ml/serve/recommender-service.yaml
kubectl rollout status deploy/recommender -n bookstore-ml --timeout=120s
# 4) proof: port-forward in the background, then POST a predict request
kubectl port-forward -n bookstore-ml svc/recommender 8080:8080 &
sleep 2   # give the port-forward a moment to bind before curl runs
curl -s -X POST localhost:8080/v1/models/recommender:predict \
  -H 'content-type: application/json' \
  -d '{"instances":[{"book_id":1,"k":3}]}' | jq .
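
Because the port-forward starts in the background, a request can arrive before the tunnel is listening or before the model has loaded. One way to guard against that is to poll /ready first. The sketch below assumes the service is reachable at localhost:8080 through the port-forward; `fetch_ready` and `wait_until_ready` are hypothetical helper names, and the `fetch` parameter exists only so the loop can be exercised without a running server.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_ready(base_url="http://localhost:8080", timeout=2.0):
    """GET /ready and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_ready, attempts=30, delay=1.0):
    """Poll until /ready reports {"status": "ready"}; return True on success."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "ready":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not listening yet, non-2xx response, or non-JSON body
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` before the POST turns a racy first request into a bounded wait of at most `attempts * delay` seconds plus request timeouts.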

The KServe path (needs the operator)

recommender-inferenceservice.yaml is the CRD-backed equivalent. OPTION A (custom predictor) runs the same image under KServe's serverless wrapper, which adds request-driven autoscaling, scale-to-zero, and InferenceService-level canary/A-B. OPTION B (KServe's built-in sklearn ServingRuntime) is a commented stub in the same file. Before applying, install KServe (plus Knative Serving and cert-manager) as described in the Hands-on section of Part 12 ch.06.

Integration with the Bookstore app (catalog / storefront)

The recommender's in-cluster address is recommender.bookstore-ml.svc.cluster.local:8080. The chapter (../../../12-kubernetes-for-machine-learning/06-model-serving-and-inference.md) demonstrates the catalog/storefront integration with kubectl exec and curl; nothing in this README modifies the canonical app.
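
For illustration only, this is roughly how a caller inside the cluster could reach the friendly /recommend route at that address. The plain http scheme is an assumption (the Service is a ClusterIP and the curl examples use no TLS), and `recommend_url` and `recommend` are hypothetical helpers, not code from the catalog or storefront services.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed base URL: in-cluster DNS name and port from this README, plain HTTP.
RECOMMENDER = "http://recommender.bookstore-ml.svc.cluster.local:8080"


def recommend_url(book_id, k=3, base_url=RECOMMENDER):
    """Build the GET /recommend URL with its two query parameters."""
    query = urlencode({"book_id": book_id, "k": k})
    return f"{base_url}/recommend?{query}"


def recommend(book_id, k=3, base_url=RECOMMENDER, timeout=2.0):
    """Call /recommend and return the parsed JSON response."""
    with urllib.request.urlopen(recommend_url(book_id, k, base_url), timeout=timeout) as resp:
        return json.load(resp)


print(recommend_url(1))
# http://recommender.bookstore-ml.svc.cluster.local:8080/recommend?book_id=1&k=3
```

A short timeout keeps a slow or absent recommender from stalling the calling page; a real integration would likely also catch connection errors and fall back to showing no recommendations.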