14.12 — Supply chain security in production¶
Part 05 ch.03 introduced the four-stage trust chain (scan, SBOM, sign, admit) for the kind-local Bookstore. This chapter is the cloud-deployed path: AWS ECR enhanced scanning ($0.09/image/month for continuous CVE tracking),
syft for the SBOM-as-build-artifact discipline, cosign keyless signing in GitHub Actions (OIDC trust → Fulcio short-lived cert → Rekor transparency-log entry; no long-lived signing keys to rotate), and Kyverno ClusterPolicy.verifyImages enforcing the signature at admission. Plus the honest reality of SLSA framework levels 1–4: what each level means, what it costs to reach, and the level most teams realistically land at (SLSA 2-3, not SLSA 4).
Estimated time: ~30 min read · ~90 min hands-on
Prerequisites: Part 05 ch.03 (four-stage trust chain: scan/SBOM/sign/admit on kind) · Part 13 ch.06 (bookstore security pass that lands signing) · Part 12 ch.05 (admission policy rollout pattern)
You'll know after this:
- configure ECR enhanced scanning ($0.09/image/month) for continuous CVE tracking
- generate SBOMs with syft and bind them to image digests as a build artifact
- implement cosign keyless signing in GitHub Actions (OIDC → Fulcio → Rekor) with no long-lived keys
- enforce signatures at admission with Kyverno ClusterPolicy.verifyImages in Audit→Enforce rollout
- choose a realistic SLSA target (most teams land at 2-3, not 4) and know what each level actually costs
Why this exists¶
The bookstore-platform tree at
../examples/bookstore-platform/terraform/
ships an opt-in supply-chain stack: ECR scanning is on by default
(basic mode, free), and the
kyverno-image-signing.tf
resource installs Kyverno + a verifyImages ClusterPolicy when
var.enable_image_signing = true. The policy ships in Audit mode
deliberately — flipping to Enforce without first verifying the CI
signs every image will reject the cluster's own workloads on day
one (the canonical anti-pattern Part 05 ch.03 named). Production
operationalizes this as Audit for 2-4 weeks → triage findings →
flip to Enforce.
The threats supply-chain security defends against are not hypothetical:
- Typosquatting / dependency confusion: an attacker publishes react-route (the real name minus its final "r") hoping someone in CI mistypes the real react-router. The CI machine pulls the malicious package without a verified signature; the package phones home with the build secrets.
- Base-image CVEs at runtime: the Alpine image you based your container on was clean at build time; a new critical CVE is disclosed against libssl six weeks later. Without continuous scanning, you don't know your fleet is exposed.
- Tag-overwrite attacks: a compromised registry push token pushes a malicious image to the same tag (myorg/api:v1.2.3) that the cluster pulls. Without digest-pinning and signature verification, the cluster pulls and runs whatever is at that tag right now.
- Compromised CI runners: a shared CI runner has a malicious step injected via a typo'd action (uses: my-action@v1 → uses: my-actoin@v1); the malicious action exfiltrates the build secrets and pushes a tampered binary into the legitimate image.
- Build-time supply-chain injection: a transitive dependency in your image's package.json or go.mod is compromised upstream; the image is "yours" but its contents include attacker code.
The defenses come in layers, each closing one class of attack:
- ECR enhanced scanning: continuous CVE re-scan against the CVE database; alerts when a deployed image is newly vulnerable (defends 2, partially 5).
- SBOM generation + storage: a durable inventory of every package in every image; you can query "which deployed images contain log4j < 2.17?" after the next CVE drops (defends 2, 5).
- Cosign keyless signing in CI: the build pipeline signs the image with a short-lived OIDC-tied certificate; the signature goes to Rekor's public transparency log (defends 1, 3, 4).
- Kyverno verifyImages admission policy: at admission time, the cluster verifies the signature is from the expected CI identity (defends 3, 4).
- Multi-arch signing discipline: every per-architecture manifest entry is signed (defends 3 even when running mixed-arch fleets; the Graviton chapter flagged this).
The SLSA framework (Supply-chain Levels for Software Artifacts, pronounced "salsa") is the structured progression model. SLSA 1 is "scripted build with provenance"; SLSA 4 is "two-party-reviewed hermetic build". Most production teams land at SLSA 2-3 — the delta from SLSA 1 to SLSA 2 is the highest ROI; SLSA 3 adds provenance attestations; SLSA 4 is reserved for high-assurance workloads (financial services, government). The bookstore platform targets SLSA 2-3 with cosign + GitHub Actions OIDC; reaching SLSA 4 would require hermetic builds and two-party review on every release.
Part 05 ch.03 walked the
four-stage trust chain on a kind cluster with Trivy + a Kyverno
policy in Audit mode. This chapter is the cloud overlay: ECR's
scanning instead of Trivy in CI; cosign keyless instead of the
optional Cosign sketch; Kyverno with verifyImages instead of the
pattern-matching policies; SLSA as the framing for "how mature is
our supply chain?".
In production: Supply-chain security is best built in layers, with each layer landing in Audit mode for 2-4 weeks before Enforce. The single highest-impact control is signing in CI + verifyImages at admission. The runners-up (ECR scanning, SBOM storage) are operationally cheap and inform incident response when the next critical CVE drops.
Mental model¶
Four pieces compose cloud supply-chain security: (1) ECR enhanced
scanning for continuous CVE tracking ($0.09/image/month), (2) syft
SBOMs as build artifacts stored alongside the image, (3) cosign
keyless signing in GitHub Actions backed by OIDC → Fulcio → Rekor,
(4) Kyverno verifyImages ClusterPolicy enforcing the signature at
admission. The SLSA framework is the maturity rubric you grade
against.
The four pieces:
- Piece 1: ECR enhanced scanning. ECR has two scan modes: Basic (free; scans on push; historically backed by the open-source Clair project's CVE database; no continuous re-scanning) and Enhanced (powered by Amazon Inspector; $0.09/image/month; continuous re-scanning against the CVE database; richer findings including OS + language dependencies; integrates with AWS Security Hub). For a production cluster with 50 images and ongoing CVE flow, the $4.50/month is the cost of knowing when your image is newly vulnerable; the alternative is finding out from a CVE blog or a pentest report.
- Piece 2: SBOM generation with syft. A Software Bill Of Materials is a structured inventory of every package in an image: OS packages, language packages, binaries, files of interest. The two canonical formats are SPDX (Linux Foundation standard) and CycloneDX (OWASP). syft <IMAGE> -o spdx-json produces an SBOM in seconds; syft scan works against registries directly. Store the SBOM as a CI artifact, and attach it to the image via cosign attest --predicate sbom.json. The attestation lives in the OCI registry alongside the image and is signed by the same CI identity as the image.
- Piece 3: Cosign keyless signing. Cosign is the Sigstore project's tool for signing OCI artifacts. Keyless signing generates a short-lived (~10 minute) signing certificate from Fulcio (Sigstore's CA), tied to an OIDC identity (the GitHub Actions workflow that's running, or a GitLab job, or an AWS IAM identity). The signature is recorded in Rekor (Sigstore's append-only transparency log), so anyone can audit the log to see "this image was signed by this workflow at this time". No long-lived signing keys; no key-rotation problem; the identity-to-signature binding is in the cert + the log.
- Piece 4: Kyverno verifyImages policy. A Kyverno ClusterPolicy with a verifyImages rule checks at admission time that every image in a Pod spec has a valid cosign signature matching the configured OIDC issuer + subject (the expected workflow). The bookstore tree's kyverno-image-signing.tf installs Kyverno + this policy in Audit mode by default; flipping to Enforce is a one-line edit once CI is reliably signing.
ECR scanning math. Enhanced scanning is $0.09 per image per month (charged based on the number of unique images stored in ECR times the number of months they're stored). A repository with 100 unique images costs $9/month. Most production teams have 30-50 distinct images in ECR (per-service base + maybe a couple of historical tags); the cost is real but small.
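The arithmetic above can be written down as a small helper. This is only a sketch of the chapter's pricing model: the function name and the flat $0.09 rate are assumptions taken from the text, so check current Amazon Inspector pricing for your region before budgeting.

```python
from decimal import Decimal

# Rate quoted in this chapter (assumption: flat per-image, per-month price).
PRICE_PER_IMAGE_MONTH = Decimal("0.09")


def enhanced_scanning_cost(unique_images: int, months: int = 1) -> Decimal:
    """Estimated enhanced-scanning cost in USD for unique_images over months."""
    if unique_images < 0 or months < 0:
        raise ValueError("unique_images and months must be non-negative")
    return PRICE_PER_IMAGE_MONTH * unique_images * months


if __name__ == "__main__":
    # The three fleet sizes mentioned in this chapter.
    for count in (30, 50, 100):
        print(f"{count} images: ${enhanced_scanning_cost(count):.2f}/month")
```

Decimal is used instead of float so that small per-image prices sum without rounding drift.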
Why CI is the right place to sign. A developer's laptop with a local signing key is the wrong place — the key is reusable, can be stolen, has unclear identity. CI is the right place because:
- The CI run has a verifiable identity (the workflow's OIDC token, signed by GitHub/GitLab's OIDC issuer).
- The signing cert is short-lived (~10 min from Fulcio); even if extracted, it can't be reused after expiry.
- The signing event is recorded in Rekor with the CI identity; the audit trail is automatic.
So the cosign signing step in GitHub Actions looks like (the full walkthrough is in the hands-on):
permissions:
  id-token: write   # required for OIDC
  contents: read
steps:
  - uses: sigstore/cosign-installer@v3
  # cosign 2.x signs keylessly by default; COSIGN_EXPERIMENTAL=true was only needed on cosign 1.x.
  - run: cosign sign --yes <ECR-URI>:<TAG>@sha256:<DIGEST>
The id-token: write permission grants the workflow an OIDC token
GitHub mints for this run. cosign reads it, exchanges it with Fulcio
for a short-lived cert, signs the image, pushes the signature to
the registry as an OCI artifact + records the entry in Rekor.
Multi-arch signing: sign the manifest list + every entry. A multi-arch image (from the Graviton chapter) is a manifest list pointing at per-arch manifests. cosign signs the digest you point it at. To verify across all arches:
- cosign sign <IMAGE>:<TAG>@sha256:<MANIFEST-LIST-DIGEST> signs the manifest list digest.
- cosign sign --recursive <IMAGE>:<TAG> signs the manifest list and each per-arch entry's digest.
Kyverno's verification has to match. The bookstore tree's policy
verifies on the digest the cluster actually pulls — which is the
per-arch entry's digest, not the manifest list digest. So you
either sign with --recursive (one cosign command, signs everything)
or configure verifyImages to verify the manifest list digest
(more brittle). --recursive is the right discipline.
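To make "the manifest list and every entry" concrete, here is a minimal sketch that lists the digests a recursive signing pass would need to cover, given an OCI image index document. The function name and the sample digests are hypothetical; the "manifests", "digest", and "platform" fields follow the OCI image index layout. It illustrates the set of digests only and is not how cosign is implemented.

```python
import json


def digests_to_sign(index_digest: str, index_json: str) -> list:
    """Return (label, digest) pairs: the index itself, then each per-arch manifest."""
    index = json.loads(index_json)
    targets = [("manifest list", index_digest)]
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        label = f"{platform.get('os', 'unknown')}/{platform.get('architecture', 'unknown')}"
        targets.append((label, entry["digest"]))
    return targets


# Hypothetical index for a two-architecture image (digests shortened for readability).
SAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaa111", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb222", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
})

if __name__ == "__main__":
    for label, digest in digests_to_sign("sha256:fff000", SAMPLE_INDEX):
        print(f"{label:15} {digest}")
```

A two-architecture image therefore has three digests in play, which is why signing only the manifest list can leave the digest a node actually pulls without a signature.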
SLSA framework levels — what each means, what most teams reach.
| SLSA Level | What it certifies | Build infrastructure | What it costs |
|---|---|---|---|
| SLSA 1 | Scripted build with documented process | Any | Documentation discipline |
| SLSA 2 | Hosted, version-controlled build with signed provenance | Hosted CI (GitHub Actions, GitLab CI, Tekton) with OIDC | One-time CI setup |
| SLSA 3 | Source + build integrity; non-falsifiable provenance | Hardened build runners; signed attestations | Significant CI investment |
| SLSA 4 | Hermetic, reproducible builds; two-party review | Hermetic build env; mandatory code review; reproducibility verification | Substantial process + culture |
Most production teams realistically land at SLSA 2-3:
- SLSA 1 is "we have a Dockerfile in Git and a script that builds the image". Almost free; almost no security guarantees.
- SLSA 2 is "we build in GitHub Actions / GitLab CI; cosign-sign every image; the OIDC identity is the build cert". The bookstore platform's CI is exactly here. The biggest single ROI step is the SLSA 1 → SLSA 2 transition.
- SLSA 3 adds SLSA provenance attestations: a cosign attestation that records the build inputs (commit SHA, source repo, builder image, build steps). Reachable with slsa-github-generator for GitHub Actions builds.
- SLSA 4 is "hermetic, reproducible builds + two-party review". Hermetic = the build runs in a network-isolated environment with all dependencies pre-fetched; reproducible = building the same source twice produces byte-identical outputs. Reached by organizations like Google for their internal builds; rare in general industry.
The bookstore platform targets SLSA 2-3; the
GitHub Actions workflow in examples/bookstore-platform/terraform/.github/workflows/
includes cosign sign + an optional slsa-github-generator step
that gets you to SLSA 3.
The trap to keep in view: signing is provenance, NOT safety. A cosign signature proves "the image was signed by this CI identity"; it does NOT prove "the image is free of vulnerabilities" or "the build is hermetic" or "the developer didn't commit malicious code". A malicious commit by an authorized developer produces a legitimately-signed malicious image. Defense in depth requires all four pieces (scan + SBOM + sign + admit) — not just one.
Diagrams¶
Diagram A — CI pipeline: build, scan, SBOM, sign, push, admit (Mermaid)¶
flowchart LR
src["Developer pushes
commit to main"]
gha["GitHub Actions
workflow triggers"]
subgraph build["CI: build + scan"]
build_img["docker buildx build
linux/amd64 + linux/arm64"]
ecr_push["docker push
to ECR"]
ecr_scan["ECR Enhanced Scanning
(Amazon Inspector)
$0.09/image/month"]
end
subgraph sign["CI: SBOM + sign"]
syft["syft -o spdx-json
-> sbom.spdx.json"]
gha_oidc["GitHub mints OIDC token
for this workflow run"]
fulcio["Fulcio signs short-lived cert
(~10 min validity)
cert identity = workflow"]
cosign_sign["cosign sign --recursive
IMAGE@sha256:DIGEST"]
cosign_attest["cosign attest --predicate sbom.spdx.json"]
rekor["Rekor transparency log
(append-only, public)"]
end
subgraph admission["Cluster: admission policy"]
api["kube-apiserver
receives Pod create"]
kyv["Kyverno
verifyImages ClusterPolicy"]
verify{"Signature valid?
OIDC issuer matches?
Subject matches?"}
admit["Pod admitted -> kubelet pulls
by digest"]
reject["Pod REJECTED -> error in events:
image verification failed"]
end
src --> gha --> build_img --> ecr_push --> ecr_scan
ecr_push --> syft
gha --> gha_oidc --> fulcio
syft --> cosign_attest
fulcio --> cosign_sign --> rekor
cosign_attest --> rekor
rekor -. "signature published" .-> api
api --> kyv --> verify
verify -- "yes (Enforce)" --> admit
verify -- "no (Enforce)" --> reject
verify -- "Audit mode" --> admit
style sign fill:#e8f4f8
style admission fill:#fef4e8
Diagram B — Trust chain + SLSA level alignment (ASCII)¶
TRUST CHAIN (the 4 questions, the 4 controls):
  Question                       Control                              SLSA level
  ─────────────────────────────  ───────────────────────────────────  ───────────
  Q1. What's IN this image?      ECR Enhanced Scanning + SBOM         SLSA 1+
                                 (continuous CVE rescan; syft SBOM
                                 stored as artifact + attested)
  Q2. Is it EXACTLY what we      Pin by digest (sha256:...)           SLSA 1+
      built?                     in deployed manifests.
  Q3. WHO vouches for it?        cosign keyless sign in CI:           SLSA 2-3
                                 - OIDC issuer (GitHub Actions
                                   / GitLab / AWS IAM)
                                 - Fulcio short-lived cert
                                 - Rekor transparency log entry
                                 + optional slsa-github-generator
                                   for SLSA-3 provenance attestation
  Q4. Will the cluster REFUSE    Kyverno verifyImages                 Operational
      the rest?                  ClusterPolicy (Audit -> Enforce).    discipline
SLSA LADDER:
  Level   Defining feature             Bookstore platform position
  ──────  ───────────────────────────  ───────────────────────────────────────
  SLSA 1  Scripted, documented build   [yes] We have a Dockerfile + CI script.
  SLSA 2  Hosted CI, signed build,     [yes] GitHub Actions OIDC -> Fulcio ->
          OIDC identity                      Rekor; cosign sign on every image.
  SLSA 3  Source + build integrity,    [partial] slsa-github-generator step
          provenance attestations                available; teams enable per repo.
  SLSA 4  Hermetic, reproducible,      [no] Not pursued; cost vs. value gap.
          two-party review
The 80/20: getting to SLSA 2 captures most of the supply-chain win.
SLSA 3 is the right next step for orgs that handle regulated data.
SLSA 4 is for high-assurance workloads (finance, gov, kernel-level libs).
Hands-on with the Bookstore Platform¶
0. Prerequisites¶
- The bookstore-platform tree applied with enable_image_signing = true in terraform.tfvars (the kyverno-image-signing.tf resource installs Kyverno + the ClusterPolicy).
- An ECR repository per service (the bookstore tree's addons.tf does not create ECR repos; assume one exists, e.g. <ACCOUNT-ID>.dkr.ecr.<REGION>.amazonaws.com/bookstore-catalog).
- A GitHub repository containing the bookstore source + a workflow.
- cosign CLI installed locally for verification: brew install cosign or download from https://docs.sigstore.dev/cosign/system_config/installation/.
- syft CLI installed: brew install syft or https://github.com/anchore/syft/releases.
1. Enable ECR Enhanced Scanning on your repositories¶
In your account-level Terraform (or via aws CLI):
# Configure account-wide enhanced scanning.
aws ecr put-registry-scanning-configuration \
--scan-type ENHANCED \
--rules '[{
"scanFrequency": "CONTINUOUS_SCAN",
"repositoryFilters": [{"filter": "*", "filterType": "WILDCARD"}]
}]'
# Verify.
aws ecr get-registry-scanning-configuration \
--query 'scanningConfiguration.scanType'
# Output: "ENHANCED"
This is account-wide; every ECR repo in this region scans continuously from now on. Cost: $0.09/image/month — for 50 images, $4.50/month. Track it in Cost Explorer; the line item is "Inspector — Continuous Scanning - Inspector2".
2. Audit existing images for vulnerabilities¶
# List HIGH/CRITICAL findings for one image (repeat per repository and tag).
# With enhanced scanning, findings are returned under enhancedFindings rather than findings.
aws ecr describe-image-scan-findings \
  --repository-name bookstore-catalog \
  --image-id imageTag=v1.2.3 \
  --query 'imageScanFindings.enhancedFindings[?severity==`CRITICAL` || severity==`HIGH`]' \
  --output table
# Or use the AWS Console: ECR -> Repositories -> Image -> Scan Results.
Set up an EventBridge rule on Inspector2 Finding events (the
aws.inspector2 source's Inspector2 Finding detail-type, which is where
enhanced-scan findings are published) to alert when a
new CRITICAL finding appears; your incident-response runbook
should turn this into a ticket within minutes of the finding.
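A sketch of what such an alerting rule's event pattern could look like, with a deliberately simplified matcher to sanity-check it against sample events. Assumptions: enhanced-scan findings arrive from the aws.inspector2 source with detail-type Inspector2 Finding (as described under "How it works under the hood"), and the severity and status field names inside detail should be confirmed against a real event from your account before relying on them. The matcher implements only exact-value list matching, not the full EventBridge pattern language.

```python
import json

# Candidate EventBridge pattern (field names under "detail" are assumptions to verify).
EVENT_PATTERN = {
    "source": ["aws.inspector2"],
    "detail-type": ["Inspector2 Finding"],
    "detail": {"severity": ["CRITICAL"], "status": ["ACTIVE"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must exist and hold a listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def sample_event(severity: str) -> dict:
    """Hypothetical, heavily trimmed finding event for local testing."""
    return {
        "source": "aws.inspector2",
        "detail-type": "Inspector2 Finding",
        "detail": {"severity": severity, "status": "ACTIVE"},
    }


if __name__ == "__main__":
    print(json.dumps(EVENT_PATTERN, indent=2))
    print("CRITICAL matches:", matches(EVENT_PATTERN, sample_event("CRITICAL")))
    print("HIGH matches:", matches(EVENT_PATTERN, sample_event("HIGH")))
```

The printed JSON is the shape a rule's event pattern takes; widening the severity list to include HIGH is a one-element change.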
3. Generate an SBOM with syft in CI¶
In your GitHub Actions workflow:
name: build-and-sign
on:
  push:
    branches: [main]
    tags: ['v*']
permissions:
  id-token: write      # required for OIDC -> Fulcio
  contents: read
  packages: write      # to push to GHCR if you use it
  attestations: write  # for SBOM attestation
jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::<ACCOUNT-ID>:role/<GHA-OIDC-ROLE>
          aws-region: <REGION>
      - name: Login to ECR
        run: |
          aws ecr get-login-password --region <REGION> | \
            docker login --username AWS --password-stdin \
            <ACCOUNT-ID>.dkr.ecr.<REGION>.amazonaws.com
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build + push multi-arch image
        id: build
        uses: docker/build-push-action@v6
        with:
          platforms: linux/amd64,linux/arm64
          tags: <ACCOUNT-ID>.dkr.ecr.<REGION>.amazonaws.com/bookstore-catalog:${{ github.sha }}
          push: true
      - name: Generate SBOM with syft
        uses: anchore/sbom-action@v0
        with:
          image: <ACCOUNT-ID>.dkr.ecr.<REGION>.amazonaws.com/bookstore-catalog@${{ steps.build.outputs.digest }}
          format: spdx-json
          output-file: sbom.spdx.json
      - name: Upload SBOM artifact
        uses: actions/upload-artifact@v4
        with:
          name: sbom-${{ github.sha }}
          path: sbom.spdx.json
      - name: Install cosign
        uses: sigstore/cosign-installer@v3
      - name: cosign sign + attest SBOM
        env:
          IMAGE: <ACCOUNT-ID>.dkr.ecr.<REGION>.amazonaws.com/bookstore-catalog@${{ steps.build.outputs.digest }}
        run: |
          # --recursive signs the manifest list AND every per-arch entry.
          cosign sign --yes --recursive "$IMAGE"
          # Attach the SBOM as a signed attestation.
          cosign attest --yes --type spdxjson \
            --predicate sbom.spdx.json \
            "$IMAGE"
What this does:
- Build a multi-arch image, push to ECR. ECR Enhanced Scanning starts scanning immediately.
- Generate an SBOM in SPDX-JSON format with syft.
- Upload SBOM as a GitHub Actions artifact (durable for 90 days by default; retain longer for compliance).
- Sign the image with cosign keyless: GitHub mints an OIDC token for this run, cosign exchanges it with Fulcio for a cert, signs every per-arch digest, and writes signatures to ECR + Rekor.
- Attest the SBOM with cosign: the SBOM is stored alongside the image in ECR, signed by the same CI identity, retrievable for audit.
4. Verify the signature locally¶
IMAGE=<ACCOUNT-ID>.dkr.ecr.<REGION>.amazonaws.com/bookstore-catalog:<TAG>
# Verify the signature against the expected GitHub Actions identity.
cosign verify \
--certificate-identity-regexp="https://github.com/<ORG>/<REPO>/.+" \
--certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
"$IMAGE"
Expected output (truncated):
Verification for <IMAGE> --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- The code-signing certificate was verified using trusted certificate authority certificates
[
{
"critical": {
"identity": {"docker-reference": "<IMAGE>"},
"image": {"docker-manifest-digest": "sha256:<DIGEST>"},
"type": "cosign container image signature"
},
"optional": {
"Bundle": {...},
"Issuer": "https://token.actions.githubusercontent.com",
"Subject": "https://github.com/<ORG>/<REPO>/.github/workflows/build.yml@refs/heads/main"
}
}
]
The Subject is the OIDC identity that signed — the specific
workflow file in your repo. Kyverno verifies against this exact
subject (or a regex matching your workflow naming convention).
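How tightly that subject regex is written matters. The sketch below contrasts a loose, unanchored pattern with a stricter, fully anchored one. It illustrates the matching idea only and is not cosign's or Kyverno's implementation (both use Go regular expressions, whose anchoring behaviour you should confirm for your versions). The org and repo names, the function names, and the branch and tag conventions are all hypothetical.

```python
import re

ISSUER = "https://token.actions.githubusercontent.com"

# Loose pattern: unescaped dots and no anchors, so it can match inside a longer string.
LOOSE = re.compile(r"https://github.com/acme/bookstore/.+")


def strict_pattern(org: str, repo: str) -> "re.Pattern[str]":
    """Workflow files in one repo, built from main or from a v* tag only."""
    return re.compile(
        r"https://github\.com/" + re.escape(org) + "/" + re.escape(repo)
        + r"/\.github/workflows/[^@]+@refs/(?:heads/main|tags/v[^/]+)"
    )


def identity_matches(issuer: str, subject: str, pattern: "re.Pattern[str]") -> bool:
    """Issuer must be exact; subject must match the whole pattern."""
    return issuer == ISSUER and pattern.fullmatch(subject) is not None


GOOD = "https://github.com/acme/bookstore/.github/workflows/build.yml@refs/heads/main"
BRANCH = "https://github.com/acme/bookstore/.github/workflows/build.yml@refs/heads/feature-x"
SMUGGLED = "https://evil.example/?u=https://github.com/acme/bookstore/x"

if __name__ == "__main__":
    strict = strict_pattern("acme", "bookstore")
    for subject in (GOOD, BRANCH, SMUGGLED):
        print(
            f"strict={identity_matches(ISSUER, subject, strict)!s:5} "
            f"loose-search={LOOSE.search(subject) is not None!s:5} {subject}"
        )
```

Whether to pin the ref (main and release tags only) is a policy decision; the stricter pattern means a signature produced from an unreviewed feature branch does not satisfy the check.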
5. Enable + inspect the Kyverno verifyImages policy¶
The Terraform shipping this is in
../examples/bookstore-platform/terraform/kyverno-image-signing.tf.
Read it end-to-end before running anything.
In terraform.tfvars:
enable_image_signing = true
image_signing_keyless_issuer = "https://token.actions.githubusercontent.com"
image_signing_keyless_subject = "https://github.com/<ORG>/<REPO>/.+" # regex
Apply:
terraform apply
This installs Kyverno + the require-signed-images ClusterPolicy
in Audit mode. Verify:
kubectl get clusterpolicy require-signed-images -o yaml \
| grep -A 1 'validationFailureAction'
# validationFailureAction: Audit
6. Watch the policy reports for unsigned images¶
In Audit mode, Kyverno doesn't block Pod creation — it writes PolicyReports instead. Inspect:
# Cluster-wide PolicyReports (for cluster-scoped resources).
kubectl get clusterpolicyreport
# Per-namespace PolicyReports.
kubectl get policyreport --all-namespaces
# Inspect failures.
kubectl get policyreport --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.summary.fail}{"\n"}{end}' \
  | awk -F'\t' '$2 > 0'   # keep only rows with a non-zero fail count (plain grep does not treat \t as a tab)
For every Pod in scope (not in the excluded namespaces: kube-system,
kube-public, kube-node-lease, kyverno, falco, velero,
argocd, bookstore-platform-system — all 8 listed in
kyverno-image-signing.tf, which is the source of truth),
the policy verifies the image is cosign-signed by the expected
identity. Failures mean unsigned images deployed in your cluster —
flag them; require CI to sign before flipping to Enforce.
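For triage it helps to turn the reports into a flat list of failing workloads. The following is a minimal sketch that walks the JSON shape of a PolicyReport list (items, results, result, policy, and either per-result resources or a report-level scope, depending on the Kyverno version). The function name and the sample data are hypothetical; check the field names against the output of kubectl get policyreport -A -o json on your cluster.

```python
def failing_resources(report_list: dict) -> list:
    """Return sorted (namespace, kind, name, policy) tuples for every failed result."""
    rows = set()
    for report in report_list.get("items", []):
        namespace = report.get("metadata", {}).get("namespace", "")
        for result in report.get("results", []):
            if result.get("result") != "fail":
                continue
            # Older reports list resources per result; newer ones carry a report-level scope.
            resources = result.get("resources") or (
                [report["scope"]] if "scope" in report else []
            )
            for resource in resources:
                rows.add((
                    namespace,
                    resource.get("kind", ""),
                    resource.get("name", ""),
                    result.get("policy", ""),
                ))
    return sorted(rows)


# Hypothetical sample covering both report shapes.
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "shop"},
            "results": [
                {"policy": "require-signed-images", "result": "fail",
                 "resources": [{"kind": "Pod", "name": "legacy-cron"}]},
                {"policy": "require-signed-images", "result": "pass",
                 "resources": [{"kind": "Pod", "name": "catalog"}]},
            ],
        },
        {
            "metadata": {"namespace": "data"},
            "scope": {"kind": "Pod", "name": "redis-0"},
            "results": [{"policy": "require-signed-images", "result": "fail"}],
        },
    ]
}

if __name__ == "__main__":
    for row in failing_resources(SAMPLE):
        print("\t".join(row))
```

To use it against a live cluster, load the JSON from kubectl get policyreport -A -o json with json.load and pass the resulting dict in place of SAMPLE.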
7. Flip the policy to Enforce mode¶
After 2-4 weeks of Audit mode with zero unaddressed failures:
# Edit the ClusterPolicy directly:
kubectl patch clusterpolicy require-signed-images \
--type=merge \
-p '{"spec":{"validationFailureAction":"Enforce"}}'
Or — better — edit it in Terraform (
kyverno-image-signing.tf
line 187) and terraform apply. From this point on, any Pod whose
image isn't validly cosign-signed fails admission:
Error from server: admission webhook "validate.kyverno.svc-fail" denied the
request:
policy require-signed-images/verify-image-signatures
failed: image verification failed for ...:
no matching signatures: invalid signature when validating
ASN.1 encoded certificate
8. Add SLSA provenance attestation (optional, SLSA-3 step)¶
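For SLSA-3-grade provenance, add the
slsa-github-generator
to the workflow. It is a reusable workflow, so it is called as its own job under jobs:, not as a step inside build:
provenance:
  needs: [build]
  permissions:
    actions: read     # read the workflow run that produced the image
    id-token: write   # keyless signing of the provenance
    packages: write
  uses: slsa-framework/slsa-github-generator/.github/workflows/generator_container_slsa3.yml@v2.0.0
  with:
    image: <ACCOUNT-ID>.dkr.ecr.<REGION>.amazonaws.com/bookstore-catalog
    # Requires the build job to export the digest under its outputs.
    digest: ${{ needs.build.outputs.digest }}
    registry-username: AWS
  secrets:
    # Placeholder: the steps context is not available at the job level; see the generator's docs for supplying ECR credentials.
    registry-password: <ECR-PASSWORD>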
This generates a SLSA Provenance attestation — a signed document recording the build inputs (source repo SHA, builder image, build steps). The attestation lives alongside the image signature in ECR + Rekor; verifiable by cosign:
cosign verify-attestation \
--type slsaprovenance \
--certificate-identity-regexp="https://github.com/slsa-framework/slsa-github-generator/.+" \
--certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
<IMAGE>
This step is optional for the bookstore platform; teams targeting SLSA 3 add it; teams at SLSA 2 are fine without.
9. (Optional) Demonstrate the rejection in Enforce mode¶
To prove Enforce works, deploy a Pod with an unsigned image and watch it fail:
# A public unsigned image, e.g. some old debian:bullseye
kubectl run unsigned-pod --image=debian:bullseye-slim --command -- sleep 3600
In Audit mode: Pod creates; PolicyReport shows the failure. In
Enforce mode: kubectl run exits non-zero; the API server's
admission webhook denied the request.
How it works under the hood¶
ECR Enhanced Scanning architecture. Enhanced scanning is powered
by Amazon Inspector v2 under the hood. When an image is pushed,
ECR notifies Inspector; Inspector unpacks the image, enumerates OS
packages (via dpkg/rpm/apk) and language dependencies (via
package.json/go.sum/requirements.txt), looks each up in the CVE
database, and writes findings to ECR's scan-findings API. Continuous
re-scan: Inspector periodically (~hourly to daily, depending on
the CVE feed) re-evaluates existing images against the updated CVE
database; new findings appear as the database grows. The findings
are pulled into AWS Security Hub if configured; EventBridge fires on
Inspector2 Finding events for alerting integration.
SBOM generation — what syft does internally. syft mounts the
image's layers in a virtual filesystem and walks them. For each layer
it identifies:
- OS packages via /var/lib/dpkg/status (Debian/Ubuntu), /var/lib/rpm/Packages (RHEL/Fedora), /lib/apk/db/installed (Alpine).
- Language packages via package.json + package-lock.json (Node), go.sum (Go), requirements.txt + Pipfile.lock (Python), pom.xml + gradle.lockfile (Java), Cargo.lock (Rust), composer.lock (PHP), Gemfile.lock (Ruby).
- Binaries: syft runs a binary cataloger that fingerprints well-known binaries (Go, Node, Python interpreters embedded in scratch/distroless images).
- Files of interest: LICENSE, Dockerfile, etc.
The output is a structured document (SPDX or CycloneDX) listing
every package + version + license + relationships. Stored alongside
the image (via cosign attest), it's retrievable for the lifetime
of the image.
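Because the SBOM is plain structured data, the "which images contain log4j < 2.17?" question becomes a short query. Below is a minimal sketch over the SPDX JSON shape (a top-level packages array whose entries carry name and versionInfo). The function names and sample package list are hypothetical, and the version comparison is deliberately naive (leading dotted numbers only); a real pipeline would use a scanner such as grype for correct version and ecosystem semantics.

```python
import re


def version_tuple(text: str) -> tuple:
    """Parse the leading dotted-number part of a version, e.g. '2.14.1-r0' -> (2, 14, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def packages_below(sbom: dict, name_fragment: str, fixed_version: str) -> list:
    """Return (name, version) for packages whose name contains name_fragment
    and whose version sorts below fixed_version."""
    threshold = version_tuple(fixed_version)
    hits = []
    for package in sbom.get("packages", []):
        name = package.get("name", "")
        version = package.get("versionInfo", "")
        parsed = version_tuple(version)
        if name_fragment in name and parsed and parsed < threshold:
            hits.append((name, version))
    return hits


# Hypothetical, trimmed SPDX document.
SAMPLE_SBOM = {
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"name": "log4j-core", "versionInfo": "2.14.1"},
        {"name": "log4j-api", "versionInfo": "2.17.1"},
        {"name": "openssl", "versionInfo": "3.0.13-r0"},
    ],
}

if __name__ == "__main__":
    for name, version in packages_below(SAMPLE_SBOM, "log4j", "2.17"):
        print(f"vulnerable: {name} {version}")
```

Running the same query across every stored SBOM, keyed by image digest, gives the list of affected images without pulling or rescanning them.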
Cosign keyless signing — Fulcio + Rekor. The keyless flow:
- The CI process has an OIDC token for its identity (GitHub Actions mints one when id-token: write is granted; GitLab CI mints one when configured; AWS IAM identities use STS).
- cosign sign generates a fresh ephemeral keypair in memory.
- cosign sends the OIDC token + the public key to Fulcio (Sigstore's CA, run by the Linux Foundation).
- Fulcio verifies the OIDC token (calls back to the OIDC issuer), binds the public key to the OIDC identity, and issues a short-lived (~10 minute) X.509 certificate.
- cosign signs the image digest with the private key, then bundles the signature + the cert + the OIDC identity into an OCI artifact stored alongside the image in the registry.
- cosign submits the signature + cert to Rekor (Sigstore's transparency log), an append-only Merkle tree of all signing events. Rekor returns a log entry index.
- The private key is discarded; it never persists.
Verification reverses the flow: load the signature artifact, verify the cert's chain to Fulcio's root, verify the cert's identity claim matches the expected OIDC issuer + subject, verify the signature against the image digest, optionally verify the Rekor log inclusion. The "no long-lived keys" property is the central security advantage.
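One detail worth spelling out: by the time anyone verifies, the ~10 minute certificate has almost always expired. What verification needs is evidence that the signature was made while the certificate was still valid, and the transparency-log timestamp supplies that. A minimal sketch of the time-window check follows; the function and parameter names are hypothetical and this is not cosign's code.

```python
from datetime import datetime, timedelta, timezone


def signed_while_cert_valid(
    not_before: datetime, not_after: datetime, logged_at: datetime
) -> bool:
    """True when the log timestamp falls inside the certificate's validity window."""
    return not_before <= logged_at <= not_after


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    expires = issued + timedelta(minutes=10)  # short-lived cert, as described above

    in_window = issued + timedelta(minutes=2)
    after_expiry = issued + timedelta(minutes=30)

    print("logged 2 min after issuance: ", signed_while_cert_valid(issued, expires, in_window))
    print("logged 30 min after issuance:", signed_while_cert_valid(issued, expires, after_expiry))
```

This is why a stolen certificate and key are of little use after expiry: a signature logged outside the window fails this check no matter when verification runs.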
Kyverno verifyImages admission flow. When the kube-apiserver
receives a Pod create/update request, the validating-admission
webhook chain runs. Kyverno's admission webhook checks the request
against every ClusterPolicy with verifyImages rules. For each
image reference in the Pod's containers, Kyverno:
- Resolves the image reference to a digest (via registry call).
- Fetches the signature artifact from the registry (cosign stores it under a tag derived from the digest, of the form sha256-<DIGEST>.sig).
- Verifies the signature against the configured keyless claim (issuer + subject regex).
- Optionally verifies the Rekor log entry.
- If mutateDigest: true, rewrites the Pod's image reference to include the verified digest (<IMAGE>:<TAG>@sha256:<DIGEST>).
- Returns allowed: true or allowed: false to the apiserver.
In Audit mode, Kyverno returns allowed: true even on verification
failure but writes a PolicyReport. In Enforce mode, failure returns
allowed: false and the apiserver rejects the request.
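The Audit versus Enforce behaviour reduces to a small decision table, sketched below. The function name and the outcome labels are hypothetical, and this models only the behaviour described in this section, not Kyverno's internals.

```python
VALID_MODES = ("Audit", "Enforce")


def admission_decision(signature_verified: bool, mode: str) -> tuple:
    """Return (allowed, outcome) for one image under the given failure action."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if signature_verified:
        return (True, "admitted")
    if mode == "Audit":
        # Verification failed, but Audit only records the failure.
        return (True, "admitted, failure written to PolicyReport")
    return (False, "rejected by admission webhook")


if __name__ == "__main__":
    for mode in VALID_MODES:
        for verified in (True, False):
            allowed, outcome = admission_decision(verified, mode)
            print(f"mode={mode:8} verified={verified!s:5} allowed={allowed!s:5} {outcome}")
```

Only one of the four rows blocks a Pod, which is why the Audit period is safe to run on a live cluster and why the flip to Enforce changes exactly one outcome.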
The transparency log's role. Rekor is the public, append-only
log where every cosign signature is recorded. Anyone (you, a
security auditor, a regulator) can query Rekor for "every signing
event from this workflow" or "every signing event from this
identity". Rekor's append-only property means signatures can't be
silently removed — a compromised registry could remove the
signature OCI artifact, but the Rekor entry remains. The
verification step cosign verify --rekor-url confirms the entry
is in the log; without it, you trust the registry's own signature
storage, which is weaker.
Why digest-pinning matters more than tag-pinning. A tag is a
mutable pointer. myorg/api:v1.2.3 today points at digest A;
tomorrow it could point at digest B (registry rewrites). Pulling
by myorg/api:v1.2.3 after a registry compromise gets you whatever
the attacker pointed the tag at. Pulling by digest
(myorg/api:v1.2.3@sha256:abc...) gets you exactly the bytes whose
hash is abc... — content-addressed, cryptographically anchored.
The Kyverno policy's mutateDigest: true rewrites tag references
to digest references at admission time; once admitted, the Pod's
image reference is a digest, and the kubelet pulls that exact
content.
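The difference is easy to demonstrate with a toy in-memory registry. Everything below is hypothetical illustration (the ToyRegistry class is not a real registry client); the only real mechanism is that a digest is the SHA-256 of the manifest bytes, so a digest reference can be checked against the content it returns while a tag cannot.

```python
import hashlib


def digest_of(manifest: bytes) -> str:
    """Content address of a manifest: 'sha256:' + hex SHA-256 of its bytes."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


class ToyRegistry:
    """In-memory stand-in for a registry: blobs by digest, tags as mutable pointers."""

    def __init__(self) -> None:
        self.blobs = {}
        self.tags = {}

    def push(self, tag: str, manifest: bytes) -> str:
        digest = digest_of(manifest)
        self.blobs[digest] = manifest
        self.tags[tag] = digest  # silently overwrites any earlier pointer
        return digest

    def pull_by_tag(self, tag: str) -> bytes:
        return self.blobs[self.tags[tag]]

    def pull_by_digest(self, digest: str) -> bytes:
        manifest = self.blobs[digest]
        if digest_of(manifest) != digest:
            raise ValueError("content does not match requested digest")
        return manifest


if __name__ == "__main__":
    registry = ToyRegistry()
    pinned = registry.push("myorg/api:v1.2.3", b"legitimate manifest")
    registry.push("myorg/api:v1.2.3", b"attacker manifest")  # tag overwrite

    print("by tag:   ", registry.pull_by_tag("myorg/api:v1.2.3"))
    print("by digest:", registry.pull_by_digest(pinned))
```

After the overwrite the tag returns the attacker's content, while the pinned digest still returns the original bytes.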
Production notes¶
In production: Land the Kyverno policy in Audit mode for 2-4 weeks before flipping to Enforce. The PolicyReports surface every unsigned image — including the ones you didn't know were running (third-party charts, legacy workloads, pre-CI- signing artifacts). Triage every one: either CI starts signing the image, or the image's namespace gets added to the policy's
excludelist, or the workload is replaced. Don't flip to Enforce with un-triaged failures — you'll block production changes.In production: Native ARM CI runners for multi-arch signing. The Graviton chapter's point about emulation overhead applies double when signing —
cosign sign --recursiveon a multi-arch image signs each per-arch manifest digest separately; on a QEMU-emulated arm64 build the per-arch sign is fast (signing is small computation) but the overall CI pipeline (build + scan + SBOM + sign) is 5-10x slower than native. GitHub Actionsubuntu-24.04-armis the fastest fix.In production: ECR replication for multi-region. Cosign signatures are OCI artifacts stored in the same registry as the image. For a multi-region cluster pulling from a per-region ECR mirror, the signature artifact must also be replicated. ECR Replication (account-level + cross-region) handles this if configured. Without it, the replicated image has no replicated signature, and Kyverno's verifyImages fails in the non-source region. Audit cost: $0.09/GB replicated (typically tiny — signature artifacts are ~5 KB each).
In production: SLSA-3 provenance is a one-time CI investment with a continuous compliance payoff. The slsa-github-generator workflow adds ~30 sec to each build but produces a verifiable provenance attestation. The right time to add it: when you have external auditors asking about the build process, not at the initial CI bring-up. The bookstore platform is at SLSA 2; bumping to SLSA 3 is a half-day's CI work.
In production: Continuous SBOM querying. SBOMs are valuable in incident response: when CVE-2024-XXX drops against libcv2.34, you want to know which deployed images contain libcv2.34 within minutes. Tools: grype queries SBOMs against the CVE database; AWS Inspector reports findings against ECR images directly. The pattern: emit SBOMs in CI, store them in S3, run a nightly grype scan against the latest CVE database, and page when CRITICAL findings appear in deployed-image SBOMs.
In production: Cosign verification needs network access to Fulcio + Rekor. Air-gapped clusters require the offline verification path: pre-pulling Fulcio root certs and Rekor log entries, and configuring cosign verify with --insecure-ignore-tlog. The bookstore platform assumes internet-connected clusters; air-gapped variants need the offline-verification setup, which the Sigstore docs cover at https://docs.sigstore.dev/cosign/system_config/airgapped/.
In production: Public-image policy. Pods running public-image dependencies (Postgres, Redis, Kafka, and every Helm chart's images) are not signed by your CI. The Kyverno policy's exclude list (kube-system, kyverno, falco, velero, argocd, bookstore-platform-system) covers the platform layer; for application workloads pulling public images, either (a) mirror to your ECR + re-sign, or (b) add the source namespace to exclude, or (c) use a separate Kyverno rule that allow-lists specific public registries (docker.io/library/*, quay.io/cncf/*). Pattern (a) is the right answer for high-assurance; (b/c) for teaching environments.
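The nightly SBOM scan from the "Continuous SBOM querying" note could look roughly like the sketch below. It assumes the SBOMs have already been synced from S3 into a local directory and that grype is on PATH; scan_sboms and the directory layout are hypothetical, and the paging step is left to whatever scheduler runs the script.

```shell
# Scan every SPDX SBOM in a directory with grype and list those with CRITICAL findings.
# Returns non-zero if any SBOM fails, so a scheduler can page on a failed run.
# Hypothetical helper; assumes files are named *.spdx.json.
scan_sboms() {
  local sbom_dir="$1" sbom failed=0
  for sbom in "$sbom_dir"/*.spdx.json; do
    [ -e "$sbom" ] || continue   # glob matched nothing: directory has no SBOMs
    # grype exits non-zero when a finding at or above the --fail-on severity exists.
    # It also exits non-zero on its own errors (e.g. a DB update failure), which
    # this sketch deliberately treats as a failure too.
    if ! grype "sbom:${sbom}" --fail-on critical --quiet >/dev/null; then
      echo "CRITICAL: ${sbom}"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (placeholder bucket name):
#   aws s3 sync "s3://<SBOM_BUCKET>/deployed/" ./sboms/ && scan_sboms ./sboms
```

Scanning stored SBOMs rather than re-pulling images keeps the nightly job cheap, and the result reflects the current vulnerability database rather than the one that existed at build time.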
Quick Reference¶
# Enable ECR Enhanced Scanning (account-wide).
aws ecr put-registry-scanning-configuration \
  --scan-type ENHANCED \
  --rules '[{"scanFrequency":"CONTINUOUS_SCAN","repositoryFilters":[{"filter":"*","filterType":"WILDCARD"}]}]'
# Generate an SBOM with syft.
syft <IMAGE> -o spdx-json > sbom.spdx.json
# Sign an image with cosign keyless (CI-only; needs OIDC token).
cosign sign --yes --recursive <IMAGE>@sha256:<DIGEST>
# Attach a signed SBOM attestation.
cosign attest --yes --type spdxjson \
  --predicate sbom.spdx.json <IMAGE>@sha256:<DIGEST>
# Verify a signature against an expected identity.
cosign verify \
  --certificate-identity-regexp="https://github.com/<ORG>/<REPO>/.+" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
  <IMAGE>
# Check the Kyverno policy mode.
kubectl get clusterpolicy require-signed-images \
  -o jsonpath='{.spec.validationFailureAction}'
# Flip to Enforce after Audit period.
kubectl patch clusterpolicy require-signed-images \
  --type=merge \
  -p '{"spec":{"validationFailureAction":"Enforce"}}'
# Inspect PolicyReports across the cluster.
kubectl get policyreport --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.summary}{"\n"}{end}'
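The last three commands can be combined into a guarded flip, sketched below under a simplifying assumption: it sums the fail count of every namespaced PolicyReport, so failures from other policies also block the flip. The function names are hypothetical; only the kubectl invocations come from the reference above.

```shell
# Sum the `fail` counts across all namespaced PolicyReports.
# Simplification: counts failures from every policy, not only require-signed-images.
policy_fail_count() {
  local total=0 n
  for n in $(kubectl get policyreport --all-namespaces \
      -o jsonpath='{range .items[*]}{.summary.fail}{"\n"}{end}'); do
    total=$((total + n))
  done
  echo "$total"
}

# Patch the policy to Enforce only when no failing results remain.
enforce_if_clean() {
  local fails
  fails="$(policy_fail_count)"
  if [ "$fails" -gt 0 ]; then
    echo "refusing to enforce: ${fails} failing results still un-triaged" >&2
    return 1
  fi
  kubectl patch clusterpolicy require-signed-images \
    --type=merge \
    -p '{"spec":{"validationFailureAction":"Enforce"}}'
}
```

Calling enforce_if_clean from a change-managed pipeline rather than an interactive shell keeps the Audit-to-Enforce transition reviewable.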
Minimal verifyImages ClusterPolicy skeleton:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Audit # Audit first; Enforce after triage
  background: true
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signatures
      match:
        any:
          - resources:
              kinds: [Pod]
      exclude:
        any:
          - resources:
              namespaces: [kube-system, kube-public, kube-node-lease, kyverno, falco, velero, argocd, bookstore-platform-system]
      verifyImages:
        - imageReferences: ["*"]
          attestors:
            - entries:
                - keyless:
                    issuer: "https://token.actions.githubusercontent.com"
                    # subject takes wildcards, not regex; use subjectRegExp for a regex match
                    subject: "https://github.com/<ORG>/*"
          mutateDigest: true
          required: true
          verifyDigest: true
Supply-chain-security checklist (the production setup is right when all eight are yes):
- ECR Enhanced Scanning enabled account-wide; CRITICAL findings surface via EventBridge to incident response.
- CI generates an SBOM (syft) for every image and uploads it as an artifact + attests it via cosign attest.
- CI signs every image with cosign sign --recursive using keyless OIDC trust (no long-lived keys).
- Multi-arch images sign every per-arch manifest entry (the --recursive flag).
- Kyverno verifyImages ClusterPolicy is installed in Audit mode and has been observed for >= 2 weeks with zero unaddressed failures.
- Policy is now in Enforce mode; unsigned images are rejected at admission.
- SLSA level is documented (the bookstore platform: SLSA 2; SLSA 3 with slsa-github-generator step).
- Public-image exclusions (kube-system, kyverno, etc.) are reviewed quarterly — long exclusion lists are technical debt.
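The SBOM, signing, and multi-arch items in the checklist above can be sketched as a single CI step. This assumes syft and cosign are on PATH and that the job has an OIDC token available (in GitHub Actions, id-token: write); sign_and_attest is a hypothetical wrapper, and the flags are the ones from the Quick Reference.

```shell
# Generate an SBOM, sign, and attest one image, always addressed by digest.
# Args: <image-repository> <sha256:digest>
# Hypothetical wrapper for illustration; writes sbom.spdx.json to the current directory.
sign_and_attest() {
  local image="$1" digest="$2"
  local ref="${image}@${digest}"
  case "$digest" in
    sha256:*) ;;
    *) echo "expected a sha256: digest, got '${digest}'" >&2; return 2 ;;
  esac
  # 1. SBOM as a build artifact, generated from the exact digest being signed.
  syft "$ref" -o spdx-json > sbom.spdx.json || return 1
  # 2. Keyless signature; --recursive also signs each per-arch manifest.
  cosign sign --yes --recursive "$ref" || return 1
  # 3. Bind the SBOM to the same digest as a signed attestation.
  cosign attest --yes --type spdxjson --predicate sbom.spdx.json "$ref"
}
```

Refusing tag references up front is a design choice: a tag can move between the build and the signing step, while a digest cannot.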
Test your understanding¶
Try each before opening the answer drawer. The act of trying is the exercise; the answer is the check.
- Why does the chapter call cosign keyless signing the "right default" over key-managed signing?
Show answer
Keyless signing eliminates the long-lived signing key. The signing cert is short-lived (~10 minutes), generated by Fulcio at sign-time, and bound to the OIDC identity of the runner (GitHub Actions workflow `repo:org/repo:ref:refs/heads/main`). Key-managed signing requires you to store + rotate a private key somewhere, which is the same blast-radius problem as long-lived AWS access keys. Keyless also publishes the signature to Rekor (the public transparency log), so independent auditors can verify "this image was signed by this workflow at this time." The chapter calls this "best-in-class" because there's literally nothing to rotate.
- Your CI starts failing with Error: failed to sign: signature certificate not found in transparency log after a CI rewrite. What's a likely cause?
Show answer
The CI workflow lost its OIDC token permissions. The most common cause is removing `id-token: write` from the workflow's `permissions:` block, or a job that never requests the `id-token` permission at all. Without `id-token: write`, GitHub Actions doesn't mint a JWT, Fulcio refuses to issue a cert, and cosign fails. The second cause: the Fulcio/Rekor service is briefly unavailable (network or upstream outage); cosign will fail closed. Both are diagnosable from the failed step's logs. The chapter's CI pattern is to make signing a required CI check so this never silently degrades: an unsigned image is rejected at admission by Kyverno in production, surfacing the broken-signing issue immediately.
- A team flips Kyverno's verifyImages ClusterPolicy to Enforce on a Friday afternoon. By Monday morning, half the cluster's workloads cannot create replacement pods, and their events show admission errors. What was missed?
Show answer
The 2-4 week Audit observation window. Audit mode logs would have surfaced the long tail of unsigned images: third-party Helm charts (the LB Controller, Karpenter, metrics-server) pulling public images with no cosign signature, AWS-shipped addon images, ECR Public images for cluster-system workloads. A production-ready supply chain has an explicit **exception list** in the policy for these (typically by image-registry-prefix match), and the team has triaged the Audit findings into "fix the signing" vs "add to exception" before flipping to Enforce. Friday-afternoon Enforce flips without that triage cycle are the canonical incident the chapter warns about.
- Hands-on extension: build a multi-arch image with docker buildx build --platform linux/amd64,linux/arm64, then sign only with cosign sign <image> (no --recursive). Pull on an arm64 node with Kyverno verifyImages enforcing.
What you should see
The pull on the arm64 node fails admission because cosign signed only the top-level manifest list, not the per-architecture entries. Kyverno's `verifyImages` walks to the platform-specific image entry that's actually being pulled and looks for a signature on *that*; finding none, it rejects. The fix: `cosign sign --recursive` signs every per-architecture entry in the manifest list. The Graviton chapter's discipline reaches in here: multi-arch + signing requires `--recursive` or you build a cluster that runs fine on x86 and rejects every arm64 pull silently.
Further reading¶
- Sigstore documentation https://docs.sigstore.dev/; the canonical source for cosign, Fulcio, and Rekor, the entire keyless signing toolchain this chapter relies on.
- Kyverno verifyImages rules https://kyverno.io/docs/writing-policies/verify-images/; the upstream documentation for the policy shape the bookstore tree ships.
- AWS ECR Enhanced Scanning + Amazon Inspector https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning-enhanced.html; the AWS-side documentation for the scanning mode this chapter recommends and the EventBridge integration for alerting.
- syft SBOM generator https://github.com/anchore/syft; the upstream tool, including the format reference (SPDX, CycloneDX) and the per-language cataloger list.
- SLSA framework https://slsa.dev/; the supply-chain levels-of-assurance framework. Read spec/v1.0/levels.md for the formal definitions.
- slsa-github-generator https://github.com/slsa-framework/slsa-github-generator; the GitHub Actions reusable workflow that gets you to SLSA 3.
- Rosso et al., Production Kubernetes, ch.15, "Software Supply Chain"; the broader image-build, scan, sign, admit pipeline this chapter operationalizes for AWS.
- CNCF TAG-Security supply-chain whitepaper https://github.com/cncf/tag-security/blob/main/supply-chain-security/supply-chain-security-paper/CNCF_SSCP_v1.pdf; the cloud-native industry consensus on the supply-chain threat model + the controls this chapter implements.