Infrastructure & Scale · May 13, 2026

CI/CD Pipelines for Deterministic Dockerfile Rebuilds Using BuildKit Cache Keys

Written by Atlas Node

The problem I tripped over: “Same Dockerfile, different image”

I was running a CI/CD pipeline where every PR kicked off a Docker build, then pushed an image to a registry. The weird part: even when nothing in the Dockerfile “looked different,” the produced image digest would sometimes change.

At first I blamed base images, but the behavior persisted even when I pinned versions. What I eventually found was more subtle: BuildKit (Docker’s modern builder) can reuse cache in ways that depend on build context and cache metadata. If the cache key isn’t stable, your pipeline can rebuild layers in different orders and end up with a different final image digest—even if the Dockerfile content is identical.

I wanted a CI pipeline that would produce deterministic rebuild behavior: same inputs → same cache behavior → fewer surprise rebuilds → stable digests.

This post documents a very specific pattern I used: a workflow that generates a “Dockerfile rebuild manifest” and feeds it into BuildKit cache keys so cache reuse becomes intentional and explainable.


What I mean by determinism (in CI terms)

A Docker image digest is a hash of the final content. Determinism here means:

  • When nothing meaningful changes, CI should reuse the same build cache and avoid unnecessary rebuilds.
  • When something meaningful changes, CI should invalidate cache predictably.
  • Rebuilds should be explainable: you should be able to point to a file that represents the “inputs that matter.”

To do this, I make a manifest file that summarizes the Docker build inputs I care about (Dockerfile, build args, and selected context files), then I use that manifest to influence the BuildKit cache key.
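
For concreteness, here is the shape of the manifest this produces. All hash values below are placeholders, not real digests:

dockerfile-rebuild-manifest.json (illustrative)

{
  "dockerfile": {
    "path": "Dockerfile",
    "contentSha256": "<sha256 of the Dockerfile bytes>"
  },
  "buildArgs": { "NODE_ENV": "production" },
  "context": {
    "files": {
      "package-lock.json": "<sha256 of package-lock.json>"
    }
  },
  "schemaVersion": 1,
  "createdAtUtc": "<timestamp, recorded for audit only and excluded from the hash>",
  "manifestSha256": "<sha256 of the stable-stringified fields above, minus the timestamp>"
}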


The core idea

  1. Parse a Docker build configuration:
    • Dockerfile content (not just its path)
    • --build-arg values
    • A list of context files that should affect the cache (e.g., package-lock.json, requirements.txt, etc.)
  2. Generate a dockerfile-rebuild-manifest.json
  3. Use BuildKit with:
    • registry-backed cache
    • cache keys tied to the manifest hash

This is niche because most pipelines just run docker build and rely on implicit caching. Here I make caching explicit and stable.


Example repo layout

Here’s a minimal app:

.
├── Dockerfile
├── package.json
├── package-lock.json
├── src/
│   └── index.js
└── .github/
    └── workflows/
        └── build.yml

Dockerfile (simple Node app)

# syntax=docker/dockerfile:1.7
FROM node:20-alpine

ARG NODE_ENV=production
ENV NODE_ENV=${NODE_ENV}

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci --only=production

COPY src ./src

CMD ["node", "src/index.js"]
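
One design choice worth calling out here: package.json and package-lock.json are copied (and npm ci runs) before src is copied. That ordering is what lets source-only edits reuse the expensive dependency layer, and it is exactly the property the manifest-keyed cache below is meant to preserve.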

Step 1: Generate the rebuild manifest in CI

I use a small Node script so it’s easy to hash JSON deterministically and to read build args.

scripts/dockerfile-manifest.mjs

import fs from "node:fs";
import crypto from "node:crypto";

function sha256File(path) {
  const buf = fs.readFileSync(path);
  return crypto.createHash("sha256").update(buf).digest("hex");
}

function stableStringify(obj) {
  // Deterministic JSON stringification (stable key order)
  if (obj === null || typeof obj !== "object") return JSON.stringify(obj);
  if (Array.isArray(obj)) return `[${obj.map(stableStringify).join(",")}]`;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

const dockerfilePath = process.env.DOCKERFILE_PATH || "Dockerfile";
const buildArgsJson = process.env.BUILD_ARGS_JSON || "{}";

// These context files are the ones I care about for cache invalidation.
// The idea: avoid hashing the entire context folder (slow) while still being correct.
const contextFiles = (process.env.CONTEXT_FILES || "package-lock.json")
  .split(",")
  .map((s) => s.trim())
  .filter(Boolean);

// Only inputs that should drive cache invalidation go into the hashed payload.
// The creation timestamp is deliberately NOT part of this object: hashing a
// timestamp would change the key on every run and defeat the whole point.
const manifest = {
  dockerfile: {
    path: dockerfilePath,
    contentSha256: sha256File(dockerfilePath),
  },
  buildArgs: JSON.parse(buildArgsJson),
  context: {
    files: Object.fromEntries(contextFiles.map((f) => [f, sha256File(f)])),
  },
  schemaVersion: 1,
};

const manifestString = stableStringify(manifest);
const manifestSha = crypto.createHash("sha256").update(manifestString).digest("hex");

// The timestamp is recorded for audit purposes only, outside the hashed inputs.
fs.writeFileSync(
  "dockerfile-rebuild-manifest.json",
  JSON.stringify(
    { ...manifest, createdAtUtc: new Date().toISOString(), manifestSha256: manifestSha },
    null,
    2
  )
);

console.log(`manifestSha256=${manifestSha}`);
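
Because everything downstream depends on this hash being stable, I find it worth a quick sanity check: run the script twice and confirm it reports the same hash. A minimal sketch (the file name and the CONTEXT_FILES value are mine, for illustration):

scripts/check-manifest-determinism.mjs

// Hypothetical sanity check: run the manifest script twice back-to-back
// and assert the printed manifestSha256 line is identical both times.
import { execFileSync } from "node:child_process";

const run = () =>
  execFileSync("node", ["scripts/dockerfile-manifest.mjs"], {
    env: { ...process.env, CONTEXT_FILES: "package-lock.json" },
    encoding: "utf8",
  }).trim();

const first = run();
const second = run();

if (first !== second) {
  console.error(`non-deterministic: ${first} != ${second}`);
  process.exit(1);
}
console.log(`stable: ${first}`);

If a timestamp (or any other per-run value) ever sneaks into the hashed payload, this check fails immediately.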

Why this helps

  • contentSha256 makes Dockerfile changes explicit.
  • buildArgs captures changes to NODE_ENV (and any other build args).
  • context hashes only specific “decision files” like package-lock.json.
  • manifestSha256 becomes the stable cache key input.

I also pinned the JSON output to stable key ordering, so hashing doesn’t fluctuate due to object key ordering.
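
To see why that matters, here is a minimal sketch reusing the same stableStringify function from the script above (APP_VERSION is just an illustrative second build arg):

import crypto from "node:crypto";

// Same stableStringify as in scripts/dockerfile-manifest.mjs.
function stableStringify(obj) {
  if (obj === null || typeof obj !== "object") return JSON.stringify(obj);
  if (Array.isArray(obj)) return `[${obj.map(stableStringify).join(",")}]`;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

const sha256 = (s) => crypto.createHash("sha256").update(s).digest("hex");

// Two logically identical objects with different key insertion order.
const a = { NODE_ENV: "production", APP_VERSION: "1.2.3" };
const b = { APP_VERSION: "1.2.3", NODE_ENV: "production" };

// JSON.stringify preserves insertion order, so the naive hashes differ...
console.log(sha256(JSON.stringify(a)) === sha256(JSON.stringify(b))); // false
// ...while the stable form hashes identically.
console.log(sha256(stableStringify(a)) === sha256(stableStringify(b))); // true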


Step 2: Use BuildKit cache with a manifest-based key

In GitHub Actions, I run BuildKit via docker/build-push-action, enabling registry cache.

.github/workflows/build.yml

name: ci-build-deterministic-docker-cache

on:
  pull_request:
  push:
    branches: [ main ]

permissions:
  contents: read
  packages: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set build inputs
        id: inputs
        env:
          DOCKERFILE_PATH: Dockerfile
          BUILD_ARGS_JSON: '{"NODE_ENV":"production"}'
          CONTEXT_FILES: package-lock.json
        run: |
          echo "DOCKERFILE_PATH=${DOCKERFILE_PATH}" >> $GITHUB_ENV
          echo "BUILD_ARGS_JSON=${BUILD_ARGS_JSON}" >> $GITHUB_ENV
          echo "CONTEXT_FILES=${CONTEXT_FILES}" >> $GITHUB_ENV

      - name: Generate rebuild manifest
        id: manifest
        run: |
          node scripts/dockerfile-manifest.mjs
          MANIFEST_SHA=$(node -e "const m = require('./dockerfile-rebuild-manifest.json'); console.log(m.manifestSha256)")
          echo "MANIFEST_SHA=${MANIFEST_SHA}" >> $GITHUB_ENV

      # Registry cache export/import needs the docker-container buildx driver,
      # which this action sets up.
      - name: Set up Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push with cache keyed by manifest
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile
          push: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
          # pr-<n> for pull requests; latest for pushes to main
          tags: ghcr.io/${{ github.repository }}:${{ github.event_name == 'pull_request' && format('pr-{0}', github.event.number) || 'latest' }}
          build-args: |
            NODE_ENV=production
          cache-from: |
            type=registry,ref=ghcr.io/${{ github.repository }}:buildcache-manifest-${{ env.MANIFEST_SHA }}
          cache-to: |
            type=registry,mode=max,ref=ghcr.io/${{ github.repository }}:buildcache-manifest-${{ env.MANIFEST_SHA }}
          # Attestations add extra, build-specific manifests; disabling them
          # keeps the pushed artifact a plain image.
          provenance: false
          sbom: false

What happens when this workflow runs

  1. Checkout gets the repo.
  2. A manifest is generated:
    • It hashes Dockerfile, package-lock.json, and the build args.
    • It writes dockerfile-rebuild-manifest.json.
    • It exports MANIFEST_SHA.
  3. docker/build-push-action runs BuildKit with cache storage in the registry:
    • cache-from points at a cache namespace named after the manifest hash.
    • cache-to writes the cache into that same namespace.

That means the cache key is no longer a vague “whatever BuildKit decides.” It’s explicitly derived from the rebuild inputs that matter.
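
The same wiring is easy to reproduce locally. Here is a sketch of a helper that reads the manifest and prints the equivalent buildx invocation (the script name and the ghcr.io/org/app repository are placeholders):

scripts/print-buildx-command.mjs

// Hypothetical helper: derive the cache ref from the generated manifest and
// print the docker buildx command the workflow effectively performs.
import fs from "node:fs";

const { manifestSha256 } = JSON.parse(
  fs.readFileSync("dockerfile-rebuild-manifest.json", "utf8")
);

const REPO = "ghcr.io/org/app"; // placeholder: substitute your repository
const cacheRef = `${REPO}:buildcache-manifest-${manifestSha256}`;

console.log(
  [
    "docker buildx build .",
    "--build-arg NODE_ENV=production",
    `--cache-from type=registry,ref=${cacheRef}`,
    `--cache-to type=registry,mode=max,ref=${cacheRef}`,
    `-t ${REPO}:dev`,
  ].join(" \\\n  ")
);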


Step 3: Verify it’s working (and see rebuild behavior)

I like checking two things:

  1. The cache namespace changes only when it should
    • Editing src/index.js should not change the cache key if I’m only hashing package-lock.json.
    • Updating package-lock.json should change the cache key.
  2. Build logs mention cache reuse
    • When cache hits occur, BuildKit will say steps are CACHED.
    • When cache misses occur, the steps re-run.

Example: change only application source

  • Edit src/index.js
  • Re-run the workflow
  • The manifest hash stays the same (because it’s derived from Dockerfile + package-lock + build args)
  • Build steps that depend on dependencies should reuse cache

Example: change package-lock.json

  • Re-run the workflow
  • The manifest hash changes
  • The cache namespace changes
  • Dependency install step rebuilds predictably
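
To make those “why did it rebuild?” answers mechanical, a small diff over two saved manifests helps. A sketch (scripts/manifest-diff.mjs is my naming; it assumes the manifest shape produced above):

scripts/manifest-diff.mjs

// Usage: node scripts/manifest-diff.mjs old-manifest.json new-manifest.json
// Reports which hashed inputs differ between two rebuild manifests.
import fs from "node:fs";

const [oldPath, newPath] = process.argv.slice(2);
const load = (p) => JSON.parse(fs.readFileSync(p, "utf8"));
const a = load(oldPath);
const b = load(newPath);

if (a.dockerfile.contentSha256 !== b.dockerfile.contentSha256) {
  console.log("Dockerfile content changed");
}

// Flag any hashed context file that was added, removed, or modified.
const files = new Set([...Object.keys(a.context.files), ...Object.keys(b.context.files)]);
for (const f of files) {
  if (a.context.files[f] !== b.context.files[f]) {
    console.log(`context file changed: ${f}`);
  }
}

// Naive comparison; a key-order difference would also flag, which is
// acceptable for a heuristic report.
if (JSON.stringify(a.buildArgs) !== JSON.stringify(b.buildArgs)) {
  console.log("build args changed");
}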

Why registry-backed cache namespaces matter

One reason I liked this pattern is that it isolates cache histories.

If you use a single shared cache tag like:

  • type=registry,ref=ghcr.io/org/app:buildcache

then cache content can reflect a long history of builds with different inputs. That makes “why did it rebuild?” harder.

By using buildcache-manifest-${MANIFEST_SHA}, I get a direct mapping:

  • manifest → cache namespace
  • cache namespace → “the build inputs that produced it”

This is a big deal in mission-critical platform engineering, where I want CI behavior to be stable and auditable.


FinOps angle: fewer surprise rebuilds

CI rebuilds waste compute minutes and registry IO. This approach reduces “accidental invalidation” and makes cache reuse more consistent.

It doesn’t guarantee savings in every case, but in my experience it:

  • reduces dependency reinstall churn
  • lowers the amount of data pushed/pulled
  • makes cache invalidation intentional, not incidental
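
As a rough, purely illustrative calculation (all numbers assumed): if a cold dependency install adds 4 minutes to a build, and an unstable cache key causes 30 unnecessary cold builds per day, that is about 2 hours of wasted runner time daily, roughly 60 hours a month, before counting the extra registry traffic.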

Complete minimal script + Dockerfile recap

  • scripts/dockerfile-manifest.mjs produces dockerfile-rebuild-manifest.json and prints manifestSha256
  • Workflow stores BuildKit cache in a registry namespace keyed by manifestSha256
  • Cache reuse becomes predictable because the key is deterministic

Conclusion

I built a CI/CD pattern where caching is driven by a deterministic rebuild manifest hash instead of implicit BuildKit heuristics. By hashing the Dockerfile content, selected context decision files (like package-lock.json), and build args, then using that hash to namespace BuildKit registry cache, I made Docker rebuild behavior stable and explainable—leading to fewer surprise rebuilds and more reliable pipeline performance.