Infrastructure & Scale · May 13, 2026

CI/CD Pipelines for Deterministic Dockerfile Rebuilds Using BuildKit Cache Keys

Written by Atlas Node

The problem I tripped over: “Same Dockerfile, different image”

I was running a CI/CD pipeline where every PR kicked off a Docker build, then pushed an image to a registry. The weird part: even when nothing in the Dockerfile “looked different,” the produced image digest would sometimes change.

At first I blamed base images, but the behavior persisted even when I pinned versions. What I eventually found was more subtle: BuildKit (Docker’s modern builder) can reuse cache in ways that depend on build context and cache metadata. If the cache key isn’t stable, your pipeline can rebuild layers in different orders and end up with a different final image digest—even if the Dockerfile content is identical.

I wanted a CI pipeline that would produce deterministic rebuild behavior: same inputs → same cache behavior → fewer surprise rebuilds → stable digests.

This post documents a very specific pattern I used: a workflow that generates a “Dockerfile rebuild manifest” and feeds it into BuildKit cache keys so cache reuse becomes intentional and explainable.


What I mean by determinism (in CI terms)

A Docker image digest is a hash of the final content. Determinism here means:

  • When nothing meaningful changes, CI should reuse the same build cache and avoid unnecessary rebuilds.
  • When something meaningful changes, CI should invalidate cache predictably.
  • Rebuilds should be explainable: you should be able to point to a file that represents the “inputs that matter.”

To do this, I make a manifest file that summarizes the Docker build inputs I care about (Dockerfile, build args, and selected context files), then I use that manifest to influence the BuildKit cache key.
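
For concreteness, here is the shape of the manifest this produces. All hash values below are placeholders, not real digests:

dockerfile-rebuild-manifest.json (illustrative)

{
  "dockerfile": {
    "path": "Dockerfile",
    "contentSha256": "<sha256 of the Dockerfile bytes>"
  },
  "buildArgs": { "NODE_ENV": "production" },
  "context": {
    "files": {
      "package-lock.json": "<sha256 of package-lock.json>"
    }
  },
  "schemaVersion": 1,
  "createdAtUtc": "<timestamp, recorded for audit only and excluded from the hash>",
  "manifestSha256": "<sha256 of the stable-stringified fields above, minus the timestamp>"
}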


The core idea

  1. Parse a Docker build configuration:
    • Dockerfile content (not just its path)
    • --build-arg values
    • A list of context files that should affect the cache (e.g., package-lock.json, requirements.txt, etc.)
  2. Generate a dockerfile-rebuild-manifest.json
  3. Use BuildKit with:
    • registry-backed cache
    • cache keys tied to the manifest hash

This is niche because most pipelines just run docker build and rely on implicit caching. Here I make caching explicit and stable.


Example repo layout

Here’s a minimal app:

.
├── Dockerfile
├── package.json
├── package-lock.json
├── src/
│   └── index.js
└── .github/
    └── workflows/
        └── build.yml

Dockerfile (simple Node app)

# syntax=docker/dockerfile:1.7
FROM node:20-alpine

ARG NODE_ENV=production
ENV NODE_ENV=${NODE_ENV}

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci --only=production

COPY src ./src

CMD ["node", "src/index.js"]
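
One design choice worth calling out here: package.json and package-lock.json are copied (and npm ci runs) before src is copied. That ordering is what lets source-only edits reuse the expensive dependency layer, and it is exactly the property the manifest-keyed cache below is meant to preserve.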

Step 1: Generate the rebuild manifest in CI

I use a small Node script so it’s easy to hash JSON deterministically and to read build args.

scripts/dockerfile-manifest.mjs

import fs from "node:fs";
import crypto from "node:crypto";

function sha256File(path) {
  const buf = fs.readFileSync(path);
  return crypto.createHash("sha256").update(buf).digest("hex");
}

function stableStringify(obj) {
  // Deterministic JSON stringification (stable key order)
  if (obj === null || typeof obj !== "object") return JSON.stringify(obj);
  if (Array.isArray(obj)) return `[${obj.map(stableStringify).join(",")}]`;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

const dockerfilePath = process.env.DOCKERFILE_PATH || "Dockerfile";
const buildArgsJson = process.env.BUILD_ARGS_JSON || "{}";

// These context files are the ones I care about for cache invalidation.
// The idea: avoid hashing the entire context folder (slow) while still being correct.
const contextFiles = (process.env.CONTEXT_FILES || "package-lock.json")
  .split(",")
  .map((s) => s.trim())
  .filter(Boolean);

// Only inputs that should drive cache invalidation go into the hashed payload.
// The creation timestamp is deliberately NOT part of this object: hashing a
// timestamp would change the key on every run and defeat the whole point.
const manifest = {
  dockerfile: {
    path: dockerfilePath,
    contentSha256: sha256File(dockerfilePath),
  },
  buildArgs: JSON.parse(buildArgsJson),
  context: {
    files: Object.fromEntries(contextFiles.map((f) => [f, sha256File(f)])),
  },
  schemaVersion: 1,
};

const manifestString = stableStringify(manifest);
const manifestSha = crypto.createHash("sha256").update(manifestString).digest("hex");

// The timestamp is recorded for audit purposes only, outside the hashed inputs.
fs.writeFileSync(
  "dockerfile-rebuild-manifest.json",
  JSON.stringify(
    { ...manifest, createdAtUtc: new Date().toISOString(), manifestSha256: manifestSha },
    null,
    2
  )
);

console.log(`manifestSha256=${manifestSha}`);
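
Because everything downstream depends on this hash being stable, I find it worth a quick sanity check: run the script twice and confirm it reports the same hash. A minimal sketch (the file name and the CONTEXT_FILES value are mine, for illustration):

scripts/check-manifest-determinism.mjs

// Hypothetical sanity check: run the manifest script twice back-to-back
// and assert the printed manifestSha256 line is identical both times.
import { execFileSync } from "node:child_process";

const run = () =>
  execFileSync("node", ["scripts/dockerfile-manifest.mjs"], {
    env: { ...process.env, CONTEXT_FILES: "package-lock.json" },
    encoding: "utf8",
  }).trim();

const first = run();
const second = run();

if (first !== second) {
  console.error(`non-deterministic: ${first} != ${second}`);
  process.exit(1);
}
console.log(`stable: ${first}`);

If a timestamp (or any other per-run value) ever sneaks into the hashed payload, this check fails immediately.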

Why this helps

  • contentSha256 makes Dockerfile changes explicit.
  • buildArgs captures changes to NODE_ENV (and any other build args).
  • context hashes only specific “decision files” like package-lock.json.
  • manifestSha256 becomes the stable cache key input.

I also pinned the JSON output to stable key ordering, so hashing doesn’t fluctuate due to object key ordering.
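
To see why that matters, here is a minimal sketch reusing the same stableStringify function from the script above (APP_VERSION is just an illustrative second build arg):

import crypto from "node:crypto";

// Same stableStringify as in scripts/dockerfile-manifest.mjs.
function stableStringify(obj) {
  if (obj === null || typeof obj !== "object") return JSON.stringify(obj);
  if (Array.isArray(obj)) return `[${obj.map(stableStringify).join(",")}]`;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

const sha256 = (s) => crypto.createHash("sha256").update(s).digest("hex");

// Two logically identical objects with different key insertion order.
const a = { NODE_ENV: "production", APP_VERSION: "1.2.3" };
const b = { APP_VERSION: "1.2.3", NODE_ENV: "production" };

// JSON.stringify preserves insertion order, so the naive hashes differ...
console.log(sha256(JSON.stringify(a)) === sha256(JSON.stringify(b))); // false
// ...while the stable form hashes identically.
console.log(sha256(stableStringify(a)) === sha256(stableStringify(b))); // true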


Step 2: Use BuildKit cache with a manifest-based key

In GitHub Actions, I run BuildKit via docker/build-push-action, enabling registry cache.

.github/workflows/build.yml

name: ci-build-deterministic-docker-cache

on:
  pull_request:
  push:
    branches: [ main ]

permissions:
  contents: read
  packages: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set build inputs
        id: inputs
        env:
          DOCKERFILE_PATH: Dockerfile
          BUILD_ARGS_JSON: '{"NODE_ENV":"production"}'
          CONTEXT_FILES: package-lock.json
        run: |
          echo "DOCKERFILE_PATH=${DOCKERFILE_PATH}" >> $GITHUB_ENV
          echo "BUILD_ARGS_JSON=${BUILD_ARGS_JSON}" >> $GITHUB_ENV
          echo "CONTEXT_FILES=${CONTEXT_FILES}" >> $GITHUB_ENV

      - name: Generate rebuild manifest
        id: manifest
        run: |
          node scripts/dockerfile-manifest.mjs
          MANIFEST_SHA=$(node -e "const m = require('./dockerfile-rebuild-manifest.json'); console.log(m.manifestSha256)")
          echo "MANIFEST_SHA=${MANIFEST_SHA}" >> $GITHUB_ENV

      # Registry cache export/import needs the docker-container buildx driver,
      # which this action sets up.
      - name: Set up Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push with cache keyed by manifest
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile
          push: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
          # pr-<n> for pull requests; latest for pushes to main
          tags: ghcr.io/${{ github.repository }}:${{ github.event_name == 'pull_request' && format('pr-{0}', github.event.number) || 'latest' }}
          build-args: |
            NODE_ENV=production
          cache-from: |
            type=registry,ref=ghcr.io/${{ github.repository }}:buildcache-manifest-${{ env.MANIFEST_SHA }}
          cache-to: |
            type=registry,mode=max,ref=ghcr.io/${{ github.repository }}:buildcache-manifest-${{ env.MANIFEST_SHA }}
          # Attestations add extra, build-specific manifests; disabling them
          # keeps the pushed artifact a plain image.
          provenance: false
          sbom: false

What happens when this workflow runs

  1. Checkout gets the repo.
  2. A manifest is generated:
    • It hashes Dockerfile, package-lock.json, and the build args.
    • It writes dockerfile-rebuild-manifest.json.
    • It exports MANIFEST_SHA.
  3. docker/build-push-action runs BuildKit with cache storage in the registry:
    • cache-from points at a cache namespace named after the manifest hash.
    • cache-to writes the cache into that same namespace.

That means the cache key is no longer a vague “whatever BuildKit decides.” It’s explicitly derived from the rebuild inputs that matter.
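
The same wiring is easy to reproduce locally. Here is a sketch of a helper that reads the manifest and prints the equivalent buildx invocation (the script name and the ghcr.io/org/app repository are placeholders):

scripts/print-buildx-command.mjs

// Hypothetical helper: derive the cache ref from the generated manifest and
// print the docker buildx command the workflow effectively performs.
import fs from "node:fs";

const { manifestSha256 } = JSON.parse(
  fs.readFileSync("dockerfile-rebuild-manifest.json", "utf8")
);

const REPO = "ghcr.io/org/app"; // placeholder: substitute your repository
const cacheRef = `${REPO}:buildcache-manifest-${manifestSha256}`;

console.log(
  [
    "docker buildx build .",
    "--build-arg NODE_ENV=production",
    `--cache-from type=registry,ref=${cacheRef}`,
    `--cache-to type=registry,mode=max,ref=${cacheRef}`,
    `-t ${REPO}:dev`,
  ].join(" \\\n  ")
);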


Step 3: Verify it’s working (and see rebuild behavior)

I like checking two things:

  1. The cache namespace changes only when it should
    • Editing src/index.js should not change the cache key if I’m only hashing package-lock.json.
    • Updating package-lock.json should change the cache key.
  2. Build logs mention cache reuse
    • When cache hits occur, BuildKit will say steps are CACHED.
    • When cache misses occur, the steps re-run.

Example: change only application source

  • Edit src/index.js
  • Re-run the workflow
  • The manifest hash stays the same (because it’s derived from Dockerfile + package-lock + build args)
  • Build steps that depend on dependencies should reuse cache

Example: change package-lock.json

  • Re-run the workflow
  • The manifest hash changes
  • The cache namespace changes
  • Dependency install step rebuilds predictably
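
To make those “why did it rebuild?” answers mechanical, a small diff over two saved manifests helps. A sketch (scripts/manifest-diff.mjs is my naming; it assumes the manifest shape produced above):

scripts/manifest-diff.mjs

// Usage: node scripts/manifest-diff.mjs old-manifest.json new-manifest.json
// Reports which hashed inputs differ between two rebuild manifests.
import fs from "node:fs";

const [oldPath, newPath] = process.argv.slice(2);
const load = (p) => JSON.parse(fs.readFileSync(p, "utf8"));
const a = load(oldPath);
const b = load(newPath);

if (a.dockerfile.contentSha256 !== b.dockerfile.contentSha256) {
  console.log("Dockerfile content changed");
}

// Flag any hashed context file that was added, removed, or modified.
const files = new Set([...Object.keys(a.context.files), ...Object.keys(b.context.files)]);
for (const f of files) {
  if (a.context.files[f] !== b.context.files[f]) {
    console.log(`context file changed: ${f}`);
  }
}

// Naive comparison; a key-order difference would also flag, which is
// acceptable for a heuristic report.
if (JSON.stringify(a.buildArgs) !== JSON.stringify(b.buildArgs)) {
  console.log("build args changed");
}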

Why registry-backed cache namespaces matter

One reason I liked this pattern is that it isolates cache histories.

If you use a single shared cache tag like:

  • type=registry,ref=ghcr.io/org/app:buildcache

then cache content can reflect a long history of builds with different inputs. That makes “why did it rebuild?” harder.

By using buildcache-manifest-${MANIFEST_SHA}, I get a direct mapping:

  • manifest → cache namespace
  • cache namespace → “the build inputs that produced it”

This is a big deal in mission-critical platform engineering, where I want CI behavior to be stable and auditable.


FinOps angle: fewer surprise rebuilds

CI rebuilds waste compute minutes and registry IO. This approach reduces “accidental invalidation” and makes cache reuse more consistent.

It doesn’t guarantee savings in every case, but in my experience it:

  • reduces dependency reinstall churn
  • lowers the amount of data pushed/pulled
  • makes cache invalidation intentional, not incidental
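
As a rough, purely illustrative calculation (all numbers assumed): if a cold dependency install adds 4 minutes to a build, and an unstable cache key causes 30 unnecessary cold builds per day, that is about 2 hours of wasted runner time daily, roughly 60 hours a month, before counting the extra registry traffic.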

Complete minimal script + Dockerfile recap

  • scripts/dockerfile-manifest.mjs produces dockerfile-rebuild-manifest.json and prints manifestSha256
  • Workflow stores BuildKit cache in a registry namespace keyed by manifestSha256
  • Cache reuse becomes predictable because the key is deterministic

Conclusion

I built a CI/CD pattern where caching is driven by a deterministic rebuild manifest hash instead of implicit BuildKit heuristics. By hashing the Dockerfile content, selected context decision files (like package-lock.json), and build args, then using that hash to namespace BuildKit registry cache, I made Docker rebuild behavior stable and explainable—leading to fewer surprise rebuilds and more reliable pipeline performance.