Air-Gapped Receipt Provenance With Merkleized Attachment Hash Chains
Written by Cipher Stone
The problem I wanted to solve
I once had to prove that a printed, scanned receipt image hadn’t been altered between a vendor’s laptop and an offline audit workstation. The catch was nasty: the vendor’s machine was air-gapped (no network), and the audit workstation was also air-gapped. So there was no centralized ledger to “look it up later.”
What I needed was something I could stamp onto the receipt package itself: a compact, tamper-evident trail that can be verified later without contacting anyone.
That led me to a specific design: Merkleized attachment hash chains.
- Each attachment (like receipt.png) gets hashed.
- Those hashes are arranged in a Merkle tree (a hash-based structure where each parent hash depends on its child hashes).
- The audit record also includes a chain of Merkle roots, so multiple receipts processed at different times can be linked together.
Below is the exact workflow and working code I used.
The format I used: “Merkle root chain” metadata
For a batch of attachments, I compute:
- attachment_hashes[i] = SHA256(file_bytes) for every attachment file
- A Merkle tree over attachment_hashes, producing a single merkle_root
- A chained root that links batches:
  - If this is batch n, then root_chain[n] = SHA256(b"ROOT_CHAIN_V1" || root_chain[n-1] || merkle_root)
  - For the first batch, the previous chain root is seeded as SHA256(b"GENESIS"), so root_chain[0] = SHA256(b"ROOT_CHAIN_V1" || SHA256(b"GENESIS") || merkle_root)
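The chaining rule above can be sketched standalone in a few lines, using the same domain-separation tags as the full scripts (the batch labels here are placeholder values, not real Merkle roots):

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# Seed for the very first batch: there is no previous chain root yet.
chain_root = sha256(b"GENESIS")

# Placeholder per-batch Merkle roots (in practice these come from the tree).
batch_merkle_roots = [sha256(b"batch-0"), sha256(b"batch-1"), sha256(b"batch-2")]

for merkle_root in batch_merkle_roots:
    # root_chain[n] = SHA256(tag || root_chain[n-1] || merkle_root[n])
    chain_root = sha256(b"ROOT_CHAIN_V1" + chain_root + merkle_root)
    print(chain_root.hex())
```

Because each link folds the previous chain root into the hash, reordering or replacing any batch changes every subsequent chain root.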
Then I store everything in a small JSON file next to the attachments:
- list of attachment filenames and their hashes
- the Merkle root
- the chain root
This metadata can be verified later offline: recompute hashes → recompute Merkle root → validate chain link.
Working code (end to end)
1) Build the metadata for a folder of attachments
Save as provenance_build.py.
```python
import os
import json
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def sha256_file(path: str) -> bytes:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.digest()


def merkle_parent(left: bytes, right: bytes) -> bytes:
    # Domain-separate the node type so it's harder to mix things up accidentally.
    return sha256(b"MERKLE_NODE_V1" + left + right)


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("No leaves provided")
    level = leaves[:]
    # If a level has an odd number of nodes, duplicate the last one.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        next_level = []
        for i in range(0, len(level), 2):
            next_level.append(merkle_parent(level[i], level[i + 1]))
        level = next_level
    return level[0]


def load_previous_chain_root(metadata_path: str) -> bytes:
    if not os.path.exists(metadata_path):
        return sha256(b"GENESIS")  # only used if the chain file doesn't exist
    with open(metadata_path, "r", encoding="utf-8") as f:
        prev = json.load(f)
    # Stored as a hex string
    return bytes.fromhex(prev["root_chain_hex"])


def write_metadata(out_json_path: str, folder: str, prev_chain_root: bytes) -> None:
    # Pick deterministic ordering so verification works cross-platform.
    files = sorted(
        f for f in os.listdir(folder) if os.path.isfile(os.path.join(folder, f))
    )
    leaves: List[bytes] = []
    attachments: List[dict] = []
    for fname in files:
        path = os.path.join(folder, fname)
        digest = sha256_file(path)  # 32 bytes
        leaves.append(digest)
        attachments.append({"filename": fname, "sha256_hex": digest.hex()})

    root = merkle_root(leaves)
    # Chain the roots to link batches in time.
    root_chain = sha256(b"ROOT_CHAIN_V1" + prev_chain_root + root)

    metadata = {
        "format": "merkle-root-chain",
        "version": "1",
        "folder": os.path.abspath(folder),
        "merkle_root_hex": root.hex(),
        "root_chain_hex": root_chain.hex(),
        "prev_root_chain_hex": prev_chain_root.hex(),
        "attachments": attachments,
    }
    with open(out_json_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)


def main():
    import argparse

    p = argparse.ArgumentParser()
    p.add_argument("--folder", required=True, help="Folder containing attachments")
    p.add_argument("--out", required=True, help="Output metadata JSON path")
    p.add_argument("--prev", required=True, help="Previous metadata JSON path")
    args = p.parse_args()

    prev_chain_root = load_previous_chain_root(args.prev)
    write_metadata(args.out, args.folder, prev_chain_root)
    print(f"Wrote {args.out}")


if __name__ == "__main__":
    main()
```
What this script does (in plain terms)
- It hashes each file’s bytes using SHA-256.
- It builds a Merkle tree from those per-file hashes.
- It computes a single merkle_root_hex.
- It links this root to the previous batch’s chain root, producing root_chain_hex.
- It writes everything to a JSON file so later verification is fully offline.
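One detail worth calling out: with an odd number of attachments, the last node on a level is paired with a copy of itself. A quick standalone check of that behavior, using the same merkle_root logic as the script:

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_parent(left: bytes, right: bytes) -> bytes:
    return sha256(b"MERKLE_NODE_V1" + left + right)


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("No leaves provided")
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [merkle_parent(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


a, b, c = sha256(b"a"), sha256(b"b"), sha256(b"c")

# With three leaves, c is paired with a copy of itself:
# root = parent(parent(a, b), parent(c, c))
expected = merkle_parent(merkle_parent(a, b), merkle_parent(c, c))
assert merkle_root([a, b, c]) == expected
```

A single-leaf tree degenerates to the leaf itself, which is why the per-file hash and the Merkle root coincide for a one-attachment batch.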
2) Verify the metadata later (offline)
Save as provenance_verify.py.
```python
import os
import json
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_parent(left: bytes, right: bytes) -> bytes:
    return sha256(b"MERKLE_NODE_V1" + left + right)


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("No leaves provided")
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        next_level = []
        for i in range(0, len(level), 2):
            next_level.append(merkle_parent(level[i], level[i + 1]))
        level = next_level
    return level[0]


def sha256_file(path: str) -> bytes:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.digest()


def load_previous_chain_root(metadata_path: str) -> bytes:
    # Mirror the builder: fall back to the genesis seed when no previous
    # metadata file exists (i.e. this is the first batch in the chain).
    if not os.path.exists(metadata_path):
        return sha256(b"GENESIS")
    with open(metadata_path, "r", encoding="utf-8") as f:
        prev = json.load(f)
    return bytes.fromhex(prev["root_chain_hex"])


def verify_metadata(metadata_path: str, attachments_folder: str,
                    prev_metadata_path: str) -> None:
    with open(metadata_path, "r", encoding="utf-8") as f:
        meta = json.load(f)

    prev_chain_root = load_previous_chain_root(prev_metadata_path)
    merkle_expected = bytes.fromhex(meta["merkle_root_hex"])
    chain_expected_prev = bytes.fromhex(meta["prev_root_chain_hex"])

    # 1) Chain link check: metadata must claim the correct prev hash.
    if chain_expected_prev != prev_chain_root:
        raise ValueError(
            "Chain link mismatch: prev_root_chain_hex does not match previous metadata"
        )

    # 2) Recompute attachment hashes from disk and match per-file claims.
    leaves: List[bytes] = []
    for att in meta["attachments"]:
        fname = att["filename"]
        path = os.path.join(attachments_folder, fname)
        if not os.path.exists(path):
            raise FileNotFoundError(f"Missing attachment: {path}")
        digest = sha256_file(path)
        if digest.hex() != att["sha256_hex"]:
            raise ValueError(f"Attachment hash mismatch for {fname}")
        leaves.append(digest)

    # 3) Recompute Merkle root from the same leaf list.
    merkle_actual = merkle_root(leaves)
    if merkle_actual != merkle_expected:
        raise ValueError("Merkle root mismatch: attachments changed or ordering differs")

    # 4) Recompute root chain and compare.
    root_chain_actual = sha256(b"ROOT_CHAIN_V1" + prev_chain_root + merkle_actual)
    if root_chain_actual.hex() != meta["root_chain_hex"]:
        raise ValueError("Root chain mismatch")

    print("Verification OK: attachments intact, Merkle root correct, chain link valid.")


def main():
    import argparse

    p = argparse.ArgumentParser()
    p.add_argument("--meta", required=True, help="metadata JSON to verify")
    p.add_argument("--folder", required=True, help="folder containing attachments")
    p.add_argument("--prev-meta", required=True,
                   help="previous metadata JSON used for chain link")
    args = p.parse_args()
    verify_metadata(args.meta, args.folder, args.prev_meta)


if __name__ == "__main__":
    main()
```

Note that the verifier, like the builder, falls back to the genesis seed when the previous metadata file does not exist, so the first batch in a chain can be verified too.
What this verifier checks
- Per-file hashes match what was recorded.
- The Merkle root matches what was recorded.
- The batch chain link matches the previous batch’s recorded root.
No network calls. Nothing external.
A small demo you can reproduce locally
1) Create two “receipt batches”
```shell
mkdir -p batch1 batch2

# Example files (pretend these are scanned images or PDFs)
echo "VENDOR=A;TOTAL=19.99;DATE=2026-04-01" > batch1/receipt.txt
echo "Signed note for batch 1" > batch1/notes.txt
echo "VENDOR=A;TOTAL=42.00;DATE=2026-04-02" > batch2/receipt.txt
echo "Signed note for batch 2" > batch2/notes.txt
```
2) Build metadata for the first batch
```shell
python3 provenance_build.py --folder batch1 --out batch1/meta1.json --prev /tmp/does_not_exist_prev.json
```
Since --prev doesn’t exist, the script uses a genesis-based previous chain root.
3) Build metadata for the second batch
```shell
python3 provenance_build.py --folder batch2 --out batch2/meta2.json --prev batch1/meta1.json
```
4) Verify both batches later
```shell
python3 provenance_verify.py --meta batch1/meta1.json --folder batch1 --prev-meta /tmp/does_not_exist_prev.json
python3 provenance_verify.py --meta batch2/meta2.json --folder batch2 --prev-meta batch1/meta1.json
```
To simulate tampering, edit batch2/receipt.txt and rerun verification. You’ll get a clear failure at the attachment hash stage or the Merkle root stage, whichever comes first.
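The same tamper check can be exercised in-process without touching the CLI. A minimal sketch (the file contents are made up to mirror the demo above):

```python
import hashlib
import os
import tempfile


def sha256_file(path: str) -> bytes:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.digest()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "receipt.txt")
    with open(path, "w") as f:
        f.write("VENDOR=A;TOTAL=42.00;DATE=2026-04-02\n")
    recorded = sha256_file(path)  # what the metadata would store

    # Simulate tampering: change one character of the total.
    with open(path, "w") as f:
        f.write("VENDOR=A;TOTAL=43.00;DATE=2026-04-02\n")
    tampered = sha256_file(path)

# A one-character edit flips the digest, so the per-file check fires.
assert recorded != tampered
```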
Why Merkle + chaining works well for air-gapped provenance
I found this combination especially practical:
- The Merkle tree gives you a single compact fingerprint (merkle_root_hex) for many attachments.
- The hash chain of roots ties successive batches together, so later records can’t be “replayed” without breaking the link.
- Offline verification is just recomputation: hashes in, proofs out.
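One extension the scripts above don’t implement, but the structure naturally supports, is Merkle inclusion proofs: an auditor can verify a single attachment against merkle_root_hex with only log-many sibling hashes, instead of rehashing the whole batch. A sketch under the same node-hashing convention (merkle_proof and verify_proof are my own additions, not part of the original scripts):

```python
import hashlib
from typing import List, Tuple


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_parent(left: bytes, right: bytes) -> bytes:
    return sha256(b"MERKLE_NODE_V1" + left + right)


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes bottom-up; the bool says whether the sibling sits on the right."""
    proof = []
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # same odd-level padding as the scripts
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        index //= 2
        level = [merkle_parent(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return proof


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    h = leaf
    for sibling, sibling_is_right in proof:
        h = merkle_parent(h, sibling) if sibling_is_right else merkle_parent(sibling, h)
    return h == root


leaves = [sha256(bytes([i])) for i in range(5)]

# Root computed the same way as in the scripts.
level = leaves[:]
while len(level) > 1:
    if len(level) % 2 == 1:
        level.append(level[-1])
    level = [merkle_parent(level[i], level[i + 1]) for i in range(0, len(level), 2)]
root = level[0]

proof = merkle_proof(leaves, 2)
assert verify_proof(leaves[2], proof, root)
```

For a batch of n attachments the proof holds roughly log2(n) sibling hashes, which is why the single root stays a compact commitment even for large batches.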
This is a digital provenance mechanism that doesn’t require a blockchain, smart contracts, or any online services, yet it has the core property you want: any content change becomes detectable with high confidence.
Summary
I built an air-gapped digital provenance scheme for receipt packages using Merkleized attachment hashes and a chained Merkle root (root_chain_hex). The workflow is straightforward: hash each attachment, compute a Merkle root, chain it to the previous batch’s root, and write compact JSON metadata. Later, verification is deterministic and offline: recompute attachment hashes → recompute Merkle root → validate the chain link, giving strong tamper evidence for offline digital provenance.