Reproducibility: How Was It Built?
Attestation proves what's running. But a hash is just a number until you can answer: where did that binary come from, and can anyone independently verify it?
This is the R dimension of the KRAB model. It measures reproducibility — how much of the software stack can be traced from source code to deployed binary, and at what level of trust.
The spectrum
R is scored per-layer. Every component in the stack — firmware, OS, libraries, application — gets its own R grade:
| Level | Name | What it means |
|---|---|---|
| R0 | Opaque | No source, no build instructions. The binary is a black box. You trust whoever built it — completely. |
| R1 | Source Available | Source is published and builds are documented. You can audit the code, but you cannot prove the deployed binary was built from it. |
| R2 | Maintainer-Signed | A maintainer cryptographically asserts the binary was built from the published source. Trust shifts to the maintainer's key. |
| R2+ | Threshold Multi-Party Signed | Binary signed by M-of-N independent maintainers. All M must collude to forge the claim. Correspondence remains asserted, not independently verifiable, but collusion resistance is qualitatively stronger than single-key R2. |
| R3 | Provenance-Verified | Signed build provenance from a CI/CD pipeline (e.g., SLSA). The build process is auditable, but the CI system is now in your trust chain. |
| R4 | Deterministic / Reproducible | Anyone can rebuild from source and get the identical hash. No trust in any builder, maintainer, or pipeline required. |
Each level is useful. Each level has limits.
The trust chain shifts, it doesn't vanish
R0 means blind trust. You run the binary because someone gave it to you. This is where most cloud service provider (CSP) firmware sits today.
R1 gives you auditability. You can read the code. But "the source looks fine" and "the binary matches the source" are different statements. Azure's OpenHCL paravisor is R1: the source is public, but the production builds are not reproducible.
R2 adds a cryptographic claim: someone with a signing key asserts the binary matches the source. This is better — but if the key is compromised, the claim collapses back to R1. R2+ extends this with M-of-N threshold signing: all M maintainers must collude to forge the claim, making single-key compromise insufficient. The source-to-binary correspondence is still asserted rather than independently verifiable — R2+ is not R4 — but collusion resistance is qualitatively stronger.
R3 means you have signed build provenance — typically from a CI/CD system following the SLSA framework. SLSA records the source repo, commit hash, build environment, dependencies, and output hash. The chain is auditable, but the CI pipeline itself is now trusted infrastructure. If the pipeline is compromised, the provenance is worthless.
R4 is the gold standard: deterministic builds. Anyone can clone the source, run the build, and get the exact same binary hash. No maintainer keys, no CI trust, no "just believe me." AWS provides Nix-reproducible OVMF firmware at this level.
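Checking an R4 claim comes down to rebuilding and comparing digests. A minimal sketch, assuming the published artifact and your own independent rebuild are both available as local files; the helper names `sha256_file` and `is_reproduced` are illustrative and not part of any tool named here:

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_reproduced(published_path, rebuilt_path):
    """True only if the rebuild is bit-for-bit identical to the published binary."""
    return sha256_file(published_path) == sha256_file(rebuilt_path)
```

A single differing byte (an embedded timestamp, a build path) makes the check fail, which is why deterministic builds have to pin or strip such inputs.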
Where SLSA fits
SLSA (Supply-chain Levels for Software Artifacts) is the practical tool for R2-R3. It doesn't replace hardware attestation — it complements it.
What SLSA covers: your application binary. Source repo, commit, build environment, dependencies, output hash — all signed and auditable.
What application-level SLSA provenance doesn't cover: the OS kernel, firmware, paravisor, hypervisor, CPU, or TDX module. It operates above the hardware trust boundary.
The powerful move is binding SLSA to hardware attestation: hash the SLSA provenance document and include it in the attestation quote's report_data field. Now the verification chain runs from source code, through the build pipeline, into the binary hash, through the hardware measurement, all the way to the silicon vendor's certificate chain. One unbroken line.
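The binding step can be sketched as follows. This assumes SHA-512 as the hash, chosen here only because its 64-byte digest exactly fills the 64-byte report data field on TDX and SEV-SNP. The function names are hypothetical, and requesting the quote and verifying its signature are platform-specific steps that are not shown:

```python
import hashlib
import hmac

REPORT_DATA_LEN = 64  # size of the report data field on TDX and SEV-SNP

def provenance_report_data(provenance_bytes: bytes) -> bytes:
    """Hash the serialized provenance document into a report_data value."""
    digest = hashlib.sha512(provenance_bytes).digest()
    if len(digest) != REPORT_DATA_LEN:
        raise ValueError("digest does not fit the report data field")
    return digest

def quote_binds_provenance(report_data: bytes, provenance_bytes: bytes) -> bool:
    """Verifier side: does the quote's report_data commit to this provenance?"""
    expected = provenance_report_data(provenance_bytes)
    return hmac.compare_digest(report_data, expected)
```

The verifier must hash exactly the same bytes the workload hashed, so the provenance document needs a fixed serialization. Re-encoding the JSON with different whitespace would break the match.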
For most teams, SLSA 2-3 for the application layer is the practical target. It gives you auditable provenance without requiring deterministic builds for everything.
R is per-layer — and that's the point
Unlike Attestation (which is a single platform ceiling), Reproducibility is scored at every layer. This creates layered profiles:
| Component | R level | What it means |
|---|---|---|
| Application | R4 | Deterministic build, anyone can verify |
| OS image | R3 | SLSA provenance from CI |
| Firmware | R0 | CSP black box |
The system's R profile is the full column, not a single number. It is written compactly as R[f/o/l/a], one slot each for firmware, OS, libraries, and application. A perfectly reproducible application and library set sitting on an opaque OS and firmware is R[f0/o0/l4/a4]: each slot shows exactly what can and cannot be verified.
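The notation is simple enough to handle mechanically. A small sketch, assuming the four slots stand for firmware, OS, libraries, and application in that order, and assuming a slot may also carry the R2+ grade as `2+` (the text never shows that form, so treat it as a guess). All names are illustrative:

```python
import re

LAYERS = ("f", "o", "l", "a")  # firmware, OS, libraries, application
LEVEL = r"(2\+|[0-4])"         # grades 0..4, plus the assumed "2+" form
PATTERN = re.compile(r"R\[" + "/".join(k + LEVEL for k in LAYERS) + r"\]")

def format_profile(levels):
    """Render a {layer: grade} mapping as an R[...] profile string."""
    return "R[" + "/".join(f"{k}{levels[k]}" for k in LAYERS) + "]"

def parse_profile(text):
    """Parse an R[...] profile string into a {layer: grade} mapping."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid R profile: {text!r}")
    return dict(zip(LAYERS, match.groups()))

def opaque_layers(levels):
    """List the layers graded 0, i.e. the ones running on blind trust."""
    return [k for k in LAYERS if levels[k] == "0"]
```

Grades are kept as strings rather than integers so that `2+` survives a round trip through parsing and formatting.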
Bridging the measurement gap
Hardware measures what is in memory at VM launch: firmware, the kernel, and the initial boot filesystem (initramfs). Everything loaded from disk after boot — your application, libraries, configuration — sits outside that initial measurement window. A malicious hypervisor could swap a disk image after launch without changing the attestation report at all. This is commonly called the measurement gap.
Two patterns close it:
initramfs packing — Bundle your entire application into the initial RAM filesystem that loads alongside the kernel at boot. Because initramfs is measured by hardware at launch, the application hash becomes part of the launch digest directly. Simple and airtight, but produces large images and means any update requires a full rebuild and re-measurement.
dm-verity — Compute a Merkle tree over your application filesystem image at build time, embed only the 32-byte root hash in the initramfs. At runtime, the kernel verifies every disk block against that tree as it is read — any tampering causes a read error. Because the root hash is inside the measured initramfs, the chain extends cleanly: silicon measurement → firmware → kernel + initramfs → root hash → application disk. This keeps images small and updateable while preserving end-to-end measurement.
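The idea behind the dm-verity root hash can be shown with a simplified binary Merkle tree. This is a sketch of the concept only: the real dm-verity format produced by `veritysetup` uses a salt and packs many hashes into each hash block, so the root computed here will not match a real one. The 4096-byte block size and the choice of SHA-256 are assumptions:

```python
import hashlib

BLOCK_SIZE = 4096  # assumed data block size

def merkle_root(image: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compute a 32-byte root hash over fixed-size blocks of an image."""
    if not image:
        raise ValueError("image must not be empty")
    # Leaf level: one SHA-256 digest per data block.
    level = [
        hashlib.sha256(image[i:i + block_size]).digest()
        for i in range(0, len(image), block_size)
    ]
    # Fold pairs of digests upward until a single root remains.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Flipping a single bit anywhere in the image changes the root, so measuring the 32-byte root inside the initramfs commits to every block of the application disk.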
With either pattern, the R grade of the l and a layers describes build reproducibility: how verifiably the binary was produced. Whether the measurement chain actually reaches those layers depends on which pattern is used. An a4 score is only meaningful if an unbroken measurement chain runs from silicon all the way up through the application.
IGVM (Independent Guest Virtual Machine) is a standardized file format used by TDX and SEV-SNP tooling to specify exactly which bytes land at which guest physical addresses at launch. The launch digest depends on both content and placement, so IGVM enables consistent firmware measurements across different hypervisor implementations — without it, the same firmware bytes can produce different launch digests depending on memory layout. Note that IGVM addresses measurement consistency across hypervisors, not build reproducibility: f4 requires deterministic builds (same source → same binary), which is a separate property. AWS SEV-SNP achieves f4 via Nix-reproducible OVMF without using IGVM.
Verification gaps
An R[f0/o0/l4/a4] profile is common on public clouds. It means: "I can prove my application binary is exactly what I intended, but the OS and firmware beneath it? Opaque." On Azure TDX the firmware is source-available (f1), giving a slightly better R[f1/o0/l4/a4] — but still with a blind OS layer.
This isn't necessarily a failure. If your threat model trusts the CSP (and you've made that explicit in your Attestation ceiling), an R[f0/o0/l4/a4] profile with CSP trust explicitly documented is a coherent engineering choice. But the gap must be visible. The notation forces it into the open.
The worst outcome isn't a gap — it's a gap nobody knows about.
R2 shifts trust to the maintainer's key. R3 shifts trust to the CI/CD pipeline. Only R4 eliminates the build system as a trust dependency. Know what you're trusting at each level.
Start with R4 for your own application (Nix, Bazel, or Go's reproducible builds). Target R3 (SLSA provenance) for dependencies you don't control. Accept R0-R1 for CSP firmware only if your threat model explicitly trusts the provider — and write it down.