[Audit] #25 reviewed-code artifact identity #126
Open
Labels
attestation-audit (Attestation evidence audit review), governance (Governance attestation), priority:high (Compliance-critical control), review-flow (Review flow attestation)
Description
Artifact Identity
- Artifact name: `reviewed-code`
- Flow: `{repo}-{issue_id}-CodeReview` (one per GitHub issue)
- Level: Artifact-level (all 80 agent review attestations bind to this artifact)
- Kosli type: `dir` (source directory fingerprint via `kosli fingerprint . --artifact-type dir`)
- Schema: Defined in `kosli/flows/code-review-template.yml` under `trail.artifacts[0]`
- Algorithm: Kosli `dir` fingerprinting: recursive SHA256 of file contents, respecting `.kosli_ignore` exclusion patterns. Deterministic: the same files produce the same hash regardless of machine.
- Instances per trail: 1 per trail (recomputed after each resolver loop; new trail = new fingerprint)
Control Objective
- Risk mitigated: Evidence-to-code binding — ensures all 80 review attestations (5 personas x 2 models x 8 steps) are cryptographically bound to the exact source code that was reviewed, not some other version.
- Auditor question: "Can you prove the code that was reviewed is the same code that was built and shipped? Can you prove review evidence was not re-bound to different code after the fact?"
- Regulatory mapping: SOC2 CC8.1 (change management integrity), ISO 27001 A.14.2.2 (system change control), FDA 21 CFR Part 11 (electronic records integrity), DORA Art. 9 (ICT change management)
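The 80-attestation figure follows from 5 personas x 2 models x 8 steps. A minimal sketch of that enumeration is shown here, mainly as a way to detect missing slots. The persona, model, and step labels are placeholders (the issue gives only the counts), and `flow_name()`, `attestation_slots()`, and `missing_slots()` are hypothetical helpers, not functions from the repository.

```python
from itertools import product

# Placeholder labels: the issue states only the counts (5 personas, 2 models, 8 steps).
PERSONAS = [f"persona-{i}" for i in range(1, 6)]
MODELS = [f"model-{i}" for i in range(1, 3)]
STEPS = [f"step-{i}" for i in range(1, 9)]


def flow_name(repo: str, issue_id: int) -> str:
    """Build the per-issue flow name in the {repo}-{issue_id}-CodeReview form."""
    return f"{repo}-{issue_id}-CodeReview"


def attestation_slots() -> list[str]:
    """Enumerate one slot name per (persona, model, step) combination."""
    return [f"{p}.{m}.{s}" for p, m, s in product(PERSONAS, MODELS, STEPS)]


def missing_slots(reported: set[str]) -> list[str]:
    """Return the slots that have no attestation yet, in enumeration order."""
    return [slot for slot in attestation_slots() if slot not in reported]
```

An auditor-facing check could call `missing_slots()` with the names actually attested on a trail and expect an empty list before treating the artifact as fully reviewed.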
Evidence Specification
| Field | Type | Source | Required | Validated by |
|---|---|---|---|---|
| `fingerprint` | string (SHA256 hex) | `kosli fingerprint . --artifact-type dir` | Yes | Kosli CLI + server-side |
| `artifact_name` | string | Hardcoded `"reviewed-code"` in `report_artifact()` | Yes | Flow template schema |
| `artifact_type` | string | Hardcoded `"dir"` | Yes | Flow template schema |
| `artifact_path` | string | Hardcoded `"."` (repo root) | Yes | Kosli CLI |
| `.kosli_ignore` | file | Checked into repo root | Yes (implicit) | Git version control |
| `commit_sha` | string | `git rev-parse HEAD` | No (metadata) | Git |
- Evidence producer: Two independent producers compute the same fingerprint:
  - Review flow: `compute_source_fingerprint()` in `ci_gate.py` uses `git archive HEAD` to extract a clean copy, then runs `kosli fingerprint <tmpdir> --artifact-type dir`
  - Build flow: CI workflow step `Compute source directory fingerprint` runs `kosli fingerprint . --artifact-type dir` on a clean `actions/checkout@v4` working directory
- Producer trust level: MEDIUM. Both run in environments controlled by GitHub Actions runners. The review flow has a fallback path that fingerprints the working directory directly (lower trust). The build flow always operates on a clean checkout (higher trust).
- Tamper resistance: MEDIUM. The fingerprint is computed client-side by the Kosli CLI and reported to the Kosli server. The `.kosli_ignore` file is attacker-modifiable (checked into git). Once reported to Kosli, the artifact record is immutable server-side.
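The review-flow producer described above (clean export first, working directory as fallback) can be sketched roughly as follows. This is not the code in `ci_gate.py`. It assumes `kosli fingerprint` prints the digest on stdout. Unlike the function described in this issue, it also returns which path was taken, so callers can tell the two trust levels apart. The injectable `export` and `fingerprint` parameters and the `"git-archive"`, `"working-dir"`, and `"failed"` labels are illustrative choices.

```python
import io
import subprocess
import tarfile
import tempfile
from typing import Callable, Tuple


def _kosli_dir_fingerprint(path: str) -> str:
    """Run the Kosli CLI on a directory (assumed to print the digest on stdout)."""
    result = subprocess.run(
        ["kosli", "fingerprint", path, "--artifact-type", "dir"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def _export_head(dest: str) -> None:
    """Extract the tracked files at HEAD into dest via `git archive`."""
    archive = subprocess.run(
        ["git", "archive", "--format=tar", "HEAD"],
        capture_output=True, check=True,
    ).stdout
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        tar.extractall(dest)


def compute_source_fingerprint(
    export: Callable[[str], None] = _export_head,
    fingerprint: Callable[[str], str] = _kosli_dir_fingerprint,
    workdir: str = ".",
) -> Tuple[str, str]:
    """Return (fingerprint, path_used); fingerprint is "" if both paths fail."""
    try:
        # Primary path: fingerprint a clean export of the committed tree.
        with tempfile.TemporaryDirectory() as tmpdir:
            export(tmpdir)
            return fingerprint(tmpdir), "git-archive"
    except (subprocess.CalledProcessError, OSError, tarfile.TarError):
        pass  # fall through to the lower-trust path
    try:
        # Fallback path: fingerprint the working directory as it is.
        return fingerprint(workdir), "working-dir"
    except (subprocess.CalledProcessError, OSError):
        return "", "failed"
```

Injecting the two callables keeps the primary and fallback branches unit-testable without `git` or the Kosli CLI installed.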
Fingerprint Determinism
- Deterministic when: Both the review runner and the CI runner operate on the same committed files at the same git commit, with the same `.kosli_ignore` exclusions. The `git archive HEAD` primary path in the review runner ensures only tracked files are included.
- Non-deterministic when: (1) The fallback path is triggered: it fingerprints the working directory, which may include uncommitted or untracked files not covered by `.kosli_ignore`. (2) `.kosli_ignore` patterns differ between review and build. This cannot happen when both flows read the file from the same commit, but it can if the file is modified between runs. (3) The resolver fixes code after fingerprint computation: the fingerprint was computed on pre-fix code, but attestations bind to it anyway.
- Algorithm: Kosli CLI `dir` fingerprinting walks the directory tree, excludes paths matching `.kosli_ignore` globs, computes the SHA256 of each file's contents in sorted deterministic order, and produces a single SHA256 digest. No timestamps, no metadata, no filesystem attributes: purely content-addressed.
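To make the content-addressed idea concrete, here is a small illustrative directory digest. It is not the Kosli CLI's implementation and will not reproduce Kosli fingerprints. The use of `fnmatch` for ignore patterns, the inclusion of relative paths in the hash, and the names `read_ignore_patterns()` and `dir_digest()` are assumptions made for this sketch only.

```python
import fnmatch
import hashlib
from pathlib import Path


def read_ignore_patterns(root: Path) -> list[str]:
    """Read non-blank lines of .kosli_ignore as glob patterns (simplified)."""
    ignore_file = root / ".kosli_ignore"
    if not ignore_file.is_file():
        return []
    return [ln.strip() for ln in ignore_file.read_text().splitlines() if ln.strip()]


def dir_digest(root: str) -> str:
    """Hash file contents in sorted order; no timestamps or file metadata."""
    root_path = Path(root)
    patterns = read_ignore_patterns(root_path)
    outer = hashlib.sha256()
    files = (p for p in root_path.rglob("*") if p.is_file())
    # Sort by relative POSIX path so the order does not depend on the machine.
    for path in sorted(files, key=lambda p: p.relative_to(root_path).as_posix()):
        rel = path.relative_to(root_path).as_posix()
        if any(fnmatch.fnmatch(rel, pat) for pat in patterns):
            continue  # excluded by an ignore pattern
        # Illustrative choice: mix in the relative path so renames change the digest.
        outer.update(rel.encode("utf-8"))
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()
```

Two directories with identical non-ignored files yield the same digest wherever they live on disk, while a single changed byte, including whitespace, yields a different one.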
Edge Cases & Failure Modes
- Fingerprint cannot be computed: `compute_source_fingerprint()` returns the empty string `""`. `report_artifact()` catches this and returns `False`, halting the pipeline. Logged as error: "source_fingerprint is empty". Verdict: correctly handled.
- Fallback path activated: If `git archive HEAD` fails (e.g., not in a git repo, corrupted `.git`), the function falls back to fingerprinting the working directory directly. This may include uncommitted changes, untracked files not in `.kosli_ignore`, and build artifacts from prior CI steps (`junit-results/`, egg-info). Verdict: risk of fingerprint mismatch between review and build flows.
- `.kosli_ignore` as bypass vector: An attacker who can modify `.kosli_ignore` can exclude files from the fingerprint (e.g., exclude a backdoored module). The file is version-controlled, so changes are visible in git history, but the fingerprint does not record which exclusion rules produced it. Verdict: medium risk, mitigated by code review of `.kosli_ignore` changes.
- TOCTOU gap (resolver fixes): The fingerprint is computed by `run_ci_checks()` at the end of the CI gate. If the resolver then modifies code and commits, the fingerprint bound to the trail no longer matches the actual reviewed code. A new trail and fingerprint are created for the next loop, but the old trail's fingerprint is stale. Verdict: acceptable if each loop creates a fresh trail with a fresh fingerprint. Risk if attestations from loop N reference the fingerprint from loop N-1.
- False positive risk: Low. A matching fingerprint genuinely means the same file contents.
- False negative risk: Medium. Two semantically identical codebases with different file content (whitespace, comments) produce different fingerprints. This is by design (content-addressed), but it means a benign reformatting breaks the cross-flow match.
- Cross-flow fingerprint mismatch: The review flow uses `git archive HEAD` + `kosli fingerprint <tmpdir>`, while the build flow uses `kosli fingerprint .` on a clean checkout. If `.kosli_ignore` is correctly applied in both cases, these should match. However, the review flow fingerprints a tmpdir (which contains `.kosli_ignore` only if the file was in the archive), while CI fingerprints `.` (where `.kosli_ignore` is present). This is a potential determinism bug: the `.kosli_ignore` file must be in the archived tmpdir for the Kosli CLI to find it.
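The empty-fingerprint guard in the first edge case can be sketched like this. The real `report_artifact()` in `kosli_trail.py` is not shown in this issue, so the signature, the injected `send` callable, and the additional 64-hex-character format check are assumptions for illustration; only the "source_fingerprint is empty" message and the `False` return come from the description above.

```python
import logging
import re
from typing import Callable

logger = logging.getLogger("kosli_trail")

# A SHA256 digest rendered as lowercase hex is exactly 64 characters.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def report_artifact(source_fingerprint: str, send: Callable[[str], bool]) -> bool:
    """Refuse to report an artifact unless the fingerprint looks valid."""
    if not source_fingerprint:
        logger.error("source_fingerprint is empty")
        return False
    if not _SHA256_HEX.fullmatch(source_fingerprint):
        # Extra check beyond the behavior described in the issue (assumption).
        logger.error("source_fingerprint is not a SHA256 hex digest")
        return False
    # `send` stands in for whatever actually calls the Kosli CLI or API.
    return send(source_fingerprint)
```

Failing closed here means a producer error halts the pipeline rather than binding attestations to a blank or malformed identity.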
Dependencies
- Upstream: `compute_source_fingerprint()` in `ci_gate.py` (review flow), `kosli fingerprint .` step in `ci.yml` (build flow), `.kosli_ignore` exclusion patterns, `git archive HEAD` for clean export
- Downstream: All 80 artifact-level attestations (5 personas x 2 models x 8 review steps) bind to this fingerprint. `report_artifact()` in `kosli_trail.py` registers the artifact. Build flow control evaluations compare the review fingerprint to the build fingerprint.
- Cross-flow: This artifact is the bridge between the review flow and the build flow. The build flow's `code-review-control` job fetches review trail attestations and verifies they are bound to the same source fingerprint. If fingerprints do not match, the control fails.
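The cross-flow comparison could look roughly like the sketch below. The `Attestation` shape, the `evaluate_code_review_control()` name, and the decision to also check the slot count and compliance flag are assumptions; the issue only states that the control fails when fingerprints do not match.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Attestation:
    """Hypothetical view of one review attestation fetched from a trail."""
    name: str
    artifact_fingerprint: str
    compliant: bool


def evaluate_code_review_control(
    build_fingerprint: str,
    attestations: Iterable[Attestation],
    expected_count: int = 80,
) -> tuple[bool, list[str]]:
    """Pass only if every expected attestation binds to the build fingerprint."""
    items = list(attestations)
    reasons: list[str] = []
    if len(items) != expected_count:
        reasons.append(f"expected {expected_count} attestations, found {len(items)}")
    for att in items:
        if att.artifact_fingerprint != build_fingerprint:
            # Covers stale bindings, e.g. loop N evidence bound to loop N-1 code.
            reasons.append(f"{att.name}: bound to a different fingerprint")
        elif not att.compliant:
            reasons.append(f"{att.name}: non-compliant")
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict gives the control something to log when it fails, which helps when answering the auditor question stated earlier.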
Assessment
- Implementation match: GOOD. The code faithfully implements the design intent. The `git archive HEAD` approach is sound for excluding untracked files. The flow template correctly defines `reviewed-code` as a `dir` artifact with 80 bound attestation slots.
- Evidence sufficiency: ADEQUATE with caveats. The fingerprint provides strong content-addressed identity. However, the evidence does not capture which `.kosli_ignore` version was used, whether the primary or fallback path was taken, or the git commit SHA as a first-class field on the artifact report.
- Gaps:
  - `.kosli_ignore` in tmpdir: `git archive` extracts to a tmpdir, and `kosli fingerprint` is then called on that tmpdir path. The `.kosli_ignore` file IS included in the git archive (it is tracked), so the Kosli CLI should find it. However, this is not explicitly verified or logged.
  - Fallback path silent degradation: The fallback from `git archive` to working-dir fingerprinting logs a warning but does not mark the attestation as lower-confidence. Downstream consumers cannot distinguish a "clean export fingerprint" from a "dirty working dir fingerprint".
  - No commit SHA on artifact: `report_artifact()` does not pass `--commit` to the Kosli CLI. The build flow does (`--commit "$COMMIT_SHA"`). This means the review-flow artifact lacks git provenance metadata.
  - Resolver TOCTOU: The fingerprint is computed before resolver fixes. Each loop creates a new trail, but the timing within a single loop means the fingerprint may not reflect the final state of the code in that loop.
- Recommendations:
  - Log whether the primary (`git archive`) or fallback (working dir) path was used, and include this in artifact metadata.
  - Add the `--commit` flag to `report_artifact()` so the review-flow artifact has git provenance.
  - Consider asserting that `.kosli_ignore` exists in the tmpdir before fingerprinting.
  - Add an integration test that verifies `git archive` + `kosli fingerprint <tmpdir>` == `kosli fingerprint .` on a clean checkout of the same commit.
  - Document that `.kosli_ignore` changes require the same review scrutiny as production code changes.
- Verdict: SOUND WITH KNOWN LIMITATIONS. The artifact identity model is well-designed and correctly implements deterministic source fingerprinting. The `git archive` approach is the right call for the review runner. The cross-flow integrity check (review fingerprint == build fingerprint) is the key value proposition. The main risks are the silent fallback path and the `.kosli_ignore` trust boundary. These are addressable without architectural changes.
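Several of the recommendations (recording the path taken, asserting `.kosli_ignore` is present in the export, carrying the commit SHA) could be combined into one provenance record. The sketch below is one possible shape, not an existing structure in the repository. `FingerprintProvenance`, `describe_export()`, `to_json()`, and the `"git-archive"` / `"working-dir"` labels are hypothetical, and how such a record would be attached to the Kosli artifact is left open.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class FingerprintProvenance:
    """Hypothetical metadata describing how a fingerprint was produced."""
    fingerprint: str
    path_used: str            # "git-archive" (primary) or "working-dir" (fallback)
    commit_sha: str
    kosli_ignore_sha256: str  # "" when no .kosli_ignore was present


def describe_export(
    export_dir: str, fingerprint: str, path_used: str, commit_sha: str
) -> FingerprintProvenance:
    """Build a provenance record, insisting on .kosli_ignore for clean exports."""
    ignore_file = Path(export_dir) / ".kosli_ignore"
    present = ignore_file.is_file()
    if path_used == "git-archive" and not present:
        # A tracked .kosli_ignore should always survive `git archive`.
        raise FileNotFoundError(".kosli_ignore missing from exported tree")
    ignore_hash = hashlib.sha256(ignore_file.read_bytes()).hexdigest() if present else ""
    return FingerprintProvenance(fingerprint, path_used, commit_sha, ignore_hash)


def to_json(provenance: FingerprintProvenance) -> str:
    """Serialize with sorted keys so the output is stable across runs."""
    return json.dumps(asdict(provenance), sort_keys=True)
```

Hashing `.kosli_ignore` separately records which exclusion rules were in force, which is one of the items the evidence currently does not capture.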