
[Audit] #25 reviewed-code artifact identity #126

@AlexKantor87


Artifact Identity

  • Artifact name: reviewed-code
  • Flow: {repo}-{issue_id}-CodeReview (one per GitHub issue)
  • Level: Artifact-level (all 80 agent review attestations bind to this artifact)
  • Kosli type: dir (source directory fingerprint via kosli fingerprint . --artifact-type dir)
  • Schema: Defined in kosli/flows/code-review-template.yml under trail.artifacts[0]
  • Algorithm: Kosli dir fingerprinting — recursive SHA256 of file contents, respecting .kosli_ignore exclusion patterns. Deterministic: same files produce same hash regardless of machine.
  • Instances per trail: 1 (recomputed after each resolver loop; each new trail gets a new fingerprint)

Control Objective

  • Risk mitigated: Evidence-to-code binding — ensures all 80 review attestations (5 personas x 2 models x 8 steps) are cryptographically bound to the exact source code that was reviewed, not some other version.
  • Auditor question: "Can you prove the code that was reviewed is the same code that was built and shipped? Can you prove review evidence was not re-bound to different code after the fact?"
  • Regulatory mapping: SOC2 CC8.1 (change management integrity), ISO 27001 A.14.2.2 (system change control), FDA 21 CFR Part 11 (electronic records integrity), DORA Art. 9 (ICT change management)
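The count of 80 follows from 5 personas x 2 models x 8 steps. A minimal sketch of enumerating those slots and spotting missing ones is shown here; the persona, model, and step names and the dotted slot format are placeholders, since the issue does not list the real identifiers.

```python
from itertools import product

# Placeholder identifiers: the real persona, model, and step names are not given in the issue.
PERSONAS = [f"persona-{i}" for i in range(1, 6)]   # 5 personas
MODELS = ["model-a", "model-b"]                    # 2 models
STEPS = [f"step-{i}" for i in range(1, 9)]         # 8 review steps


def expected_attestation_slots() -> list:
    """Enumerate every (persona, model, step) slot as a dotted name."""
    return [f"{p}.{m}.{s}" for p, m, s in product(PERSONAS, MODELS, STEPS)]


def missing_slots(reported: set) -> list:
    """Return expected slots that have no reported attestation."""
    return [slot for slot in expected_attestation_slots() if slot not in reported]
```

An empty result from missing_slots() would indicate that every slot has evidence; it says nothing about which fingerprint that evidence is bound to.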

Evidence Specification

| Field | Type | Source | Required | Validated by |
|---|---|---|---|---|
| fingerprint | string (SHA256 hex) | kosli fingerprint . --artifact-type dir | Yes | Kosli CLI + server-side |
| artifact_name | string | Hardcoded "reviewed-code" in report_artifact() | Yes | Flow template schema |
| artifact_type | string | Hardcoded "dir" | Yes | Flow template schema |
| artifact_path | string | Hardcoded "." (repo root) | Yes | Kosli CLI |
| .kosli_ignore | file | Checked into repo root | Yes (implicit) | Git version control |
| commit_sha | string | git rev-parse HEAD | No (metadata) | Git |

  • Evidence producer: Two independent producers compute the same fingerprint:
    1. Review flow: compute_source_fingerprint() in ci_gate.py — uses git archive HEAD to extract clean copy, then kosli fingerprint <tmpdir> --artifact-type dir
    2. Build flow: CI workflow step Compute source directory fingerprint — runs kosli fingerprint . --artifact-type dir on a clean actions/checkout@v4 working directory
  • Producer trust level: MEDIUM — both run in environments controlled by GitHub Actions runners. Review flow has a fallback path that fingerprints the working directory directly (lower trust). Build flow always operates on a clean checkout (higher trust).
  • Tamper resistance: MEDIUM. The fingerprint is computed client-side by the Kosli CLI and reported to the Kosli server. The .kosli_ignore file is checked into git, so anyone with write access to the repository can modify it. Once reported to Kosli, the artifact record is immutable server-side.
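The review-flow producer described above (clean export first, working directory as fallback) can be sketched roughly as follows. This is not the actual compute_source_fingerprint() from ci_gate.py, which the issue does not show. The names compute_source_fingerprint_sketch, FingerprintResult, and Runner are hypothetical. The sketch also assumes that a tar binary is on PATH and that kosli fingerprint prints only the digest on stdout.

```python
import logging
import os
import subprocess
import tempfile
from dataclasses import dataclass
from typing import Callable, Sequence

# A runner takes (argv, cwd) and returns stdout; injectable so the logic can be tested.
Runner = Callable[[Sequence[str], str], str]


def default_runner(argv: Sequence[str], cwd: str) -> str:
    """Run a command, returning stdout; raises CalledProcessError on non-zero exit."""
    done = subprocess.run(list(argv), cwd=cwd, check=True, capture_output=True, text=True)
    return done.stdout


@dataclass(frozen=True)
class FingerprintResult:
    fingerprint: str  # "" when no fingerprint could be computed
    method: str       # "git-archive", "working-dir", or "none"


def kosli_dir_fingerprint(path: str, cwd: str, run: Runner) -> str:
    """Invoke the CLI command named in the audit and return the trimmed output."""
    return run(["kosli", "fingerprint", path, "--artifact-type", "dir"], cwd).strip()


def compute_source_fingerprint_sketch(repo_dir: str, run: Runner = default_runner) -> FingerprintResult:
    errors = (subprocess.CalledProcessError, OSError)
    # Primary path: export the committed tree at HEAD and fingerprint the clean copy.
    try:
        with tempfile.TemporaryDirectory() as tmp:
            tar_path = os.path.join(tmp, "source.tar")
            export_dir = os.path.join(tmp, "export")
            os.mkdir(export_dir)
            run(["git", "archive", "--format=tar", f"--output={tar_path}", "HEAD"], repo_dir)
            run(["tar", "-xf", tar_path, "-C", export_dir], repo_dir)
            digest = kosli_dir_fingerprint(export_dir, repo_dir, run)
            if digest:
                return FingerprintResult(digest, "git-archive")
    except errors as exc:
        logging.warning("git archive path failed, falling back to working dir: %s", exc)
    # Fallback path: lower trust, may include uncommitted or untracked files.
    try:
        digest = kosli_dir_fingerprint(".", repo_dir, run)
    except errors as exc:
        logging.error("working-dir fingerprint failed: %s", exc)
        digest = ""
    return FingerprintResult(digest, "working-dir" if digest else "none")
```

Returning the method alongside the digest is a design choice in this sketch. It lets a caller record whether the clean-export or the working-directory path produced the fingerprint.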

Fingerprint Determinism

  • Deterministic when: Both review runner and CI runner operate on the same committed files at the same git commit, with the same .kosli_ignore exclusions. The git archive HEAD primary path in the review runner ensures that only files committed at HEAD are included.
  • Non-deterministic when:
    1. The fallback path is triggered: it fingerprints the working directory, which may include uncommitted or untracked files not covered by .kosli_ignore.
    2. .kosli_ignore patterns differ between review and build. This cannot happen when both runs use the same commit, but it can if the file is modified between runs.
    3. The resolver fixes code after the fingerprint is computed: the fingerprint describes the pre-fix code, but attestations bind to it anyway (strictly a staleness problem rather than non-determinism).
  • Algorithm: Kosli CLI dir fingerprinting — walks directory tree, excludes paths matching .kosli_ignore globs, computes SHA256 of each file's contents in sorted deterministic order, produces a single SHA256 digest. No timestamps, no metadata, no filesystem attributes — purely content-addressed.
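To make the content-addressed idea concrete, here is an illustrative sketch of a directory digest with ignore patterns. It is not Kosli's implementation and will not produce the same digests as the Kosli CLI. The function names are hypothetical, fnmatch-style matching of ignore globs is assumed, and mixing relative paths into the digest is a choice made for this sketch only.

```python
import fnmatch
import hashlib
from pathlib import Path


def load_ignore_patterns(root: Path, ignore_file: str = ".kosli_ignore") -> list:
    """Read glob patterns from the ignore file, skipping blank lines and comments."""
    path = root / ignore_file
    if not path.is_file():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [s.strip() for s in lines if s.strip() and not s.strip().startswith("#")]


def is_ignored(rel_path: str, patterns: list) -> bool:
    """True if the relative path, or any parent directory of it, matches a pattern."""
    parts = rel_path.split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    return any(fnmatch.fnmatchcase(p, pat) for p in prefixes for pat in patterns)


def sketch_dir_fingerprint(root: str) -> str:
    """Digest of file paths and contents only: no timestamps, modes, or owners."""
    base = Path(root)
    patterns = load_ignore_patterns(base)
    # Sort by relative POSIX path so traversal order is the same on every machine.
    rel_paths = sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())
    outer = hashlib.sha256()
    for rel in rel_paths:
        if is_ignored(rel, patterns):
            continue
        outer.update(rel.encode("utf-8"))
        outer.update(hashlib.sha256((base / rel).read_bytes()).digest())
    return outer.hexdigest()
```

Two directories with identical non-ignored files yield the same digest in this sketch, and any change to a file's bytes or to the ignore patterns changes what is hashed.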

Edge Cases & Failure Modes

  • Fingerprint cannot be computed: compute_source_fingerprint() returns empty string "". report_artifact() catches this and returns False, halting the pipeline. Logged as error: "source_fingerprint is empty". Verdict: correctly handled.
  • Fallback path activated: If git archive HEAD fails (e.g., not in a git repo, corrupted .git), the function falls back to fingerprinting the working directory directly. This may include: uncommitted changes, untracked files not in .kosli_ignore, build artifacts from prior CI steps (junit-results/, egg-info). Verdict: risk of fingerprint mismatch between review and build flows.
  • .kosli_ignore as bypass vector: An attacker who can modify .kosli_ignore can exclude files from the fingerprint (e.g., exclude a backdoored module). The file is version-controlled, so changes are visible in git history, but the fingerprint itself does not cover .kosli_ignore changes in a self-referential way. Verdict: medium risk, mitigated by code review of .kosli_ignore changes.
  • TOCTOU gap (resolver fixes): The fingerprint is computed by run_ci_checks() at the end of the CI gate. If the resolver then modifies code and commits, the fingerprint bound to the trail no longer matches the actual reviewed code. A new trail/fingerprint is created for the next loop, but the old trail's fingerprint is stale. Verdict: acceptable if each loop creates a fresh trail with a fresh fingerprint. The risk arises if attestations from loop N reference the fingerprint from loop N-1.
  • False positive risk: Low — a matching fingerprint genuinely means same file contents.
  • False negative risk: Medium — two semantically identical codebases with different file content (whitespace, comments) produce different fingerprints. This is by design (content-addressed) but means a benign reformatting breaks the cross-flow match.
  • Cross-flow fingerprint mismatch: The review flow runs git archive HEAD and then kosli fingerprint <tmpdir>, while the build flow runs kosli fingerprint . on a clean checkout. The two results should match as long as the same .kosli_ignore is applied in both cases. CI fingerprints ., where .kosli_ignore is present. The review flow fingerprints a tmpdir, where .kosli_ignore is present only if git archive exported it. This is a potential determinism bug: the .kosli_ignore file must be in the archived tmpdir for Kosli CLI to find it.
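The stale-binding risk in the TOCTOU item (attestations from loop N referencing the fingerprint from loop N-1) could be detected with a check along these lines. The Attestation record and the loop-to-fingerprint mapping are hypothetical shapes chosen for illustration, not structures taken from the codebase.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Attestation:
    name: str
    loop: int          # resolver loop that produced this review evidence
    fingerprint: str   # fingerprint the attestation was bound to


def find_stale_attestations(
    attestations: Iterable[Attestation],
    trail_fingerprints: Dict[int, str],
) -> List[Attestation]:
    """Return attestations whose fingerprint differs from their own loop's trail.

    An attestation from a loop with no recorded trail fingerprint is also
    reported, because its binding cannot be confirmed.
    """
    return [a for a in attestations if trail_fingerprints.get(a.loop) != a.fingerprint]
```

For example, with trail fingerprints {1: "aaa", 2: "bbb"}, an attestation from loop 2 that is bound to "aaa" would be returned as stale.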

Dependencies

  • Upstream: compute_source_fingerprint() in ci_gate.py (review flow), kosli fingerprint . step in ci.yml (build flow), .kosli_ignore exclusion patterns, git archive HEAD for clean export
  • Downstream: All 80 artifact-level attestations (5 personas x 2 models x 8 review steps) bind to this fingerprint. report_artifact() in kosli_trail.py registers the artifact. Build flow control evaluations compare review fingerprint to build fingerprint.
  • Cross-flow: This artifact is the bridge between the review flow and the build flow. The build flow's code-review-control job fetches review trail attestations and verifies they are bound to the same source fingerprint. If fingerprints do not match, the control fails.
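A minimal sketch of the cross-flow comparison is given here, assuming the build job already holds the build fingerprint and the list of fingerprints that the review attestations are bound to. The function name and the completeness check against 80 attestations are assumptions; the issue states only that the control fails when fingerprints do not match.

```python
from typing import Iterable, Tuple

EXPECTED_ATTESTATIONS = 80  # 5 personas x 2 models x 8 steps, per the audit


def evaluate_code_review_control(
    build_fingerprint: str,
    review_fingerprints: Iterable[str],
    expected_count: int = EXPECTED_ATTESTATIONS,
) -> Tuple[bool, str]:
    """Return (passed, reason) for the review-to-build fingerprint comparison."""
    build = build_fingerprint.strip().lower()
    if not build:
        return False, "build fingerprint is empty"
    bound = [fp.strip().lower() for fp in review_fingerprints]
    # Assumed completeness check: every expected attestation must be present.
    if len(bound) != expected_count:
        return False, f"expected {expected_count} attestations, found {len(bound)}"
    mismatched = sum(1 for fp in bound if fp != build)
    if mismatched:
        return False, f"{mismatched} attestation(s) bound to a different fingerprint"
    return True, "all review attestations match the build fingerprint"
```

Returning a reason string with the boolean keeps the failure cause available for whatever evidence the control job reports.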

Assessment

  • Implementation match: GOOD — The code faithfully implements the design intent. The git archive HEAD approach is sound for excluding untracked files. The flow template correctly defines reviewed-code as a dir artifact with 80 bound attestation slots.
  • Evidence sufficiency: ADEQUATE with caveats — The fingerprint provides strong content-addressed identity. However, the evidence does not capture: which .kosli_ignore version was used, whether the primary or fallback path was taken, or the git commit SHA as a first-class field on the artifact report.
  • Gaps:
    1. .kosli_ignore in tmpdir: git archive extracts to a tmpdir, and kosli fingerprint is then called on that tmpdir path. The .kosli_ignore file is tracked, so it is included in the archive and the Kosli CLI should find it. However, this is not explicitly verified or logged.
    2. Fallback path silent degradation: The fallback from git archive to working-dir fingerprinting logs a warning but does not mark the attestation as lower-confidence. Downstream consumers cannot distinguish "clean export fingerprint" from "dirty working dir fingerprint".
    3. No commit SHA on artifact: report_artifact() does not pass --commit to the Kosli CLI. The build flow does (--commit "$COMMIT_SHA"). This means the review-flow artifact lacks git provenance metadata.
    4. Resolver TOCTOU: Fingerprint is computed before resolver fixes. Each loop creates a new trail, but the timing within a single loop means the fingerprint may not reflect the final state of the code in that loop.
  • Recommendations:
    1. Log whether primary (git archive) or fallback (working dir) path was used, and include this in artifact metadata.
    2. Add --commit flag to report_artifact() so the review-flow artifact has git provenance.
    3. Consider asserting that .kosli_ignore exists in the tmpdir before fingerprinting.
    4. Add an integration test that verifies git archive + kosli fingerprint <tmpdir> == kosli fingerprint . on a clean checkout of the same commit.
    5. Document that .kosli_ignore changes require the same review scrutiny as production code changes.
  • Verdict: SOUND WITH KNOWN LIMITATIONS — The artifact identity model is well-designed and correctly implements deterministic source fingerprinting. The git archive approach is the right call for the review runner. The cross-flow integrity check (review fingerprint == build fingerprint) is the key value proposition. The main risks are the silent fallback path and the .kosli_ignore trust boundary. These are addressable without architectural changes.
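Recommendations 1 to 3 could be combined into a pre-report step like the one sketched here. It refuses to proceed on an empty fingerprint or a missing .kosli_ignore, and records the commit SHA, the path taken, and a hash of the ignore file. The function name and the metadata keys other than artifact_name and artifact_type are hypothetical. How such metadata would be attached to the Kosli artifact record is left open.

```python
import hashlib
from pathlib import Path
from typing import Optional


def build_artifact_provenance(
    export_dir: str,
    fingerprint: str,
    commit_sha: str,
    method: str,
) -> Optional[dict]:
    """Return provenance metadata for the artifact, or None if a precondition fails."""
    if not fingerprint:
        return None  # mirrors the "source_fingerprint is empty" halt described above
    ignore_file = Path(export_dir) / ".kosli_ignore"
    if not ignore_file.is_file():
        return None  # exclusions would not be applied to this directory
    return {
        "artifact_name": "reviewed-code",
        "artifact_type": "dir",
        "fingerprint": fingerprint,
        "commit_sha": commit_sha,
        "fingerprint_method": method,  # e.g. "git-archive" or "working-dir"
        "kosli_ignore_sha256": hashlib.sha256(ignore_file.read_bytes()).hexdigest(),
    }
```

Recording the ignore file's own hash would let a later reader tell which exclusion rules were in effect when the fingerprint was produced.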

Metadata

Labels

  • attestation-audit (Attestation evidence audit review)
  • governance (Governance attestation)
  • priority:high (Compliance-critical control)
  • review-flow (Review flow attestation)
