[Audit] #26 coding-agent-completed #127
Description
Attestation Identity
| Field | Value |
|---|---|
| Slot name | coding-agent-completed |
| Flow | Build (e.g. agentic-sdlc-demo-GH{issue}-Build) |
| Level | Trail-level |
| Kosli type | custom:coding-agent-result |
| Schema | kosli/attestation-types/schemas/coding-agent-result.json |
| jq evaluator rules | `.success == true`, `.budget_exceeded == false`, `(.commit_sha \| length) >= 7`, `(.files_changed \| length) > 0` |
| Instances per trail | 1 |
Control Objective
Risk Mitigated
In a regulated financial services context, the absence of this control exposes the organization to:
- Unverified autonomous code entering the build pipeline. Without attesting the coding agent's result, there is no gate confirming that the AI-generated implementation actually compiles, passes lint, and passes tests before downstream CI and review begin. A broken or budget-blown agent run could silently propagate into the build trail.
- Unbounded agent cost exposure. An autonomous coding agent operating without a budget check could consume unlimited API credits. The `budget_exceeded == false` rule prevents a runaway agent from being treated as successful, enforcing fiscal discipline on AI compute.
- Phantom or empty changes. The `(.files_changed | length) > 0` rule ensures the agent actually produced code modifications. Without it, a no-op agent run (e.g., due to prompt confusion or model refusal) could be silently marked as successful, creating a trail with no meaningful artifact.
- Unattributed code provenance. The `(.commit_sha | length) >= 7` rule binds the attestation to a specific Git commit, establishing chain of custody from AI-generated code to the build artifact. Without it, the trail cannot prove which code was built.
Auditor Question
"Can you demonstrate, with structured evidence, that every AI-generated code change entering your build pipeline was verified to pass lint and tests, stayed within its cost budget, produced identifiable file modifications, and is bound to a specific Git commit — all before any downstream CI or review activities began?"
Regulatory Mapping
| Framework | Control Reference | Relevance |
|---|---|---|
| SOC 2 | CC6.1 — Logical and Physical Access Controls | Ensures only verified, successful agent output enters build pipeline |
| SOC 2 | CC8.1 — Change Management | Structured evidence that AI-generated changes were validated before build |
| SOC 2 | CC6.8 — Unauthorized Software Prevention | Budget check prevents runaway agent from injecting unlimited code iterations |
| ISO 27001 | A.8.25 — Secure Development Lifecycle | Validates AI coding agent output before it enters the review/build lifecycle |
| ISO 27001 | A.8.32 — Change Management | Commit SHA binding provides change traceability |
| NIST CSF | PR.DS-6 — Integrity checking mechanisms | Commit SHA and file list establish integrity of agent output |
| NIST CSF | PR.IP-3 — Configuration change control | Agent result attestation is a change control checkpoint |
| NIST CSF | DE.CM-4 — Malicious code detection | Budget and success checks detect anomalous agent behavior |
| PCI DSS | 6.5.6 — Secure coding practices | Lint and test validation of AI-generated code |
Evidence Specification
Attestation Payload Fields
| Field | Type | Source | Required | Validated by |
|---|---|---|---|---|
| `success` | boolean | `coding-result-GH{N}.json` → `.success` | Yes (schema + jq) | jq: `.success == true` |
| `issue_number` | integer | `coding-result-GH{N}.json` → `.issue_number` or `$ISSUE_NUMBER` env | Yes (schema) | Schema: `minimum: 1` |
| `issue_id` | string | `$ISSUE_ID` env var (e.g. `GH123`) | Yes (schema) | Schema: `minLength: 1` |
| `branch` | string | `coding-result-GH{N}.json` → `.branch` | Yes (schema) | Schema: `minLength: 1` |
| `commit_sha` | string | `coding-result-GH{N}.json` → `.commit_sha` | Yes (schema + jq) | jq: `(.commit_sha \| length) >= 7`; Schema: `minLength: 7` |
| `files_changed` | array[string] | `coding-result-GH{N}.json` → `.files_changed` | Yes (schema + jq) | jq: `(.files_changed \| length) > 0` |
| `lint_passed` | boolean | `coding-result-GH{N}.json` → `.lint_passed` | Yes (schema) | Schema only (not in jq rules) |
| `tests_passed` | boolean | `coding-result-GH{N}.json` → `.tests_passed` | Yes (schema) | Schema only (not in jq rules) |
| `claude_turns` | integer | `coding-result-GH{N}.json` → `.claude_turns` | No | Schema: `minimum: 0` |
| `claude_cost_usd` | number | `coding-result-GH{N}.json` → `.claude_cost` | No | Schema: `minimum: 0` |
| `max_cost_budget_usd` | number | `coding-result-GH{N}.json` → `.max_cost_budget` (default `15.0`) | No | Schema: `minimum: 0` |
| `budget_exceeded` | boolean | `coding-result-GH{N}.json` → `.budget_exceeded` | Yes (schema + jq) | jq: `.budget_exceeded == false` |
| `duration_ms` | integer | `coding-result-GH{N}.json` → `.duration_ms` | No | Schema: `minimum: 0` |
| `retries` | integer | `coding-result-GH{N}.json` → `.retries` | No | Schema: `minimum: 0` |
| `error` | string | `coding-result-GH{N}.json` → error message if failed | No | Schema only |
| `timestamp` | string (ISO 8601) | `datetime.now(timezone.utc).isoformat()` at attestation time | Yes (schema) | Schema: `format: date-time` |
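The field mapping in the table above can be sketched in Python. This is an illustration only, not the workflow's inline script: `build_payload` is a hypothetical helper, and the defaults for fields other than `files_changed`, `success`, and `max_cost_budget` are assumptions.

```python
from datetime import datetime, timezone


def build_payload(result: dict, issue_number: int, issue_id: str) -> dict:
    """Map a parsed coding-result dict onto the attestation payload fields.

    Hypothetical helper; names and defaults are assumptions for illustration.
    """
    payload = {
        "success": bool(result.get("success", False)),
        # Falls back to the value taken from the environment.
        "issue_number": int(result.get("issue_number", issue_number)),
        "issue_id": issue_id,
        "branch": result.get("branch", ""),
        "commit_sha": result.get("commit_sha", ""),
        "files_changed": list(result.get("files_changed", [])),
        "lint_passed": bool(result.get("lint_passed", False)),
        "tests_passed": bool(result.get("tests_passed", False)),
        # Assumption: a missing flag defaults to False, like other "empty" values.
        "budget_exceeded": bool(result.get("budget_exceeded", False)),
        "max_cost_budget_usd": result.get("max_cost_budget", 15.0),
        # Set at attestation time, not at agent completion time.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Optional fields: payload name -> key in the result file.
    optional = {
        "claude_turns": "claude_turns",
        "claude_cost_usd": "claude_cost",
        "duration_ms": "duration_ms",
        "retries": "retries",
        "error": "error",
    }
    for target, source in optional.items():
        if source in result:
            payload[target] = result[source]
    return payload
```

Note the two renames (`claude_cost` to `claude_cost_usd`, `max_cost_budget` to `max_cost_budget_usd`), which are easy to miss when comparing the result file against the attested payload.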
Producer Metadata
- Evidence producer: GitHub Actions runner (`ubuntu-latest`), `finalize` job in `.github/workflows/agentic-code.yml`, step "Attest coding-agent-completed" (lines 375-431)
- Producer trust level: Ephemeral CI runner. The payload is assembled from `coding-result-GH{N}.json`, which is an artifact uploaded by the `code` job and downloaded via `actions/download-artifact@v4`. The Python inline script reads the JSON and constructs the payload. Kosli CLI installed via `kosli-dev/setup-cli-action@v2` with pinned version.
- Tamper resistance: The coding result JSON is produced by the `code` job on a separate runner, uploaded as a GitHub Actions artifact, and downloaded in the `finalize` job. Cross-job artifact transfer provides some isolation but relies on GitHub Actions artifact integrity. The attestation is immutable once written to Kosli.
Compliance Logic
Compliant when
ALL four jq rules return true (AND-gated):
- `.success == true`: The coding agent completed successfully (lint + tests passed after agent finished)
- `.budget_exceeded == false`: The agent did not exceed its cost budget (`CODING_MAX_COST`, default $15.00)
- `(.commit_sha | length) >= 7`: A valid Git commit SHA is present (at least 7 characters, the minimum for a short SHA)
- `(.files_changed | length) > 0`: The agent actually modified at least one file
Additionally, the JSON schema enforces:
- All required fields are present (`success`, `issue_number`, `issue_id`, `branch`, `commit_sha`, `files_changed`, `lint_passed`, `tests_passed`, `budget_exceeded`, `timestamp`)
- Type constraints (booleans are booleans, integers have minimums, strings have minimum lengths)
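For local reasoning about the AND-gate, the four jq rules can be mirrored in Python. This is a sketch that approximates jq semantics (a missing field behaves like `null`, whose `length` is 0); `evaluate_rules` and `is_compliant` are hypothetical names, and Kosli evaluates the real jq expressions, not this code.

```python
def evaluate_rules(payload: dict) -> dict:
    """Return a per-rule verdict mirroring the four jq evaluator rules."""
    return {
        ".success == true": payload.get("success") is True,
        ".budget_exceeded == false": payload.get("budget_exceeded") is False,
        # In jq, `null | length` is 0, so a missing field fails the rule.
        "(.commit_sha | length) >= 7": len(payload.get("commit_sha") or "") >= 7,
        "(.files_changed | length) > 0": len(payload.get("files_changed") or []) > 0,
    }


def is_compliant(payload: dict) -> bool:
    """All rules must hold (AND-gated)."""
    return all(evaluate_rules(payload).values())


if __name__ == "__main__":
    example = {
        "success": True,
        "budget_exceeded": False,
        "commit_sha": "abc1234",
        "files_changed": ["src/app.py"],
    }
    for rule, passed in evaluate_rules(example).items():
        print(f"{'PASS' if passed else 'FAIL'}  {rule}")
```

Returning a verdict per rule, instead of a single boolean, makes it easier to see which rule caused a non-compliant result.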
Non-compliant when
- Agent failed (success=false) — lint or tests did not pass after retries
- Agent exceeded budget (budget_exceeded=true) — cost overrun
- No commit SHA or SHA too short — agent did not produce a valid commit
- No files changed — agent ran but made no modifications
- Missing required fields — schema validation fails
- Wrong field types — schema validation fails
Threshold source
- jq rules: Defined in `kosli/attestation-types/setup.sh` lines 271-278
- Schema: `kosli/attestation-types/schemas/coding-agent-result.json`
- Budget limit: `CODING_MAX_COST` env var in workflow (default `15.0`), checked by the coding agent itself
- No Rego policy: This is a direct custom type attestation, not a Rego-evaluated control
Edge Cases & Failure Modes
Check Cannot Run
Coding result file missing: If coding-result-GH{N}.json does not exist, the attestation step prints a warning and exits with code 0 (line 384-387). The attestation slot remains unfilled, making the trail non-compliant. This is safe (fail-open on attestation, fail-closed on trail) but produces no diagnostic error in the Kosli trail — the slot is simply absent.
Artifact download fails: actions/download-artifact@v4 has continue-on-error: true (line 373). If the code job did not produce a result file (crash, timeout), the download silently fails, and the missing-file guard above triggers.
Kosli API unavailable: The kosli attest custom call (lines 423-431) has no || true wrapper. If Kosli is down, the step fails hard, which is correct behavior — the trail cannot record the attestation without Kosli.
Trail not begun: If the "Create Kosli build flow and begin trail" step (lines 335-365) failed, KOSLI_BUILD_FLOW and KOSLI_BUILD_TRAIL env vars are empty. The kosli attest custom call will fail with an API error.
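The missing-file case above currently exits with code 0. A fail-closed alternative might look like the following sketch, which is not the workflow's code: `load_result_or_fail` and `REQUIRED_KEYS` are hypothetical, and the key list assumes that only these fields originate in the result file.

```python
import json
import sys
from pathlib import Path

# Assumption: fields expected to come from the result file itself
# (issue identifiers and timestamp are supplied elsewhere).
REQUIRED_KEYS = (
    "success", "branch", "commit_sha", "files_changed",
    "lint_passed", "tests_passed", "budget_exceeded",
)


def fail(message: str) -> None:
    """Emit a GitHub Actions error annotation and exit non-zero."""
    print(f"::error::{message}")
    sys.exit(1)


def load_result_or_fail(path: Path) -> dict:
    """Load the coding result, failing the step if it is absent or incomplete."""
    if not path.is_file():
        fail(f"coding result file not found: {path}")
    try:
        result = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        fail(f"coding result file is not valid JSON: {exc}")
    if not isinstance(result, dict):
        fail("coding result file must contain a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        fail(f"coding result file is missing fields: {', '.join(missing)}")
    return result
```

With this guard the CI run itself shows a failure, so the absent attestation slot is no longer the only signal that something went wrong.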
Partial Evidence
If the coding agent crashes mid-run and writes a partial coding-result-GH{N}.json (e.g., missing files_changed), the Python inline script will default missing fields to empty values ([] for files_changed, False for success). The jq rules will then correctly mark it non-compliant because (.files_changed | length) > 0 will fail.
If the agent succeeds at coding but the result JSON has success: true while lint_passed: false (inconsistent state), the jq rules will still pass because they only check .success, not .lint_passed directly. This is a gap — see Assessment.
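The inconsistent-state case can be made concrete by comparing the current rule set with one that also checks `lint_passed` and `tests_passed`. This is a Python approximation of the jq rules for illustration; `CURRENT_RULES`, `PROPOSED_EXTRA`, and `failed_rules` are hypothetical names.

```python
# Python stand-ins for the four jq rules currently enforced.
CURRENT_RULES = {
    "success": lambda p: p.get("success") is True,
    "budget_exceeded": lambda p: p.get("budget_exceeded") is False,
    "commit_sha": lambda p: len(p.get("commit_sha") or "") >= 7,
    "files_changed": lambda p: len(p.get("files_changed") or []) > 0,
}

# Additional checks corresponding to `.lint_passed == true`
# and `.tests_passed == true`.
PROPOSED_EXTRA = {
    "lint_passed": lambda p: p.get("lint_passed") is True,
    "tests_passed": lambda p: p.get("tests_passed") is True,
}


def failed_rules(payload: dict, rules: dict) -> list:
    """Return the names of the rules that the payload does not satisfy."""
    return [name for name, check in rules.items() if not check(payload)]


if __name__ == "__main__":
    inconsistent = {
        "success": True,
        "lint_passed": False,  # contradicts success
        "tests_passed": True,
        "budget_exceeded": False,
        "commit_sha": "abc1234",
        "files_changed": ["src/app.py"],
    }
    print("current rules :", failed_rules(inconsistent, CURRENT_RULES))
    print("with additions:", failed_rules(inconsistent, {**CURRENT_RULES, **PROPOSED_EXTRA}))
```

Under the current rules the inconsistent payload fails nothing, while the extended set flags `lint_passed`.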
Bypass Vectors
| Vector | Severity | Details |
|---|---|---|
| Fabricated result JSON | HIGH | The coding agent writes coding-result-GH{N}.json itself. A compromised agent or modified scripts.coding.agent module could write success: true regardless of actual lint/test results. The jq rules trust the self-reported payload. |
| Shared KOSLI_API_TOKEN | HIGH | Anyone with repo write access can use the token to attest arbitrary payloads to any flow/trail. No flow-scoped RBAC prevents a malicious actor from directly calling kosli attest custom with a forged payload. |
| Artifact upload/download substitution | MEDIUM | GitHub Actions artifacts are scoped to the workflow run, but a compromised code job could upload a falsified result file. The finalize job has no way to verify the authenticity of the downloaded artifact beyond trusting the Actions runtime. |
| `\|\| true` on agent invocation | MEDIUM | The coding agent can crash and still produce a partial result that gets attested. |
| Timestamp is attestation-time, not agent-time | LOW | The timestamp field is set when the payload is assembled in finalize, not when the agent actually ran. There could be a significant time gap. |
| `lint_passed` and `tests_passed` not in jq rules | MEDIUM | These fields are in the schema but not enforced by jq. An agent could report `success: true, lint_passed: false, tests_passed: false` and the attestation would be compliant. |
False Positive Risk
- Self-reported success: The agent marks itself successful. If the agent's internal lint/test verification is buggy or incomplete, the attestation may be compliant despite real failures. The downstream `lint-control` and `unit-test-control` attestations (on the artifact) partially mitigate this by re-running checks in CI.
False Negative Risk
- Transient test flakiness: If the agent's local test run encounters a flaky test failure, `success` will be false and the attestation non-compliant, even though the code is correct. The retry mechanism (`retries` field) partially mitigates this.
- Budget threshold too low: If `CODING_MAX_COST` is set too aggressively for complex issues, legitimate work may be flagged as budget-exceeded.
TOCTOU Gaps
- Code changes after agent completes: The `code` job pushes the feature branch (lines 100-105) after the coding agent runs. The `commit_sha` in the result file refers to the agent's last commit. If any subsequent step in the `code` job modifies files (none currently do), the attested SHA would be stale.
- Review job may push fixes: The `review` job's resolver can push additional commits (lines 228-229). The `coding-agent-completed` attestation records the pre-review SHA. This is by design: the coding agent's result is attested separately from review-time fixes.
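A staleness check for the first gap could compare the attested SHA against the branch head at push time. The sketch below covers only the comparison; `sha_matches` is a hypothetical helper, and obtaining the head SHA (for example from `git rev-parse HEAD`) is assumed to happen elsewhere.

```python
def sha_matches(attested: str, head: str) -> bool:
    """Return True if one SHA is a prefix of the other.

    Accepts short SHAs, but requires at least 7 characters on both sides,
    matching the minimum length enforced by the attestation rules.
    """
    a = attested.strip().lower()
    h = head.strip().lower()
    if len(a) < 7 or len(h) < 7:
        return False
    return h.startswith(a) or a.startswith(h)


if __name__ == "__main__":
    head = "abc1234def5678900112233445566778899aabbc"
    print(sha_matches("abc1234", head))   # short attested SHA, same commit
    print(sha_matches("abd1234", head))   # different commit
```

Such a check only fits inside the `code` job. After review, a differing head is expected, because review-time fixes are attested separately.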
Single Points of Failure
- `coding-result-GH{N}.json`: The entire attestation depends on this single file. If it is corrupted, truncated, or absent, no attestation is recorded.
- Self-reported evidence: The coding agent is both the executor and the evidence producer. No independent verification of lint/test results at attestation time.
- GitHub Actions artifact transfer: Cross-job evidence depends on the Actions artifact system. No checksum or signature verification on download.
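A checksum comparison on download, which the last point notes is absent, could be sketched as follows. The helper names are hypothetical, and how the expected digest reaches the `finalize` job (for example as a job output instead of a second artifact) is an assumption left open here.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected value in constant time."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A checksum detects corruption or substitution in transit only. It does not help if the `code` job that computes the digest is itself compromised, so it complements independent verification and does not replace it.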
Dependencies
Upstream
| Dependency | Type | Provides |
|---|---|---|
| `code` job | CI job | `coding-result-GH{N}.json` artifact; `branch`, `success`, `issue_id` outputs |
| `review` job | CI job | `verdict` output (must be ACCEPTED for attestation to run) |
| `scripts.coding.agent` | Python module | Produces the coding result JSON |
| `actions/upload-artifact@v4` | GitHub Action | Transfers result file from `code` to `finalize` job |
| `actions/download-artifact@v4` | GitHub Action | Retrieves result file in `finalize` job |
| Kosli build flow + trail | Kosli | Must be created before attestation (step "Create Kosli build flow and begin trail") |
| `KOSLI_API_TOKEN` | Secret | Authentication for Kosli API |
| `ISSUE_NUMBER`, `ISSUE_ID` | Env vars | Issue identification for file naming and payload |
Downstream
| Consumer | Impact of Failure |
|---|---|
| Kosli trail compliance | Unfilled slot → trail NON-COMPLIANT (entire build trail blocked) |
| CI workflow dispatch | Triggered after this attestation (line 433-454); runs regardless of attestation outcome |
| Artifact-level attestations | Not directly dependent, but share the same trail |
| `code-review-control` | Indirectly depends on coding agent having produced a valid commit |
Cross-Flow Bridge
- Code job (same workflow) produces `coding-result-GH{N}.json` with agent outcome, cost, and commit SHA
- Finalize job downloads the result, constructs a schema-compliant payload, and attests to the Build flow trail
- Build trail requires this slot filled and compliant before the trail can be marked compliant
- The `commit_sha` in this attestation is the same commit that the review flow reviewed and that CI will build, establishing provenance linkage across flows
Assessment
Implementation Match: PARTIAL
The attestation correctly captures the coding agent's outcome and enforces the four key invariants (success, budget, commit, files). The schema is well-designed with appropriate types and constraints. However, the implementation has a structural weakness: lint_passed and tests_passed are required schema fields but are NOT enforced by jq rules. The success field is a self-reported composite that is supposed to reflect lint+test outcomes, but there is no independent verification at attestation time.
Evidence Sufficiency: ADEQUATE WITH GAPS
The evidence is sufficient for most audit scenarios — the payload captures what the agent did, how much it cost, and what it produced. Gaps include:
- No independent verification of self-reported fields (success, lint_passed, tests_passed)
- Timestamp is attestation-time, not agent-completion-time (temporal accuracy gap)
- No model identity in jq rules (which Claude model version produced the code)
- No diff size or complexity metrics (how much code was generated)
- Cost fields (`claude_cost_usd`, `max_cost_budget_usd`) are optional in schema despite `budget_exceeded` being required, which allows attesting budget compliance without disclosing the actual cost
Gaps
- HIGH: Self-reported success is the sole compliance signal. The jq rules check `.success == true` but this value is written by the coding agent itself. A buggy or adversarial agent can write `success: true` regardless of actual lint/test outcomes. Mitigation: downstream CI re-runs lint and tests, but by that point the attestation is already compliant.
- HIGH: `lint_passed` and `tests_passed` not enforced by jq rules. These are required schema fields but have no compliance impact. An attestation with `success: true, lint_passed: false` would be compliant. Add jq rules: `.lint_passed == true` and `.tests_passed == true`.
- HIGH: Shared KOSLI_API_TOKEN allows forged attestations. Any workflow or actor with the token can attest arbitrary payloads. No flow-scoped or trail-scoped access control.
- MEDIUM: Silent exit on missing result file. The step exits 0 when the result file is missing (lines 384-387), meaning the workflow succeeds but the attestation is simply absent. While the trail remains non-compliant, there is no explicit failure signal in the CI run.
- MEDIUM: `|| true` on agent invocation masks crashes. The coding agent can crash and still produce a partial result that gets attested. Remove `|| true` or validate result file completeness.
- LOW: Timestamp mismatch. The attestation timestamp is set at payload construction time, not agent completion time. Add `agent_completed_at` to the payload.
- LOW: No model version in jq rules. The schema does not require `model` or `model_version` fields. The workflow hardcodes `CODING_MODEL: claude-sonnet-4-20250514` but this is not recorded in the attestation.
Recommendations
- Add jq rules for `lint_passed` and `tests_passed`: `.lint_passed == true` and `.tests_passed == true`. These fields are already required by schema but have no compliance enforcement.
- Add result file validation: Before constructing the payload, validate that the result JSON contains all required fields with non-default values. Fail the step (non-zero exit) if validation fails.
- Investigate flow-scoped KOSLI_API_TOKEN or separate tokens for different flows to prevent cross-flow attestation forgery.
- Remove `|| true` from agent invocation or add explicit post-run validation that the result file is complete and internally consistent.
- Add `model_version` to required schema fields and jq rules to establish model provenance.
- Record actual cost values as required fields (not optional) to support cost auditing.
- Add a checksum to the uploaded artifact to detect tampering during cross-job transfer.
Verdict: NEEDS IMPROVEMENT
The attestation covers the right conceptual ground — it gates on agent success, budget compliance, commit provenance, and file modification. The schema design is solid. However, the self-reported nature of the evidence, the gap between schema requirements and jq enforcement (lint_passed/tests_passed), and the shared API token create meaningful compliance and security risks. The downstream CI checks partially compensate, but an auditor would flag the self-attestation pattern and the unenforced schema fields as control weaknesses requiring remediation.