Commit dba2c28

docs: expand ci and dsse guidance
1 parent a7a3b25 commit dba2c28

5 files changed, 391 additions and 0 deletions

README.md

Lines changed: 7 additions & 0 deletions
@@ -187,6 +187,7 @@ The same process works against forks or sandboxes—helpful when validating new
 - The workflow helper collects the PR diff (`base_sha..head_sha`), submits it to `/v1/analysis`, polls `/v1/analysis/{id}`, and prints the enriched decision payload so reviewers can inspect risk summaries inline.
 - Consume `/v1/analysis/{id}/sarif` when you need static-analysis interoperability (e.g., uploading to GitHub code scanning or aggregating findings in other dashboards).
 - Surface decision bundles in CI by hitting `/v1/analysis/{id}/bundle` (e.g., attach the DSSE envelope as a build artifact) to preserve signed provenance for downstream policy checks.
+- See [docs/ci-integration.md](docs/ci-integration.md) for a comprehensive workflow guide, SARIF upload recipe, artifact archiving, and non-GitHub CI examples.

 ## Telemetry Export

@@ -218,6 +219,12 @@ The same process works against forks or sandboxes—helpful when validating new
 - A lightweight synchronous client lives in `clients/python`; use `ProvenanceClient` for basic ingestion/status/analytics calls.
 - Async support is available via `AsyncProvenanceClient`. Install the client SDK with `pip install provenance[client]` and import from `clients.python`.

+## Additional Documentation
+
+- [CI Integration Guide](docs/ci-integration.md) – Configure GitHub Actions, upload SARIF, archive decision bundles, and adapt the workflow to other CI systems.
+- [SARIF Reporting](docs/sarif.md) – Understand the SARIF 2.1.0 output, severity mapping, and customization hooks.
+- [DSSE Decision Bundles](docs/decision-bundles.md) – Inspect the envelope schema, verify signatures, and integrate with transparency logs.
+
 ## Data Persistence Model

 - **Analyses** – Stored as JSON blobs keyed by `analysis:{analysis_id}` with a sorted set index for time-window queries.

clients/github-action/run.py

Lines changed: 7 additions & 0 deletions
@@ -2,6 +2,7 @@
 
 import argparse
 import json
+import os
 import subprocess
 import time
 from pathlib import Path
@@ -103,6 +104,12 @@ def main() -> None:
     response = submit_analysis(args.api_url, args.api_token, payload)
     analysis_id = response["analysis_id"]
     decision = poll_decision(args.api_url, args.api_token, analysis_id)
+    write_response_path = os.getenv("PROVENANCE_WRITE_RESPONSE_PATH")
+    if write_response_path:
+        out_path = Path(write_response_path)
+        if not out_path.parent.exists():
+            out_path.parent.mkdir(parents=True, exist_ok=True)
+        out_path.write_text(json.dumps(decision, indent=2) + "\n", encoding="utf-8")
     print(json.dumps(decision, indent=2))
     decision_info = decision.get("decision") or {}
     outcome_value = decision_info.get("outcome") or decision.get("status")

docs/ci-integration.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
# CI Integration Guide

This guide walks through wiring Provenance into continuous integration systems so every pull request is analyzed, decisions are enforced automatically, and evidence is archived for future audits.

## Prerequisites

- Provenance API endpoint (e.g. `https://provenance.example.com`).
- API token with permission to create analyses and read decisions.
- Python 3.12 runtime available to the pipeline (the bundled GitHub Action installs [uv](https://docs.astral.sh/uv/latest/) and reuses this repository's dependency lockfile).
- Optional: Ed25519 public key published via `PROVENANCE_DECISION_VERIFY_KEY` if your governance service signs DSSE bundles.

## GitHub Actions

We ship a composite action in `clients/github-action/` that:

1. Collects the unified diff between the PR base and head commits.
2. Submits the diff plus provenance metadata to `/v1/analysis`.
3. Polls `/v1/analysis/{id}` until the analysis completes.
4. Prints the structured decision payload for reviewer visibility.
5. Exits with a non-zero status when the policy outcome is `block`.

### Example Workflow

Save the following as `.github/workflows/provenance.yml` and provide the API configuration through GitHub secrets:

```yaml
name: Provenance Governance

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Ensure the full history is present for an accurate diff.
          fetch-depth: 0

      - name: Run Provenance analysis
        uses: ./clients/github-action
        with:
          api_url: ${{ secrets.PROVENANCE_API_URL }}
          api_token: ${{ secrets.PROVENANCE_API_TOKEN }}
```

When a decision is `block`, the job fails and the PR is marked red. `allow` and `warn` outcomes complete successfully; governance context is still attached to the run log for reviewer triage.
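
The outcome-to-exit-status rule can be expressed in a few lines. This sketch reuses the two lookups visible in the `run.py` diff (`decision.get("decision")` and the fallback to `status`); the function name `exit_code_for` is hypothetical, and the rest of the script's handling is not shown in the commit.

```python
def exit_code_for(decision: dict) -> int:
    """Return 1 for a `block` outcome and 0 for every other outcome."""
    decision_info = decision.get("decision") or {}
    # Fall back to the top-level status when no decision object is present.
    outcome = decision_info.get("outcome") or decision.get("status")
    return 1 if outcome == "block" else 0
```

A wrapper script could finish with `sys.exit(exit_code_for(decision))` so that only `block` fails the job.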

### Exposing SARIF Findings in GitHub

The analysis API now exposes a SARIF 2.1.0 representation of each run. Add a follow-up step to fetch the SARIF payload and upload it to the GitHub code scanning UI. The job needs the `security-events: write` permission for the upload step:

```yaml
      - name: Download SARIF report
        if: success() || failure()
        run: |
          set -euo pipefail
          ANALYSIS_ID=$(jq -r '.analysis_id' provenance.json)
          curl -sSf -H "Authorization: Bearer ${{ secrets.PROVENANCE_API_TOKEN }}" \
            "${{ secrets.PROVENANCE_API_URL }}/v1/analysis/${ANALYSIS_ID}/sarif" \
            -o provenance.sarif

      - name: Upload SARIF to GitHub
        if: success() || failure()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: provenance.sarif
```

The download step reads the analysis identifier from `provenance.json`. To create that file, update the action invocation to persist the API response JSON:

```yaml
      - name: Run Provenance analysis
        uses: ./clients/github-action
        with:
          api_url: ${{ secrets.PROVENANCE_API_URL }}
          api_token: ${{ secrets.PROVENANCE_API_TOKEN }}
        env:
          PROVENANCE_WRITE_RESPONSE_PATH: provenance.json
```

The `clients/github-action/run.py` script respects `PROVENANCE_WRITE_RESPONSE_PATH` and mirrors the latest decision payload to disk so downstream steps can reference the analysis identifier without re-polling the API.
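
Where `jq` is not available, a downstream step can read the identifier with Python instead. This is a sketch, assuming the persisted decision payload carries a top-level `analysis_id` field, as the `jq` expression in the recipe does; `read_analysis_id` is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_analysis_id(path: str = "provenance.json") -> str:
    """Return the analysis identifier stored in the persisted decision payload."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    analysis_id = payload.get("analysis_id")
    if not analysis_id:
        # Fail loudly rather than building a URL with an empty identifier.
        raise KeyError(f"{path} has no analysis_id field")
    return analysis_id
```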

### Archiving DSSE Decision Bundles

Signed DSSE envelopes provide tamper-evident evidence for pipeline attestations. Attach the bundle to the workflow artifacts:

```yaml
      - name: Archive decision bundle
        if: success() || failure()
        run: |
          set -euo pipefail
          ANALYSIS_ID=$(jq -r '.analysis_id' provenance.json)
          curl -sSf -H "Authorization: Bearer ${{ secrets.PROVENANCE_API_TOKEN }}" \
            "${{ secrets.PROVENANCE_API_URL }}/v1/analysis/${ANALYSIS_ID}/bundle" \
            -o decision-bundle.json

      - uses: actions/upload-artifact@v4
        if: success() || failure()
        with:
          name: provenance-decision-bundle
          path: decision-bundle.json
          retention-days: 30
```

Auditors can later verify the payload hash and, if signing is configured, the Ed25519 signature against the published governance verification key.

## Other CI Systems

The workflow runner is a thin wrapper around four HTTP calls, so porting the integration to other CI providers is straightforward.

1. Generate the diff for the change under review. For example, in Jenkins:

   ```bash
   git fetch origin "${CHANGE_TARGET}"
   git diff --unified=0 "origin/${CHANGE_TARGET}...${GIT_COMMIT}" > diff.patch
   ```

2. Convert the diff to the `changed_lines` payload expected by `/v1/analysis`. You can reuse `clients/github-action/run.py` directly (`python -m clients.github-action.run ...`) or craft JSON with a custom script.

3. Submit the payload:

   ```bash
   curl -sSf -H "Authorization: Bearer ${PROVENANCE_API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "@payload.json" \
     "${PROVENANCE_API_URL}/v1/analysis"
   ```

4. Poll `/v1/analysis/{id}` until `status` is `completed`, then fail the job when `decision.outcome` is `block`.

5. Optionally fetch `/v1/analysis/{id}/sarif` and `/v1/analysis/{id}/bundle` to integrate with downstream scanners or evidence stores.
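
For step 2, a custom script first needs the added line numbers per file. The sketch below parses unified diff output such as `diff.patch` into a mapping of path to new-side line numbers. The function name `added_lines` and the mapping shape are assumptions for illustration; the actual `changed_lines` schema is defined by the API and is not shown here, so treat this as an intermediate step only.

```python
import re
from collections import defaultdict

# Matches hunk headers such as "@@ -3,0 +4,2 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text: str) -> dict[str, list[int]]:
    """Map each changed file to the new-side line numbers added by the diff."""
    result: dict[str, list[int]] = defaultdict(list)
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_line = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                if path:
                    result[path].append(new_line)
                new_line += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line, present only when --unified is above 0.
                new_line += 1
                new_left -= 1
                old_left -= 1
            continue
        if line.startswith("+++ "):
            target = line[4:].strip()
            # Deleted files use /dev/null as their new side.
            path = None if target == "/dev/null" else target.removeprefix("b/")
        elif match := HUNK_RE.match(line):
            old_left = int(match.group(1) or 1)
            new_line = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return dict(result)
```

Counting the lines promised by each hunk header, rather than matching prefixes alone, keeps an added line whose content begins with `++` from being mistaken for a file header.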

### Containerized Stages

If your CI stages run in disposable containers:

- Install `uv` (or `pip`) to execute `clients/github-action/run.py`.
- Mount the repository workspace so the diff generator can inspect tracked files.
- Provide `PROVENANCE_API_URL` and `PROVENANCE_API_TOKEN` via environment variables or secrets injection.
- Persist the JSON response to disk if later stages depend on the analysis identifier.

## Debugging Tips

- The composite action logs the raw decision payload; review the `risk_summary` and `decision.rationale` fields when a run blocks unexpectedly.
- Set the `PROVENANCE_TRACE=1` environment variable to enable verbose HTTP logging inside the action script.
- When testing locally, run `uv run clients/github-action/run.py --help` to see available arguments.
- Double-check that `fetch-depth: 0` (or an equivalent full clone) is configured; shallow clones omit base commits, which leads to empty diffs and analyses that do nothing.
- If polling times out, inspect the Provenance server logs for long-running detectors or governance evaluations; consider raising the `--timeout-s` value in the CLI.

docs/decision-bundles.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# DSSE Decision Bundles

Provenance emits DSSE-formatted envelopes for every governance evaluation so downstream systems can verify policy outcomes independently. This document explains the payload structure, signing process, and recommended verification steps.

## Envelope Structure

Decision bundles follow the [in-toto DSSE specification](https://github.com/in-toto/ietf-draft), using JSON as the payload type:

```json
{
  "payloadType": "application/provenance.decision+json",
  "payload": "<base64-encoded canonical JSON>",
  "payloadSha256": "<hex digest of the decoded payload>",
  "signatures": [
    {
      "keyid": "decision-key",
      "sig": "<base64-encoded Ed25519 signature>"
    }
  ]
}
```

The decoded payload contains the authoritative decision context:

```json
{
  "analysis_id": "an_abcd1234",
  "repo_id": "acme/shop",
  "pr_number": "77",
  "decided_at": "2024-06-07T19:11:42.483920Z",
  "outcome": "block",
  "policy_version": "2024-06-01",
  "rationale": "Critical findings detected.",
  "risk_summary": {
    "findings_total": 3,
    "findings_by_category": {"code_execution": 2, "secrets": 1},
    "findings_by_severity": {"critical": 1, "high": 2},
    "coverage": {
      "total_lines": 22,
      "attributed_lines": 18,
      "unknown_line_count": 4,
      "coverage_percent": 81.82
    }
  },
  "provenance_confidence": {
    "agent_attribution_percent": 95.0,
    "cryptographic_attestations": 17
  },
  "thresholds": {
    "warn": {"code_execution": 1},
    "block": {"secrets": 1}
  },
  "detector_capabilities": {
    "semgrep": {
      "ruleset": "app/detection_rules/semgrep_rules.yml",
      "sha256": "..."
    }
  },
  "line_count": 22,
  "finding_count": 3,
  "inputs_sha256": "e843a82d88e1416f33804ce96f41d2a57c99b35a2f1b9e1d4fb86a03d38f6c5d"
}
```

### Canonicalization

- The payload is serialized with `sort_keys=True` and compact separators `(",", ":")`.
- `payloadSha256` is computed over the raw UTF-8 bytes of this canonical JSON.
- Signatures cover the same byte sequence, ensuring tamper detection.
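
The canonicalization rules above can be reproduced with the standard library. This is a sketch rather than the server's code: `canonical_bytes` and `build_unsigned_envelope` are hypothetical names, and it assumes the remaining `json.dumps` options (such as `ensure_ascii`) are left at their Python defaults, which the source does not confirm.

```python
import base64
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize with sorted keys and compact separators, then encode as UTF-8."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_unsigned_envelope(payload: dict) -> dict:
    """Assemble an envelope whose digest covers the canonical payload bytes."""
    raw = canonical_bytes(payload)
    return {
        "payloadType": "application/provenance.decision+json",
        "payload": base64.b64encode(raw).decode("ascii"),
        "payloadSha256": hashlib.sha256(raw).hexdigest(),
        "signatures": [],  # signing is out of scope for this sketch
    }
```

Because the digest is taken over the exact bytes, re-serializing the payload with different whitespace or key order produces a different hash, which is why verifiers should hash the decoded `payload` field as received.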

## Retrieving Bundles

- API: `GET /v1/analysis/{id}/bundle` returns the envelope alongside the original analysis identifier.
- Telemetry: a `decision_bundle` event is published to configured sinks (Redis, ClickHouse, Snowflake, etc.) for streaming ingestion.
- CI: the GitHub Action recipe in [docs/ci-integration.md](ci-integration.md) demonstrates downloading the bundle and archiving it as an artifact.

## Signature Verification

If `PROVENANCE_DECISION_SIGNING_KEY` is configured on the server, an Ed25519 signature is attached to each bundle. Clients verify signatures with the corresponding public key (`VerifyKey`).

Python example:

```python
import base64
import hashlib
import json

from nacl.signing import VerifyKey


def verify_bundle(envelope: dict, verify_key_b64: str) -> dict:
    """Check the payload digest and Ed25519 signature, then return the decoded payload."""
    payload_bytes = base64.b64decode(envelope["payload"])

    # The digest covers the decoded canonical JSON bytes, not the base64 text.
    expected_sha = envelope["payloadSha256"]
    actual_sha = hashlib.sha256(payload_bytes).hexdigest()
    if actual_sha != expected_sha:
        raise ValueError("payloadSha256 mismatch")

    signatures = envelope.get("signatures") or []
    if not signatures:
        raise ValueError("no signatures present")

    # verify() raises nacl.exceptions.BadSignatureError on an invalid signature.
    verify_key = VerifyKey(base64.b64decode(verify_key_b64))
    signature = base64.b64decode(signatures[0]["sig"])
    verify_key.verify(payload_bytes, signature)
    return json.loads(payload_bytes)
```

Validation checklist:

- Compare `analysis_id`, `repo_id`, and `pr_number` against the workflow that fetched the bundle.
- Inspect `inputs_sha256` to confirm the provenance inputs used by governance match the expected diff metadata.
- Review `detector_capabilities` to confirm the rule packs and detectors loaded at evaluation time.
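
The first checklist item can be automated once the payload has been decoded. This is a sketch with a hypothetical helper name, `check_bundle_context`; it compares values as strings because the sample payload carries `pr_number` as the string `"77"`, while CI systems often expose it as an integer.

```python
def check_bundle_context(payload: dict, expected: dict) -> list[str]:
    """Describe every identity field that is missing or differs from `expected`."""
    problems = []
    for field in ("analysis_id", "repo_id", "pr_number"):
        actual = payload.get(field)
        wanted = expected[field]
        # A missing field is a failure, not a match.
        if actual is None or str(actual) != str(wanted):
            problems.append(f"{field}: expected {wanted!r}, got {actual!r}")
    return problems
```

An empty list means the bundle matches the workflow context; a CI step could fail the job whenever the list is non-empty.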

## Integrating with Sigstore / Rekor

- Push the envelope to an append-only transparency log to create an auditable trail of governance decisions.
- Optionally include the DSSE bundle as an annotation on OCI artifacts, release tags, or SBOM documents to tie provenance decisions to shipped assets.

## Troubleshooting

- Empty `signatures`: ensure `PROVENANCE_DECISION_SIGNING_KEY` is set to a base64-encoded Ed25519 private key on the server; restart the service after updating the setting.
- Hash mismatch: verify that intermediaries are not reformatting the JSON (e.g., pretty-printing the payload before storage). Always store the original envelope untouched.
- Out-of-sync detector metadata: confirm the analysis ingestion stage persisted detector capability snapshots (`AnalysisService` copies active detector digests into `provenance_inputs.detector_capabilities`).
