Commit da4b503

update ingestion with SBOM action details
1 parent 08dc51c commit da4b503

2 files changed: +53 -13 lines changed

docs/pg-atlas/ingestion.md

Lines changed: 51 additions & 11 deletions
@@ -32,16 +32,44 @@ Each SBOM submission is associated with a specific **Repo**, not a project direc
 
 **Workflow**:
 
-- Teams add a lightweight GitHub Action that generates a CycloneDX or SPDX SBOM (JSON format
-preferred for parsing ease).
-- Action posts the SBOM to a designated endpoint directly from the GitHub hosted runner. (only
-accepts allow-listed callers)
+- Teams add a
+[lightweight GitHub Action](https://github.com/SCF-Public-Goods-Maintenance/pg-atlas-sbom-action)
+to their workflows. The action fetches the repo's SPDX 2.3 dependency graph from the
+[GitHub Dependency Graph API](https://docs.github.com/en/rest/dependency-graph/sboms) and submits
+it to the PG Atlas ingestion endpoint, authenticated via a GitHub OIDC token. Supports both public
+and private repos.
 - Optional: allow non-GitHub SBOM submissions which are signed with a project key for provenance
 (deferred for v0).
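
Reviewer sketch (not part of the commit): the fetch step described in the added Workflow bullet might look roughly like the following. It only builds the request for GitHub's documented `GET /repos/{owner}/{repo}/dependency-graph/sbom` endpoint. The helper name `build_sbom_request` and the example owner, repo, and token values are hypothetical, and the action's real implementation may differ.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def build_sbom_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for a repo's SPDX SBOM export."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/dependency-graph/sbom"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    req = build_sbom_request("example-org", "example-repo", "<token>")
    print(req.get_method(), req.full_url)
```

GitHub's response wraps the SPDX document in a top-level `sbom` key, so a caller would typically unwrap that key before validating or submitting the document.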
 
+**Authentication**:
+
+The action requests a short-lived GitHub OIDC token (RS256-signed JWT issued by GitHub's OIDC
+provider) with the PG Atlas API URL as the audience, and sends it in the `Authorization: Bearer`
+header of the submission request. No secrets need to be configured in the calling repository — the
+only caller-side requirement is `id-token: write` in the workflow's `permissions` block.
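
Reviewer sketch (not part of the commit): the token request and submission described in the Authentication paragraph could be assembled roughly as below. The environment variable names `ACTIONS_ID_TOKEN_REQUEST_URL` and `ACTIONS_ID_TOKEN_REQUEST_TOKEN` are the ones GitHub documents for jobs with `id-token: write`. The helper names, the use of `POST`, and the JSON content type are assumptions, since the diff does not show the action's source.

```python
import urllib.parse
import urllib.request


def build_oidc_token_request(env: dict, audience: str) -> urllib.request.Request:
    """Build the request a workflow step makes to obtain its OIDC token."""
    request_url = env["ACTIONS_ID_TOKEN_REQUEST_URL"]
    request_token = env["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]
    # GitHub's documented example appends the audience with "&".
    url = f"{request_url}&audience={urllib.parse.quote(audience, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"bearer {request_token}"})


def build_submission_request(api_url: str, sbom_json: bytes, oidc_token: str) -> urllib.request.Request:
    """Build the SBOM submission with the OIDC token as a Bearer credential.

    POST and the JSON content type are assumptions made for this sketch.
    """
    headers = {
        "Authorization": f"Bearer {oidc_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(api_url, data=sbom_json, headers=headers, method="POST")
```

The JSON body returned by the token endpoint carries the JWT in its `value` field. In a real step, `env` would be `os.environ`.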
+
+The API verifies the token by:
+
+1. Fetching GitHub's public JWKS from `https://token.actions.githubusercontent.com/.well-known/jwks`.
+2. Verifying the RS256 signature and standard claims (`iss`, `exp`, `aud`).
+3. Extracting the `repository` claim (`owner/repo`) to establish which repo submitted the SBOM, and
+recording the `actor` (triggering user) for audit purposes.
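
Reviewer sketch (not part of the commit): the claim checks in steps 2 and 3 could be expressed as below. This deliberately covers only the claims, and assumes the RS256 signature has already been verified against the JWKS by a JWT library. The names `check_claims` and `ClaimError` are hypothetical, and the `owner/repo` shape check is an added assumption.

```python
import time
from typing import Optional, Tuple

GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"


class ClaimError(ValueError):
    """Raised when a decoded token's claims fail validation."""


def check_claims(claims: dict, expected_audience: str, now: Optional[float] = None) -> Tuple[str, str]:
    """Validate iss/exp/aud on claims whose signature was verified elsewhere.

    Returns (repository, actor) for attribution and audit logging.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != GITHUB_OIDC_ISSUER:
        raise ClaimError("unexpected issuer")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ClaimError("token expired or missing exp")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        raise ClaimError("audience mismatch")
    repository = claims.get("repository")
    if not isinstance(repository, str) or repository.count("/") != 1:
        raise ClaimError("missing or malformed repository claim")
    return repository, claims.get("actor", "")
```

Passing `now` explicitly keeps the expiry check deterministic in tests.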
+
+Both GitHub-hosted and self-hosted runners are supported. The OIDC token in both cases is signed by
+GitHub's OIDC provider and contains a `runner_environment` claim (`github-hosted` or `self-hosted`).
+
+**Trust model**: The OIDC token cryptographically proves the _identity_ of the submitting repo — it
+guarantees that the submission originated from a workflow running in the context of `owner/repo`,
+authorized by a GitHub user with write access. It does **not** independently verify the _content_ of
+the submitted SBOM: a workflow author controls the workflow YAML and could in principle modify the
+payload before submission. The principal mitigations are: (1) the reference graph cross-check (A8)
+flags declared dependencies that diverge from the inferred graph; (2) all submissions are logged with
+the `repository` and `actor` claims, making falsification an attributable act; (3) community review
+and the public leaderboard create social accountability.
+
 **Processing**:
 
-- Validate format and schema.
+- Validate SPDX 2.3 format and schema.
 - Extract dependencies (package name + version range).
 - Map each dependency to a `Repo` (if within-ecosystem) or `ExternalRepo` (if external). Normalize
 ecosystem-specific names (e.g., `soroban-sdk` across crates/npm) to match `canonical_id` format
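
Reviewer sketch (not part of the commit): the first two Processing bullets could start from something like the following, which checks the `spdxVersion` field and pulls name, version, and purl out of each entry in `packages`. It is a minimal version gate rather than full schema validation. The function name `iter_dependencies` is hypothetical, and mapping to `Repo` or `ExternalRepo` is out of scope here.

```python
from typing import Dict, Iterator, Optional


def iter_dependencies(sbom: dict) -> Iterator[Dict[str, Optional[str]]]:
    """Yield name, version, and purl for each package in an SPDX 2.3 JSON document."""
    if sbom.get("spdxVersion") != "SPDX-2.3":
        raise ValueError(f"unsupported SPDX version: {sbom.get('spdxVersion')!r}")
    for package in sbom.get("packages", []):
        purl = None
        # SPDX stores package URLs as external references of type "purl".
        for ref in package.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                purl = ref.get("referenceLocator")
                break
        yield {
            "name": package.get("name"),
            "version": package.get("versionInfo"),
            "purl": purl,
        }
```

The purl is usually the most reliable key for ecosystem-aware normalization, since it encodes the package manager alongside the name.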
@@ -58,14 +86,26 @@ Each SBOM submission is associated with a specific **Repo**, not a project direc
 - Planned: Tie to SCF Build testnet tranche release (preferred over mainnet to capture dependencies
 early).
 
-<!-- FUTURE SELF: Add example GitHub Action YAML snippet here once finalized. Link to template repo.
--->
+**Example workflow**:
+
+```yaml
+jobs:
+  sbom:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read # for GitHub Dependency Graph API
+      id-token: write # for OIDC authentication to PG Atlas
+    steps:
+      - uses: SCF-Public-Goods-Maintenance/pg-atlas-sbom-action@<full-commit-hash>
+```
+
+The `api-url` input defaults to the production PG Atlas endpoint and does not need to be set. The
+calling repo must have the GitHub dependency graph enabled.
 
 **Open Questions**:
 
 - Mandatory vs. optional for v0? (Risk: low uptake → sparse graph; mitigation: strong reference graph
 bootstrapping).
-- Which SBOM format to standardize on (CycloneDX JSON recommended for tool support)?
 
 ## Reference Graph Bootstrapping
 
@@ -89,7 +129,7 @@ metadata, starting from curated root nodes.
 1. **Bootstrap Project vertices from OpenGrants**: Load SCF-awarded projects as `Project` rows.
 Populate `activity_status` from SCF Impact Survey data when available; default to `non-responsive`
 for projects with no survey response (see
-[Activity Status Update Logic](/pg-atlas/storage.md#activity-status-update-logic)).
+[Activity Status Update Logic](storage.md#activity-status-update-logic)).
 1. **Discover Repos**: From each project's `git_org_url`, enumerate repositories and create `Repo`
 vertices linked to the parent `Project` via `project_id` foreign key.
 1. Maintain a curated seed list of known Stellar/Soroban public goods (e.g., `soroban-sdk`,
@@ -125,7 +165,7 @@ Cloned repos may be LRU-cached to avoid re-cloning on every refresh.
 commits). Aggregate to `Project.pony_factor` by computing pony factor over the union of unique
 contributors across all project repos (deduplicated by `Contributor.email_hash`).
 - Update `Repo.latest_commit_date` from git log — feeds into activity status triangulation (see
-[Activity Status Update Logic](/pg-atlas/storage.md#activity-status-update-logic)).
+[Activity Status Update Logic](storage.md#activity-status-update-logic)).
 - Update on triggers (new release tag, quarterly refresh).
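
Reviewer sketch (not part of the commit): the project-level aggregation mentioned in this hunk could work roughly as below. The hunk cuts off the pony factor definition, so the sketch assumes the common convention: the smallest number of contributors whose commits account for at least 50% of the total. The function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def pony_factor(commits_by_contributor: Dict[str, int], threshold: float = 0.5) -> int:
    """Smallest number of contributors whose commits reach `threshold` of the total."""
    total = sum(commits_by_contributor.values())
    if total == 0:
        return 0
    covered = 0
    ranked = sorted(commits_by_contributor.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered >= threshold * total:
            return rank
    return len(ranked)


def project_pony_factor(repos: Iterable[Dict[str, int]], threshold: float = 0.5) -> int:
    """Merge per-repo commit counts keyed by email hash, then compute one pony factor."""
    merged: Counter = Counter()
    for repo_counts in repos:
        merged.update(repo_counts)  # sums counts for contributors seen in several repos
    return pony_factor(dict(merged), threshold)
```

Keying the per-repo dictionaries by `email_hash` gives the deduplication across repos that the text describes, because the same contributor's commits are summed before ranking.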
 
 **Open Questions**:
@@ -148,7 +188,7 @@ Cloned repos may be LRU-cached to avoid re-cloning on every refresh.
 - Store raw ingested artifacts (SBOM files, crawl snapshots) in repo or S3/IPFS for auditability.
 - All writes target `Repo`, `ExternalRepo`, `Contributor`, and edge tables. `Project` vertices are
 bootstrapped from OpenGrants and updated via survey/OpenGrants pipelines (see
-[Incremental Updates](/pg-atlas/storage.md#incremental-updates) in Storage).
+[Incremental Updates](storage.md#incremental-updates) in Storage).
 
 <!-- QUESTION FOR LEAD: Do we want a diagram here (Mermaid of ingestion flows: SBOM → API →
 Validation → Graph Update vs. Reference Crawl → Periodic Job)? -->

docs/pg-atlas/metric-computation.md

Lines changed: 2 additions & 2 deletions
@@ -45,8 +45,8 @@ active leaf project. This filters out dead branches and ensures criticality refl
 - Result: Subgraph containing only repo nodes with paths from active leaves.
 
 Repo-level activity status is derived from the parent project's status (see
-[Activity Status Update Logic](/pg-atlas/storage.md#activity-status-update-logic) in Storage). Both
-`live` and `in-dev` repos are treated as active for subgraph projection.
+[Activity Status Update Logic](storage.md#activity-status-update-logic) in Storage). Both `live` and
+`in-dev` repos are treated as active for subgraph projection.
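
Reviewer sketch (not part of the commit): the active-subgraph projection this hunk describes could be computed with a breadth-first traversal like the one below. It assumes a "leaf" is a repo that no other repo depends on, and that edges point from a repo to its dependencies. Both are readings of the surrounding text rather than confirmed definitions, and `active_subgraph` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Iterable, Set

ACTIVE_STATUSES = {"live", "in-dev"}


def active_subgraph(depends_on: Dict[str, Iterable[str]], status: Dict[str, str]) -> Set[str]:
    """Return repos reachable along dependency edges from active leaf repos."""
    has_dependents = {dep for deps in depends_on.values() for dep in deps}
    nodes = set(depends_on) | has_dependents
    leaves = nodes - has_dependents  # assumption: leaf = nothing depends on it
    queue = deque(n for n in leaves if status.get(n) in ACTIVE_STATUSES)
    kept = set(queue)
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, ()):
            if dep not in kept:
                kept.add(dep)
                queue.append(dep)
    return kept
```

Dependencies that are only reachable from inactive leaves fall out of the result, which matches the stated goal of filtering dead branches.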
 
 **Preferred v0 implementation**:
 