
Commit c3b9fd0

Merge pull request #21 from advanced-security/dependency-review
Add Dependency Review and Submission
2 parents fc53815 + 2d3fe41 commit c3b9fd0

18 files changed

+2246
-206
lines changed

.gitignore

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -3,4 +3,6 @@ dist/
33
.env
44
data/
55
.vscode/
6-
.DS_Store
6+
.DS_Store
7+
component-detection
8+
tmp-branch-search-cache/

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
[submodule "component-detection-dependency-submission-action"]
2+
path = component-detection-dependency-submission-action
3+
url = https://github.com/advanced-security/component-detection-dependency-submission-action.git

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# Changelog
2+
3+
## [2025-12-09] - 0.2.0 - Branch scanning and dependency submission
4+
5+
Added:
6+
7+
- Branch scanning:
8+
- Fetch SBOM diffs for non‑default branches via Dependency Review API.
9+
- Added `--branch-scan`, `--branch-limit`, and `--diff-base` CLI flags.
10+
- Dependency Submission integration:
11+
- Automatically submits dependency snapshots for branches being scanned, if not already present, using Component Detection.
12+
- Language-aware sparse checkout.
13+
- Use a pre-downloaded binary (`--component-detection-bin`) or an auto-downloaded release.
14+
- Allows forcing submission, even if a snapshot already exists.
15+
- Search and matching:
16+
- Refactored search to de-duplicate logic and include branch diffs (added/updated packages only).
17+
- Malware matching enhanced to enumerate packages from diffs; matches annotated with branch.
18+
- CLI and CSV outputs include branch context; CSV adds a `branch` column.
19+
- CLI and UX improvements:
20+
- Argument validation updated: `--sync-sboms` requires `--sbom-cache`.
21+
- Malware-only mode: allow `--sync-malware` without `--sbom-cache` (requires `--malware-cache`).
22+
- JSON/CLI/CSV interaction clarified and documented.
23+
- Added examples for malware-only sync and branch scanning.
24+
- Advisory sync robustness:
25+
- GraphQL advisory sync implements adaptive retries with exponential backoff and `Retry-After` support.
26+
27+
Fixed:
28+
29+
- Added `--ghes` flag to ensure proper API URL construction for GitHub Enterprise Server instances.
30+
31+
## [2025-10-06] - 0.1.0 - Initial public release
32+
33+
- Initial release, with: SBOM sync; malware sync; malware matching; CLI, file based and interactive PURL searching. SARIF, CSV and JSON outputs supported.

README.md

Lines changed: 127 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ Supports human-readable, JSON, CSV and SARIF output. SARIF alerts can be uploade
1616
- Optional progress bar while fetching SBOMs
1717
- Option to suppress secondary rate limit warnings, and full quiet mode to suppress informative messages
1818
- Adaptive backoff: each secondary rate limit hit increases the SBOM fetch delay by 10% to reduce future throttling
19+
- Optional branch scanning†: fetch SBOM diffs with Dependency Review for non-default branches and submit missing dependency snapshots if needed with Component Detection + Dependency Submission
1920
- Offline caching of SBOMs and security advisories with incremental updates
2021
- Matching:
2122
- Version-aware matching of SBOM packages against malware advisories
@@ -27,9 +28,12 @@ Supports human-readable, JSON, CSV and SARIF output. SARIF alerts can be uploade
2728
- Output:
2829
- Human-readable console output
2930
- JSON or CSV output (to stdout or file) with both search and malware matches
30-
- Optional SARIF 2.1.0 output per repository for malware matches with optional Code Scanning upload
31+
- Optional SARIF 2.1.0 output per repository for malware matches
32+
- includes Code Scanning upload†
3133
- Works with GitHub.com, GitHub Enterprise Server, GitHub Enterprise Managed Users and GitHub Enterprise Cloud with Data Residency (custom base URL)
3234

35+
† GitHub Advanced Security/GitHub Code Security required for this feature
36+
3337
## Usage
3438

3539
### Quick Start
@@ -55,6 +59,76 @@ Using GitHub Enterprise Server:
5559
npm run start -- --sync-sboms --enterprise ent --base-url https://github.internal/api/v3 --sbom-cache sboms --token $GHES_TOKEN
5660
```
5761

62+
### 🔀 Branch Scanning & Dependency Review
63+
64+
Enable branch SBOM collection and dependency diffs with `--branch-scan`.
65+
66+
Flags:
67+
68+
```bash
69+
--branch-scan # Fetch SBOMs for non-default branches
70+
--branch-limit <n> # Max number of non-default branches per repo (default 10)
71+
--diff-base <branch> # Override base branch for diffs (default: repository default)
72+
```
73+
74+
Example: scan the first 5 feature branches and diff them against `main`:
75+
76+
```bash
77+
npm run start -- --sync-sboms --org my-org \
78+
--sbom-cache sboms --branch-scan --branch-limit 5 \
79+
--diff-base main --token $GITHUB_TOKEN
80+
```
81+
82+
Search results will include branch matches: package PURLs in the match list are annotated with `@branch` (e.g. `pkg:npm/<name>@<version>@feature-x`). Dependency Review diffs are also searched, but only packages added or updated on the head side of the diff are considered.
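
To illustrate the "added/updated only" rule, here is a minimal TypeScript sketch. It assumes a diff entry shaped like the `branchDiffs` fixture in this commit (`changeType`, `name`, `ecosystem`, `packageURL`, `version`); the type names and the `searchableChanges` function are hypothetical and not taken from the toolkit's source.

```typescript
// Hypothetical types, modelled on the fields visible in the branchDiffs fixture.
interface DependencyChange {
  changeType: string; // e.g. "added", "updated", "removed"
  name: string;
  ecosystem: string;
  packageURL: string;
  version: string;
}

interface BranchDiff {
  base: string;
  head: string;
  changes: DependencyChange[];
}

// Keep only head-side packages: those added or updated relative to the base.
function searchableChanges(diff: BranchDiff): DependencyChange[] {
  const considered = new Set(["added", "updated"]);
  return diff.changes.filter((change) => considered.has(change.changeType));
}

const exampleDiff: BranchDiff = {
  base: "main",
  head: "test",
  changes: [
    { changeType: "added", name: "chalk", ecosystem: "npm", packageURL: "pkg:npm/chalk@5.6.1", version: "5.6.1" },
    { changeType: "removed", name: "left-pad", ecosystem: "npm", packageURL: "pkg:npm/left-pad@1.3.0", version: "1.3.0" },
  ],
};

// Only the "added" chalk entry remains; the removed package is not searched.
const kept = searchableChanges(exampleDiff);
```

Removed packages are dropped because they no longer exist on the branch head, so they cannot produce a match there.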
83+
84+
If a branch SBOM or diff retrieval fails, the error is recorded but does not stop collection for other branches or repositories.
85+
86+
#### Handling Missing Dependency Review Snapshots
87+
88+
If the Dependency Review API returns a 404 for a branch diff (commonly due to a missing dependency snapshot on either the base or head commit), the toolkit can optionally attempt to generate and submit a snapshot using Component Detection and Dependency Submission. This functionality is vendored in and forked from the public [Component Detection Dependency Submission Action](https://github.com/advanced-security/component-detection-dependency-submission-action).
89+
90+
Enable automatic submission + retry with:
91+
92+
```bash
93+
--submit-on-missing-snapshot
94+
```
95+
96+
Unless you provide a local binary with `--component-detection-bin`, the tool will attempt to download the latest Component Detection release from GitHub Releases into the current directory and run it from there.
97+
98+
If submission fails, the original 404 reason is retained and collection proceeds.
99+
100+
##### Using a Local Component Detection Binary
101+
102+
Instead of downloading the latest release automatically, you can point the toolkit at a local `component-detection` executable. This is useful if you already manage the binary or need a custom build.
103+
104+
Pass the path via `--component-detection-bin` and optionally limit languages to reduce sparse checkout size:
105+
106+
```bash
107+
npm run start -- \
108+
--sync-sboms --org my-org --sbom-cache sboms \
109+
--branch-scan --submit-on-missing-snapshot \
110+
--submit-languages JavaScript,TypeScript \
111+
--component-detection-bin /usr/local/bin/component-detection
112+
```
113+
114+
On macOS, you may find that system protection prevents running a downloaded binary. You can [check out the .NET code](https://github.com/microsoft/component-detection/) and run it via a wrapper script such as:
115+
116+
```bash
117+
#!/bin/bash
118+
119+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
120+
121+
cd "$SCRIPT_DIR" || exit 1
122+
123+
dotnet run --project "./src/Microsoft.ComponentDetection/Microsoft.ComponentDetection.csproj" "$@"
124+
```
125+
126+
Notes:
127+
128+
- Providing `--component-detection-bin` skips any download logic and uses your binary directly.
129+
- Snapshot submission performs a language-aware sparse checkout of common manifest/lock files (e.g., `package.json`, `requirements.txt`, `pom.xml`).
130+
- After submission, the toolkit waits briefly and retries the dependency review diff once.
131+
58132
### 🔑 Authentication
59133

60134
A GitHub token with appropriate scope is required when performing network operations such as `--sync-sboms`, `--sync-malware` and `--upload-sarif`.
@@ -123,6 +197,12 @@ Offline match with already-cached malware advisories (no network calls):
123197
npm run start -- --sbom-cache sboms --malware-cache malware-cache --match-malware
124198
```
125199

200+
Malware-only advisory sync (no SBOM cache required):
201+
202+
```bash
203+
npm run start -- --sync-malware --malware-cache malware-cache --token $GITHUB_TOKEN
204+
```
205+
126206
Write malware matches (and optionally search results later) to a JSON file using `--output-file`:
127207

128208
```bash
@@ -131,6 +211,16 @@ npm run start -- --sbom-cache sboms --malware-cache malware-cache --match-malwar
131211

132212
If you also perform a search in the same invocation (add `--purl` or `--purl-file`), the JSON file will contain both `malwareMatches` and `search` top-level keys.
133213

214+
#### Advisory Rate Limit Handling
215+
216+
Advisory sync uses GitHub GraphQL with adaptive retry/backoff to handle secondary rate limits and transient errors:
217+
218+
- Retries on `403` secondary rate limit, `429`, and `5xx` responses.
219+
- Honors `Retry-After` when provided; otherwise uses exponential backoff with jitter.
220+
- Respects `--quiet` to suppress retry log messages.
221+
222+
If retries are exhausted, the sync aborts gracefully and leaves previously cached advisories intact.
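
The retry policy described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the toolkit's actual implementation: the names `RetryableResponse`, `isRetryable` and `retryDelayMs`, the 1 second base, the 60 second cap, and the "full jitter" strategy are all hypothetical choices.

```typescript
// Hypothetical response summary; how a 403 is classified as a secondary
// rate limit is left to the caller.
interface RetryableResponse {
  status: number;
  retryAfterSeconds?: number; // parsed from a `Retry-After` header, if present
  secondaryRateLimit?: boolean; // true when a 403 is a secondary rate limit
}

// Retry on 403 secondary rate limits, 429, and 5xx responses.
function isRetryable(res: RetryableResponse): boolean {
  if (res.status === 429) return true;
  if (res.status >= 500 && res.status <= 599) return true;
  return res.status === 403 && res.secondaryRateLimit === true;
}

// Compute the wait before the next attempt (attempt is 0-based).
function retryDelayMs(
  attempt: number,
  res: RetryableResponse,
  baseMs = 1000, // assumed base delay
  maxMs = 60_000, // assumed cap
  random: () => number = Math.random,
): number {
  // Honor Retry-After when the server provides it.
  if (res.retryAfterSeconds !== undefined && res.retryAfterSeconds >= 0) {
    return res.retryAfterSeconds * 1000;
  }
  // Otherwise: exponential backoff, capped, with full jitter in [0, cap).
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}
```

Passing `random` as a parameter keeps the jitter deterministic in tests; a caller would normally leave it at the default.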
223+
134224
#### Ignoring Matches
135225

136226
Provide a YAML ignore file via `--ignore-file` to suppress specific matches (before SARIF generation / JSON output). Structure:
@@ -295,31 +385,42 @@ Then type one PURL query per line. Entering a blank line or using Ctrl+C on a bl
295385

296386
| Arg | Purpose |
297387
|------|---------|
298-
| `--sbom-cache <dir>` | Directory holding per-repo SBOM JSON files (required for offline mode; used as write target when syncing) |
299-
| `--sync-sboms` | Perform API calls to (re)collect SBOMs; without it the CLI runs offline loading cached SBOMs. Requires a GitHub token |
300-
| `--enterprise <slug>` / `--org <login>` | Scope selection (mutually exclusive when syncing) |
301-
| `--purl <purl>` | Add a PURL/range/wildcard query (repeatable) |
302-
| `--purl-file <file>` | File with one query per line |
303-
| `--json` | Emit search JSON to stdout (unless overridden by `--output-file`) |
304-
| `--cli` | Also emit human-readable output when producing JSON (requires `--output-file`) |
305-
| `--output-file <file>` | Write search JSON payload to file; required when using both `--json` and `--cli` |
306-
| `--interactive` | Enter interactive search prompt after initial processing |
307-
| `--sync-malware` | Fetch & cache malware advisories (MALWARE classification). Requires a GitHub token |
308-
| `--match-malware` | Match current SBOM set against cached advisories |
309-
| `--malware-cache <dir>` | Advisory cache directory (required with malware operations) |
310-
| `--malware-cutoff <ISO-date>` | Ignore advisories whose publishedAt AND updatedAt are both before this date/time (e.g. `2025-09-29` or full timestamp) |
311-
| `--ignore-file <path>` | YAML ignore file (advisories / purls / scoped blocks) to filter malware matches before output |
312-
| `--ignore-unbounded-malware` | Ignore matches whose advisory vulnerable version range covers all versions (e.g. `*`, `>=0`, `0.0.0`) |
313-
| `--sarif-dir <dir>` | Write SARIF 2.1.0 files per repository (with malware matches) |
314-
| `--upload-sarif` | Upload generated SARIF to Code Scanning (requires --match-malware & --sarif-dir and a GitHub token) |
388+
| `--token <token>` | GitHub token; required for `--sync-sboms`, `--sync-malware`, and `--upload-sarif` (or use `GITHUB_TOKEN`) |
389+
| `--enterprise <slug>` | Collect across all orgs in an Enterprise (mutually exclusive with `--org`/`--repo` when syncing) |
390+
| `--org <login>` | Single organization scope (mutually exclusive with `--enterprise`/`--repo` when syncing) |
391+
| `--repo <name>` | Single repository scope in the form `owner/name` (mutually exclusive with `--enterprise`/`--org` when syncing) |
392+
| `--base-url <url>` | GitHub Enterprise Server REST base URL (e.g. `https://ghe.example.com/api/v3`) |
315393
| `--concurrency <n>` | Parallel SBOM fetches (default 5) |
316-
| `--sbom-delay <ms>` | Delay between SBOM fetch (dependency-graph/sbom) requests (default 5000) |
317-
| `--light-delay <ms>` | Delay between lightweight metadata calls (listing repos, commit head checks) (default 500) |
318-
| `--base-url <url>` | GitHub Enterprise Server REST base URL (ends with /api/v3) |
319-
| `--progress` | Show a dynamic progress bar during SBOM collection |
320-
| `--suppress-secondary-rate-limit-logs` | Hide secondary rate limit warning lines (automatically applied with `--progress`) |
321-
| `--quiet` | Suppress all non-error and non-result output (progress bar, JSON and human readable output still show) |
322-
| `--ca-bundle <path>` | Path to a PEM file containing one or more additional CA certificates (self‑signed / internal PKI) |
394+
| `--sbom-delay <ms>` | Delay between SBOM fetch requests (default 3000) |
395+
| `--light-delay <ms>` | Delay between lightweight metadata requests (default 100) |
396+
| `--sbom-cache <dir>` | Directory to read/write per‑repo SBOM JSON; required for SBOM syncing and offline use |
397+
| `--sync-sboms` | Perform API calls to collect SBOMs; without it the CLI runs offline using `--sbom-cache` |
398+
| `--progress` | Show a progress bar during SBOM collection |
399+
| `--suppress-secondary-rate-limit-logs` | Suppress secondary rate limit warning logs (useful with `--progress`) |
400+
| `--quiet` | Suppress non‑error output (progress bar and machine output still emitted) |
401+
| `--ca-bundle <path>` | PEM bundle with additional CA certs for REST/GraphQL/SARIF upload |
402+
| `--purl <purl>` | Add a PURL / semver range / wildcard query (repeatable) |
403+
| `--purl-file <file>` | File with one query per line (supports comments) |
404+
| `--json` | Emit search results as JSON (to stdout unless `--output-file` specified) |
405+
| `--cli` | Also emit human‑readable output when producing JSON/CSV; requires `--output-file` to avoid mixed stdout |
406+
| `--csv` | Emit results (search + malware matches) as CSV (to stdout or `--output-file`) |
407+
| `--output-file <file>` | Write JSON/CSV output to file; required when using `--cli` with `--json` or `--csv` |
408+
| `--interactive` | Enter interactive PURL search prompt after initial processing |
409+
| `--sync-malware` | Fetch & cache malware advisories (MALWARE); requires a token |
410+
| `--match-malware` | Match SBOM packages against cached malware advisories |
411+
| `--malware-cache <dir>` | Directory to store malware advisory cache (required with malware operations) |
412+
| `--malware-cutoff <ISO-date>` | Exclude advisories whose `publishedAt` and `updatedAt` are both before cutoff |
413+
| `--ignore-file <path>` | YAML ignore file (advisories / purls / scoped blocks) to filter matches before output |
414+
| `--ignore-unbounded-malware` | Suppress advisories with effectively unbounded vulnerable ranges (e.g. `*`, `>=0`) |
415+
| `--sarif-dir <dir>` | Write SARIF 2.1.0 files per repository (for malware matches) |
416+
| `--upload-sarif` | Upload generated SARIF to Code Scanning (requires `--match-malware` and `--sarif-dir`) |
417+
| `--branch-scan` | Fetch SBOM diffs for non‑default branches (limited by `--branch-limit`) |
418+
| `--branch-limit <n>` | Limit number of non‑default branches scanned per repository (default 10) |
419+
| `--diff-base <branch>` | Override base branch for dependency review diffs (defaults to repository default branch) |
420+
| `--submit-on-missing-snapshot` | On diff 404, run Component Detection to submit a snapshot, then retry |
421+
| `--submit-languages <list>` | Limit snapshot submission to specific languages (comma‑separated) |
422+
| `--component-detection-bin <path>` | Path to local `component-detection` executable (skip download) |
423+
| `--debug` | Enable debug logging |
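
The flag dependencies listed in the table can be summarised as a small validation routine. The following TypeScript sketch is an assumption about how such checks might look; `CliArgs`, `validateArgs` and the error messages are hypothetical, and the real CLI may enforce additional or different rules.

```typescript
// Hypothetical parsed-argument shape covering only the flags checked below.
interface CliArgs {
  syncSboms?: boolean;
  sbomCache?: string;
  syncMalware?: boolean;
  matchMalware?: boolean;
  malwareCache?: string;
  sarifDir?: string;
  uploadSarif?: boolean;
  json?: boolean;
  csv?: boolean;
  cli?: boolean;
  outputFile?: string;
  enterprise?: string;
  org?: string;
  repo?: string;
}

// Returns a list of human-readable problems; an empty list means "valid".
function validateArgs(args: CliArgs): string[] {
  const errors: string[] = [];
  if (args.syncSboms && !args.sbomCache) {
    errors.push("--sync-sboms requires --sbom-cache");
  }
  if ((args.syncMalware || args.matchMalware) && !args.malwareCache) {
    errors.push("malware operations require --malware-cache");
  }
  if (args.uploadSarif && !(args.matchMalware && args.sarifDir)) {
    errors.push("--upload-sarif requires --match-malware and --sarif-dir");
  }
  if (args.cli && (args.json || args.csv) && !args.outputFile) {
    errors.push("--cli with --json or --csv requires --output-file");
  }
  const scopes = [args.enterprise, args.org, args.repo].filter((s) => s !== undefined);
  if (args.syncSboms && scopes.length > 1) {
    errors.push("--enterprise, --org and --repo are mutually exclusive when syncing");
  }
  return errors;
}
```

Under these assumed rules, a malware-only sync (`--sync-malware --malware-cache malware-cache`) passes without `--sbom-cache`, which matches the malware-only mode described in the changelog.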
323424

324425
## Build & test
325426

@@ -364,7 +465,7 @@ npm run start -- --sbom-cache fixtures/sboms --malware-cache fixtures/malware-ca
364465

365466
Standard & secondary rate limits trigger an automatic retry (up to 2 times).
366467

367-
You can tune concurrency and increase the delay to reduce the chance of hitting rate limits.
468+
If you find that you are hitting rate limits, you can lower the concurrency and increase the various delays to reduce the chance of hitting them again.
368469

369470
Each time a secondary rate limit is hit, the delay between fetching SBOMs is increased by 10%, to provide a way to adaptively respond to that rate limit.
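
As a worked illustration of the 10% adaptive increase, here is a minimal TypeScript sketch. The function name `nextSbomDelayMs` and the rounding to whole milliseconds are assumptions; the starting value of 3000 ms is the `--sbom-delay` default from the arguments table.

```typescript
// Increase the SBOM fetch delay by 10% after a secondary rate limit hit.
// Rounding to whole milliseconds is an assumption made for this sketch.
function nextSbomDelayMs(currentMs: number, factor = 1.1): number {
  return Math.round(currentMs * factor);
}

// Starting from 3000 ms, three consecutive hits compound the delay.
let delayMs = 3000;
const history: number[] = [];
for (let hit = 0; hit < 3; hit++) {
  delayMs = nextSbomDelayMs(delayMs);
  history.push(delayMs);
}
// history is now [3300, 3630, 3993]
```

Because the increase compounds, repeated hits slow collection progressively rather than by a fixed step.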
370471

fixtures/sboms/advanced-security/test-sbom-repo/sbom.json

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -69,5 +69,25 @@
6969
}
7070
]
7171
}
72+
],
73+
"branchDiffs": [
74+
{
75+
"latestCommitDate": "2025-12-01T12:39:01.734Z",
76+
"base": "main",
77+
"head": "test",
78+
"retrievedAt": "2025-12-01T12:39:01.734Z",
79+
"changes": [
80+
{
81+
"changeType": "added",
82+
"name": "chalk",
83+
"ecosystem": "npm",
84+
"packageURL": "pkg:npm/chalk@5.6.1",
85+
"license": "MIT",
86+
"manifest": "package-lock.json",
87+
"scope": "runtime",
88+
"version": "5.6.1"
89+
}
90+
]
91+
}
7292
]
7393
}
