Commit 3a2b7ae

Merge pull request #3 from andrew/enrichment
Package and Version metadata
2 parents e3c1e88 + 22a13b1 commit 3a2b7ae

26 files changed (+2153, -47 lines)

README.md

Lines changed: 1 addition & 0 deletions
@@ -524,6 +524,7 @@ Git::Pkgs::Database.connect(repo_git_dir)
 Git::Pkgs::Models::DependencyChange.where(name: "rails").all
 ```
 
+
 ## Contributing
 
 Bug reports, feature requests, and pull requests are welcome. If you're unsure about a change, open an issue first to discuss it.

docs/enrichment.md

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
# Package Enrichment

git-pkgs can fetch additional metadata about your dependencies from the [ecosyste.ms Packages API](https://packages.ecosyste.ms/). This powers the `outdated` and `licenses` commands.

## outdated

Show packages that have newer versions available in their registries.

```
$ git pkgs outdated
lodash 4.17.15 -> 4.17.21 (patch)
express 4.17.0 -> 4.19.2 (minor)
webpack 4.46.0 -> 5.90.3 (major)

3 outdated packages: 1 major, 1 minor, 1 patch
```

Major updates are shown in red, minor in yellow, patch in cyan.

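A minimal Ruby sketch of how the three update levels and their colors could be derived. This illustrates the rules listed in docs/internals.md (strip a leading `v`, pad `1.2` to `1.2.0`, then compare major, minor, and patch in turn). It is not git-pkgs' own code: the method names are hypothetical, and versions are assumed to be dot-separated numbers.

```ruby
# Hypothetical sketch: classify an update as :major, :minor or :patch.
# Assumes dotted numeric versions; anything after the leading digits of a part is ignored.
def version_parts(version)
  parts = version.to_s.sub(/\Av/i, "").split(".").first(3).map { |part| part[/\A\d+/].to_i }
  parts.fill(0, parts.length...3) # pad "1.2" -> [1, 2, 0]
end

def update_level(current, latest)
  cur = version_parts(current)
  lat = version_parts(latest)
  return nil unless (lat <=> cur) == 1 # latest is not newer: nothing to report

  if lat[0] != cur[0]
    :major
  elsif lat[1] != cur[1]
    :minor
  else
    :patch
  end
end

# ANSI color codes for the documented scheme: red, yellow, cyan.
LEVEL_COLORS = { major: 31, minor: 33, patch: 36 }.freeze

def colorize(text, level)
  "\e[#{LEVEL_COLORS.fetch(level)}m#{text}\e[0m"
end

puts update_level("4.17.15", "4.17.21") # patch
puts update_level("v4.46.0", "5.90.3")  # major
puts update_level("1.2", "1.3.0")       # minor
```

Pre-release suffixes are simply truncated here (`1.0.0-beta` reads as `1.0.0`). A real implementation would probably need ecosystem-aware version parsing.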
### Options

```
-e, --ecosystem=NAME    Filter by ecosystem
-r, --ref=REF           Git ref to check (default: HEAD)
-f, --format=FORMAT     Output format (text, json)
    --major             Show only major version updates
    --minor             Show only minor or major updates (skip patch)
    --stateless         Parse manifests directly without database
```

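The `--major` and `--minor` options filter on the level computed for each package. This hypothetical sketch shows one way to apply them and to build the summary line from the first `git pkgs outdated` example. The `Update` struct and the method names are invented for illustration. The real command's output may differ, for example in whether it prints zero counts.

```ruby
# Hypothetical sketch: apply --major / --minor filters and build the summary line.
Update = Struct.new(:name, :current, :latest, :level)

RANK = { patch: 0, minor: 1, major: 2 }.freeze

def filter_updates(updates, major: false, minor: false)
  min_rank = if major
               RANK[:major]
             elsif minor
               RANK[:minor] # minor or major, patch skipped
             else
               RANK[:patch]
             end
  updates.select { |u| RANK.fetch(u.level) >= min_rank }
end

def summary(updates)
  counts = updates.map(&:level).tally
  parts = %i[major minor patch].map { |lvl| "#{counts.fetch(lvl, 0)} #{lvl}" }
  noun = updates.size == 1 ? "package" : "packages"
  "#{updates.size} outdated #{noun}: #{parts.join(', ')}"
end

updates = [
  Update.new("lodash", "4.17.15", "4.17.21", :patch),
  Update.new("express", "4.17.0", "4.19.2", :minor),
  Update.new("webpack", "4.46.0", "5.90.3", :major)
]

puts summary(updates)
puts filter_updates(updates, major: true).map(&:name).inspect
```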
### Examples

Show only major updates:

```
$ git pkgs outdated --major
webpack 4.46.0 -> 5.90.3 (major)
```

Check a specific release:

```
$ git pkgs outdated v1.0.0
```

JSON output:

```
$ git pkgs outdated -f json
```

## licenses

Show licenses for dependencies with optional compliance checks.

```
$ git pkgs licenses
lodash MIT (npm)
express MIT (npm)
request Apache-2.0 (npm)
```

### Options

```
-e, --ecosystem=NAME    Filter by ecosystem
-r, --ref=REF           Git ref to check (default: HEAD)
-f, --format=FORMAT     Output format (text, json, csv)
    --allow=LICENSES    Comma-separated list of allowed licenses
    --deny=LICENSES     Comma-separated list of denied licenses
    --permissive        Only allow permissive licenses (MIT, Apache, BSD, etc.)
    --copyleft          Flag copyleft licenses (GPL, AGPL, etc.)
    --unknown           Flag packages with unknown/missing licenses
    --group             Group output by license
    --stateless         Parse manifests directly without database
```

### Compliance Checks

Only allow permissive licenses:

```
$ git pkgs licenses --permissive
lodash MIT (npm)
express MIT (npm)
gpl-pkg GPL-3.0 (npm) [copyleft]

1 license violation found
```

Explicit allow list:

```
$ git pkgs licenses --allow=MIT,Apache-2.0
```

Deny specific licenses:

```
$ git pkgs licenses --deny=GPL-3.0,AGPL-3.0
```

Flag packages with no license information:

```
$ git pkgs licenses --unknown
```

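The compliance flags can be read as a per-package policy. Below is a hypothetical Ruby sketch, not the command's implementation. It makes three assumptions:

- Dependencies are hashes with `:name` and `:license` (an SPDX identifier or `nil`).
- The license sets are abbreviated.
- Flags are checked in this order: deny, then allow, then copyleft, then permissive.

The exit status mirrors the rule described under Exit Codes.

```ruby
require "set"

# Hypothetical sketch of license policy checks; the sets are abbreviated.
PERMISSIVE = Set["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"].freeze
COPYLEFT   = Set["GPL-2.0", "GPL-3.0", "LGPL-2.1", "LGPL-3.0", "AGPL-3.0", "MPL-2.0"].freeze

# Returns a reason string when the license violates the policy, otherwise nil.
def violation(license, allow: nil, deny: nil, permissive: false, copyleft: false, unknown: false)
  return (unknown ? "unknown" : nil) if license.nil? || license.empty?
  return "denied" if deny&.include?(license)
  return "not allowed" if allow && !allow.include?(license)
  return "copyleft" if (permissive || copyleft) && COPYLEFT.include?(license)
  return "not permissive" if permissive && !PERMISSIVE.include?(license)

  nil
end

deps = [
  { name: "lodash", license: "MIT" },
  { name: "gpl-pkg", license: "GPL-3.0" },
  { name: "mystery", license: nil }
]

# Equivalent of: git pkgs licenses --permissive --unknown
violations = deps.filter_map do |dep|
  reason = violation(dep[:license], permissive: true, unknown: true)
  [dep[:name], reason] if reason
end

violations.each { |name, reason| puts "#{name}: #{reason}" }
exit_status = violations.empty? ? 0 : 1 # non-zero when anything was flagged
puts "exit status #{exit_status}"
```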
### Output Formats

Group by license:

```
$ git pkgs licenses --group
MIT (45)
  lodash
  express
  ...

Apache-2.0 (12)
  request
  ...
```

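Grouping amounts to a `group_by` over the license field. This hypothetical sketch assumes the same `:name`/`:license` hashes. It orders groups by size, largest first, as the example output suggests. Packages without a license go under an assumed `Unknown` label.

```ruby
# Hypothetical sketch: group packages by license, largest group first.
def group_by_license(deps)
  deps.group_by { |dep| dep[:license] || "Unknown" }
      .sort_by { |license, pkgs| [-pkgs.size, license] }
end

def render_groups(groups)
  groups.map do |license, pkgs|
    lines = ["#{license} (#{pkgs.size})"]
    lines.concat(pkgs.map { |dep| "  #{dep[:name]}" })
    lines.join("\n")
  end.join("\n\n")
end

deps = [
  { name: "lodash", license: "MIT" },
  { name: "request", license: "Apache-2.0" },
  { name: "express", license: "MIT" }
]

puts render_groups(group_by_license(deps))
```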
CSV for spreadsheets:

```
$ git pkgs licenses -f csv > licenses.csv
```

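Producing CSV needs only the `csv` library that ships with Ruby. The columns below (`name`, `ecosystem`, `license`) are an assumption for illustration. The actual columns git-pkgs writes are not documented here.

```ruby
require "csv"

# Hypothetical sketch: one CSV row per package; the column set is assumed.
def licenses_csv(deps)
  CSV.generate do |csv|
    csv << %w[name ecosystem license]
    deps.each { |dep| csv << [dep[:name], dep[:ecosystem], dep[:license]] } # nil becomes an empty field
  end
end

deps = [
  { name: "lodash", ecosystem: "npm", license: "MIT" },
  { name: "request", ecosystem: "npm", license: "Apache-2.0" }
]

print licenses_csv(deps)
```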
JSON for scripting:

```
$ git pkgs licenses -f json
```

### Exit Codes

The `licenses` command exits with code 1 if any violations are found, which makes it suitable for CI pipelines:

```yaml
- run: git pkgs licenses --stateless --permissive
```

### License Categories

Permissive licenses (allowed with `--permissive`):
MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, Unlicense, CC0-1.0, 0BSD, WTFPL, Zlib, BSL-1.0

Copyleft licenses (flagged with `--copyleft` or `--permissive`):
GPL-2.0, GPL-3.0, LGPL-2.1, LGPL-3.0, AGPL-3.0, MPL-2.0 (and their variant identifiers)

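The "variant identifiers" are SPDX forms such as `GPL-3.0-only`, `GPL-3.0-or-later`, and the deprecated `GPL-3.0+`. This hypothetical sketch folds them onto the base identifiers listed above before categorizing. It deliberately does not handle compound expressions such as `MIT OR Apache-2.0`, and it is not the command's actual normalization.

```ruby
# Hypothetical sketch: strip SPDX variant suffixes, then categorize the base id.
PERMISSIVE_IDS = %w[MIT Apache-2.0 BSD-2-Clause BSD-3-Clause ISC Unlicense CC0-1.0 0BSD WTFPL Zlib BSL-1.0].freeze
COPYLEFT_IDS = %w[GPL-2.0 GPL-3.0 LGPL-2.1 LGPL-3.0 AGPL-3.0 MPL-2.0].freeze

def base_license(id)
  # "GPL-3.0-only" / "GPL-3.0-or-later" / "GPL-3.0+" -> "GPL-3.0"
  id.to_s.strip.sub(/-(only|or-later)\z/, "").sub(/\+\z/, "")
end

def license_category(id)
  base = base_license(id)
  return :unknown if base.empty?
  return :permissive if PERMISSIVE_IDS.include?(base)
  return :copyleft if COPYLEFT_IDS.include?(base)

  :other
end

%w[MIT GPL-3.0-or-later LGPL-2.1+ EPL-2.0].each { |id| puts "#{id}: #{license_category(id)}" }
```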
## Data Source

Both commands fetch package metadata from [ecosyste.ms](https://packages.ecosyste.ms/), which aggregates data from npm, RubyGems, PyPI, Cargo, and other package registries.

## Caching

Package metadata is cached in the `pkgs.sqlite3` database. Each package tracks when it was last enriched, and stale data (older than 24 hours) is automatically refreshed on the next query.

The cache stores:

- Latest version number
- License (SPDX identifier)
- Description
- Homepage URL
- Repository URL

## Stateless Mode
169+
170+
Both commands support `--stateless` mode, which parses manifest files directly from git without requiring a database. This is useful in CI environments where you don't want to run `git pkgs init` first.
171+
172+
```
173+
$ git pkgs outdated --stateless
174+
$ git pkgs licenses --stateless --permissive
175+
```
176+
177+
In stateless mode, package metadata is fetched fresh each time and cached only in memory for the duration of the command.
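Caching "only in memory for the duration of the command" amounts to memoizing lookups in a hash that disappears when the process exits. In this hypothetical sketch, `MemoryMetadataCache` and the fetch block are stand-ins, and no real API call is made.

```ruby
# Hypothetical sketch: per-process memoization in place of the SQLite cache.
class MemoryMetadataCache
  def initialize(&fetcher)
    @fetcher = fetcher # stand-in for an ecosyste.ms lookup (assumed)
    @store = {}
  end

  # Fetch each purl at most once per run; key? also memoizes nil results.
  def fetch(purl)
    return @store[purl] if @store.key?(purl)

    @store[purl] = @fetcher.call(purl)
  end
end

calls = 0
cache = MemoryMetadataCache.new do |purl|
  calls += 1
  { purl: purl, latest_version: "4.17.21" } # fake response for illustration
end

cache.fetch("pkg:npm/lodash")
cache.fetch("pkg:npm/lodash")
puts calls
```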

docs/internals.md

Lines changed: 40 additions & 3 deletions
@@ -10,7 +10,7 @@ The executable at [`exe/git-pkgs`](../exe/git-pkgs) loads [`lib/git/pkgs.rb`](..
 
 [`Git::Pkgs::Database`](../lib/git/pkgs/database.rb) manages the SQLite connection using [Sequel](https://sequel.jeremyevans.net/) and [sqlite3](https://github.com/sparklemotion/sqlite3-ruby). It looks for the `GIT_PKGS_DB` environment variable first, then falls back to `.git/pkgs.sqlite3`. Schema migrations are versioned through a `schema_info` table. See [schema.md](schema.md) for the full schema.
 
-The schema has nine tables. Six handle dependency tracking:
+The schema has ten tables. Six handle dependency tracking:
 
 - `commits` holds commit metadata plus a flag indicating whether it changed dependencies
 - `branches` tracks which branches have been analyzed and their last processed SHA
@@ -19,9 +19,10 @@ The schema has nine tables. Six handle dependency tracking:
 - `dependency_changes` records every add, modify, or remove event
 - `dependency_snapshots` stores full dependency state at intervals
 
-Three more support vulnerability scanning:
+Four more support vulnerability scanning and package enrichment:
 
-- `packages` tracks which packages have been synced with OSV and when
+- `packages` tracks package metadata, vulnerability sync status, and enrichment data
+- `versions` stores per-version metadata (license, published date) for time-travel queries
 - `vulnerabilities` caches CVE/GHSA data fetched from OSV
 - `vulnerability_packages` maps which packages are affected by each vulnerability
 
@@ -188,6 +189,42 @@ When scanning, git-pkgs:
 6. Matches version ranges against actual versions
 7. Excludes withdrawn vulnerabilities
 
+## Package Enrichment
+
+The [`outdated`](../lib/git/pkgs/commands/outdated.rb) and [`licenses`](../lib/git/pkgs/commands/licenses.rb) commands fetch package metadata from the [ecosyste.ms Packages API](https://packages.ecosyste.ms/).
+
+### Ecosystems Client
+
+[`Git::Pkgs::EcosystemsClient`](../lib/git/pkgs/ecosystems_client.rb) wraps the ecosyste.ms REST API. It uses batch lookups (`POST /api/v1/packages/lookup`) to check up to 100 packages per request. The response includes latest version, license, description, and repository URL for each package.
+
+### Enrichment Caching
+
+Like vulnerability data, enrichment data is cached in the database. The `packages` table has an `enriched_at` timestamp. Packages are refreshed if their data is more than 24 hours old. The `Package#needs_enrichment?` method checks this threshold.
+
+When running `outdated` or `licenses`:
+
+1. Get dependencies at the target commit
+2. Find or create package records for each purl
+3. Check which packages need enrichment (never enriched or stale)
+4. Batch query ecosyste.ms for those packages
+5. Store the enrichment data via `Package#enrich_from_api`
+6. Use the cached data for version comparison or license checking
+
+### Version Comparison
+
+The `outdated` command classifies updates as major, minor, or patch by comparing semver components. It handles the `v` prefix common in some ecosystems and pads partial versions (e.g., "1.2" becomes "1.2.0"). Updates are color-coded: red for major, yellow for minor, cyan for patch.
+
+### License Compliance
+
+The `licenses` command checks licenses against configured policies:
+
+- `--permissive` only allows common permissive licenses (MIT, Apache-2.0, BSD variants)
+- `--copyleft` flags GPL, AGPL, and similar licenses
+- `--allow` and `--deny` let you specify explicit lists
+- `--unknown` flags packages with no license information
+
+The command exits with code 1 when violations are found, making it suitable for CI pipelines.
+
 ## Models
 
 Sequel models live in [`lib/git/pkgs/models/`](../lib/git/pkgs/models/). They're straightforward except for a few convenience methods:
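Steps 3 to 5 of the enrichment flow in the internals.md hunk above reduce to filtering stale packages and batching lookups. This hypothetical Ruby sketch is not `EcosystemsClient` itself. Packages are assumed to be hashes with `:purl` and `:enriched_at`. `lookup` is a stand-in callable for one batch request, because the real request and response shapes of `POST /api/v1/packages/lookup` are not shown in this commit.

```ruby
# Hypothetical sketch of steps 3-5: select stale packages and look them up in batches of 100.
BATCH_SIZE = 100
TTL = 24 * 60 * 60 # seconds

def enrich_stale(packages, now: Time.now, lookup:)
  stale = packages.select { |pkg| pkg[:enriched_at].nil? || now - pkg[:enriched_at] > TTL }
  results = {}
  stale.map { |pkg| pkg[:purl] }.each_slice(BATCH_SIZE) do |batch|
    results.merge!(lookup.call(batch)) # one request per batch; assumed to return purl => metadata
  end
  results
end

now = Time.now
packages = (1..250).map { |i| { purl: "pkg:npm/pkg#{i}", enriched_at: nil } }
packages << { purl: "pkg:npm/fresh", enriched_at: now } # enriched just now, so skipped

batch_sizes = []
lookup = lambda do |purls|
  batch_sizes << purls.size
  purls.to_h { |purl| [purl, { latest_version: "1.0.0" }] } # fake metadata
end

enriched = enrich_stale(packages, now: now, lookup: lookup)
puts batch_sizes.inspect
puts enriched.size
```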

docs/schema.md

Lines changed: 21 additions & 0 deletions
@@ -125,6 +125,25 @@ Tracks packages for vulnerability sync status.
 
 Indexes: `purl` (unique)
 
+### versions
+
+Stores per-version metadata for packages.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| id | integer | Primary key |
+| purl | string | Full versioned purl (e.g., "pkg:npm/lodash@<version>") |
+| package_purl | string | Parent package purl (e.g., "pkg:npm/lodash") |
+| license | string | License for this specific version |
+| published_at | datetime | When this version was published |
+| integrity | text | Integrity hash (e.g., SHA256) |
+| source | string | Data source |
+| enriched_at | datetime | When metadata was fetched |
+| created_at | datetime | |
+| updated_at | datetime | |
+
+Indexes: `purl` (unique), `package_purl`
+
 ### vulnerabilities
 
 Caches vulnerability data from OSV.
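The new `versions` table keys each row by a full versioned purl and links it to its package through `package_purl`. Below is a hypothetical Ruby helper for deriving one from the other. The commit adds a `purl_helper` file, but its API is not shown, so this is not that code. The helper drops any `?qualifiers` and `#subpath`, then splits on the last `@` after the final `/`. This keeps npm scopes such as `@babel` intact.

```ruby
# Hypothetical sketch: split "pkg:type/namespace/name@version" into [package_purl, version].
def split_versioned_purl(purl)
  core = purl.split("#", 2).first.split("?", 2).first # drop subpath and qualifiers
  at = core.rindex("@")
  slash = core.rindex("/")
  return [core, nil] unless at && slash && at > slash # no version component

  [core[0...at], core[(at + 1)..]]
end

p split_versioned_purl("pkg:npm/lodash@4.17.21")
p split_versioned_purl("pkg:npm/@babel/core@7.24.0")
p split_versioned_purl("pkg:gem/rails@7.1.3?platform=ruby")
p split_versioned_purl("pkg:npm/lodash")
```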
@@ -170,5 +189,7 @@ branches ──┬── branch_commits ──┬── commits
 
 └── last_analyzed_sha (references commits.sha)
 
+packages ──── versions (via package_purl)
+
 vulnerabilities ──── vulnerability_packages
 ```

lib/git/pkgs.rb

Lines changed: 6 additions & 0 deletions
@@ -10,14 +10,18 @@
 require_relative "pkgs/analyzer"
 require_relative "pkgs/ecosystems"
 require_relative "pkgs/osv_client"
+require_relative "pkgs/ecosystems_client"
+require_relative "pkgs/spinner"
 
+require_relative "pkgs/purl_helper"
 require_relative "pkgs/models/branch"
 require_relative "pkgs/models/branch_commit"
 require_relative "pkgs/models/commit"
 require_relative "pkgs/models/manifest"
 require_relative "pkgs/models/dependency_change"
 require_relative "pkgs/models/dependency_snapshot"
 require_relative "pkgs/models/package"
+require_relative "pkgs/models/version"
 require_relative "pkgs/models/vulnerability"
 require_relative "pkgs/models/vulnerability_package"
 
@@ -43,6 +47,8 @@
 require_relative "pkgs/commands/diff_driver"
 require_relative "pkgs/commands/completions"
 require_relative "pkgs/commands/vulns"
+require_relative "pkgs/commands/outdated"
+require_relative "pkgs/commands/licenses"
 
 module Git
 module Pkgs

lib/git/pkgs/analyzer.rb

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ class Analyzer
 REQUIRE Project.toml Manifest.toml
 shard.yml shard.lock
 elm-package.json elm_dependencies.json elm-stuff/exact-dependencies.json
-haxelib.json
+haxelib.json stack.yaml stack.yaml.lock
 action.yml action.yaml .github/workflows/*.yml .github/workflows/*.yaml
 Dockerfile docker-compose*.yml docker-compose*.yaml
 dvc.yaml vcpkg.json _generated-vcpkg-list.json
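The analyzer's manifest list mixes exact file names with glob patterns such as `.github/workflows/*.yml` and `docker-compose*.yml`. One way to test a repository path against such a list in Ruby is `File.fnmatch?`. This hypothetical sketch uses a small excerpt of the patterns shown in the diff, and it is not the analyzer's actual matching code. It assumes bare names match a file's basename in any directory, while patterns containing `/` match the full path.

```ruby
# Hypothetical sketch: match repository paths against an excerpt of the manifest patterns.
PATTERNS = %w[
  haxelib.json stack.yaml stack.yaml.lock
  .github/workflows/*.yml .github/workflows/*.yaml
  Dockerfile docker-compose*.yml docker-compose*.yaml
].freeze

def manifest?(path)
  PATTERNS.any? do |pattern|
    # Bare names match the basename anywhere; patterns with "/" match the whole path.
    target = pattern.include?("/") ? path : File.basename(path)
    # FNM_PATHNAME stops "*" from crossing directory separators.
    File.fnmatch?(pattern, target, File::FNM_PATHNAME | File::FNM_DOTMATCH)
  end
end

%w[stack.yaml services/api/docker-compose.prod.yml .github/workflows/ci.yml src/app.rb].each do |path|
  puts "#{path}: #{manifest?(path)}"
end
```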

lib/git/pkgs/cli.rb

Lines changed: 4 additions & 2 deletions
@@ -33,7 +33,9 @@ class CLI
 },
 "Analysis" => {
 "stats" => "Show dependency statistics",
-"stale" => "Show dependencies that haven't been updated"
+"stale" => "Show dependencies that haven't been updated",
+"outdated" => "Show packages with newer versions available",
+"licenses" => "Show licenses for dependencies"
 },
 "Security" => {
 "vulns" => "Scan for known vulnerabilities"
@@ -42,7 +44,7 @@ class CLI
 
 COMMANDS = COMMAND_GROUPS.values.flat_map(&:keys).freeze
 COMMAND_DESCRIPTIONS = COMMAND_GROUPS.values.reduce({}, :merge).freeze
-ALIASES = { "praise" => "blame", "outdated" => "stale" }.freeze
+ALIASES = { "praise" => "blame" }.freeze
 
 def self.run(args)
 new(args).run

lib/git/pkgs/commands/diff_driver.rb

Lines changed: 6 additions & 0 deletions
@@ -24,18 +24,24 @@ class DiffDriver
 gems.locked
 glide.lock
 go.mod
+go.sum
+gradle.lockfile
 mix.lock
 npm-shrinkwrap.json
 package-lock.json
 packages.lock.json
 paket.lock
+pdm.lock
 pnpm-lock.yaml
 poetry.lock
 project.assets.json
 pubspec.lock
 pylock.toml
+renv.lock
 shard.lock
+stack.yaml.lock
 uv.lock
+verification-metadata.xml
 yarn.lock
 ].freeze
 