This repository was archived by the owner on Jan 22, 2026. It is now read-only.
docs/internals.md
Lines changed: 60 additions & 3 deletions
@@ -8,9 +8,9 @@ The executable at [`exe/git-pkgs`](../exe/git-pkgs) loads [`lib/git/pkgs.rb`](..
## Database

-[`Git::Pkgs::Database`](../lib/git/pkgs/database.rb) manages the SQLite connection using [ActiveRecord](https://github.com/rails/rails/tree/main/activerecord) and [sqlite3](https://github.com/sparklemotion/sqlite3-ruby). It looks for the `GIT_PKGS_DB` environment variable first, then falls back to `.git/pkgs.sqlite3`. Schema migrations are versioned through a `schema_info` table. See [schema.md](schema.md) for the full schema.
+[`Git::Pkgs::Database`](../lib/git/pkgs/database.rb) manages the SQLite connection using [Sequel](https://sequel.jeremyevans.net/) and [sqlite3](https://github.com/sparklemotion/sqlite3-ruby). It looks for the `GIT_PKGS_DB` environment variable first, then falls back to `.git/pkgs.sqlite3`. Schema migrations are versioned through a `schema_info` table. See [schema.md](schema.md) for the full schema.
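The lookup order described above could be sketched as follows. This is a minimal illustration, not the code in `database.rb`: the helper name `database_path` and its keyword arguments are hypothetical, and treating an empty variable as unset is an assumption. Only the `GIT_PKGS_DB` variable and the `.git/pkgs.sqlite3` fallback come from the text.

```ruby
# Hypothetical helper illustrating the lookup order; not the actual
# Git::Pkgs::Database implementation.
def database_path(env: ENV, git_dir: ".git")
  override = env["GIT_PKGS_DB"]
  # Assumption: an unset or empty variable means "not configured".
  return override unless override.nil? || override.empty?

  # Fallback named in the text: .git/pkgs.sqlite3
  File.join(git_dir, "pkgs.sqlite3")
end
```

Passing the environment in as an argument keeps the sketch easy to exercise without touching the real `ENV`.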

-The schema has six main tables:
+The schema has nine tables. Six handle dependency tracking:

- `commits` holds commit metadata plus a flag indicating whether it changed dependencies
- `branches` tracks which branches have been analyzed and their last processed SHA
@@ -19,6 +19,12 @@ The schema has six main tables:
- `dependency_changes` records every add, modify, or remove event
- `dependency_snapshots` stores full dependency state at intervals

+Three more support vulnerability scanning:
+
+- `packages` tracks which packages have been synced with OSV and when
+- `vulnerabilities` caches CVE/GHSA data fetched from OSV
+- `vulnerability_packages` maps which packages are affected by each vulnerability
+
Snapshots exist because replaying thousands of change records to answer "what dependencies existed at commit X?" would be slow. Instead, we store the complete dependency set every 50 commits by default. Point-in-time queries find the nearest snapshot and replay only the changes since then.
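The snapshot-plus-replay idea can be sketched with in-memory records. Everything here is illustrative: the `Snapshot` and `Change` structs, the integer `position` standing in for commit order, and the `dependencies_at` helper are assumptions, not the schema or query code git-pkgs uses.

```ruby
# Hypothetical in-memory stand-ins for the dependency_snapshots and
# dependency_changes tables. `position` is an assumed commit ordinal.
Snapshot = Struct.new(:position, :dependencies) # dependencies: { name => version }
Change   = Struct.new(:position, :type, :name, :version) # type: :added, :modified, :removed

def dependencies_at(target, snapshots, changes)
  # Nearest snapshot at or before the target commit.
  base  = snapshots.select { |s| s.position <= target }.max_by(&:position)
  state = base ? base.dependencies.dup : {}
  start = base ? base.position : -1

  # Replay only the changes recorded after that snapshot, in commit order.
  replay = changes.select { |c| c.position > start && c.position <= target }
  replay.sort_by.with_index { |c, i| [c.position, i] }.each do |c|
    if c.type == :removed
      state.delete(c.name)
    else
      state[c.name] = c.version
    end
  end
  state
end
```

With a snapshot every 50 commits, a query in this sketch replays at most 49 commits' worth of change records instead of the full history.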
## Git Access
@@ -131,9 +137,60 @@ This hybrid approach means `where` shows current file contents rather than histo
Create a new file in [`lib/git/pkgs/commands/`](../lib/git/pkgs/commands/). Define `self.description` for help text and `self.run(args)` as the entry point. The CLI finds commands by constantizing the argument.
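A rough sketch of what "constantizing the argument" can look like in plain Ruby. The `DemoCommands` namespace, the `Vulns` class body, and the `find_command` helper are hypothetical stand-ins; the actual CLI may resolve names differently.

```ruby
# Hypothetical namespace standing in for the real command classes.
module DemoCommands
  class Vulns
    def self.description
      "Check dependencies for known vulnerabilities"
    end

    def self.run(args)
      "vulns called with #{args.inspect}"
    end
  end
end

# Turn a CLI argument such as "vulns" into a constant name and look it up.
def find_command(name, namespace: DemoCommands)
  const_name = name.split(/[-_]/).map(&:capitalize).join
  # Guard against strings that are not valid constant names.
  return nil unless const_name.match?(/\A[A-Z][A-Za-z0-9]*\z/)
  return nil unless namespace.const_defined?(const_name, false)

  namespace.const_get(const_name, false)
end
```

A caller would then invoke `find_command("vulns").run(args)` and fall back to a usage message when the lookup returns `nil`.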

+## Vulnerability Scanning
+
+The [`vulns` command](../lib/git/pkgs/commands/vulns.rb) checks dependencies against the [OSV database](https://osv.dev). Three additional tables support this:
+
+- `packages` tracks which packages have been checked and when
+- `vulnerabilities` caches CVE/GHSA data fetched from OSV
+- `vulnerability_packages` maps which packages are affected by each vulnerability
+
+### OSV Client
+
+[`Git::Pkgs::OsvClient`](../lib/git/pkgs/osv_client.rb) wraps the OSV REST API. It uses batch queries (`/querybatch`) to check up to 1000 packages per request, then fetches full details for each vulnerability found (`/vulns/{id}`). HTTP connections are reused across requests.
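The batching step might look roughly like this. The sketch only builds request bodies and performs no network calls. The `querybatch_bodies` helper and the input hash shape are hypothetical, and the JSON layout (a `queries` array of `package` objects with `ecosystem` and `name`) reflects the public OSV API as commonly documented rather than anything confirmed about `OsvClient`.

```ruby
require "json"

OSV_BATCH_LIMIT = 1000 # per-request limit mentioned in the text

# Build one JSON body per /querybatch request.
# `packages` is an array of hashes like { ecosystem: "RubyGems", name: "rails" }.
def querybatch_bodies(packages, limit: OSV_BATCH_LIMIT)
  packages.each_slice(limit).map do |slice|
    queries = slice.map do |pkg|
      { package: { ecosystem: pkg[:ecosystem], name: pkg[:name] } }
    end
    JSON.generate(queries: queries)
  end
end
```

Each body would be sent as one POST to the batch endpoint, and the vulnerability IDs in the responses would then drive the follow-up `/vulns/{id}` requests.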
+
+### Ecosystem Mapping
+
+OSV uses different ecosystem names than bibliothecary. [`Git::Pkgs::Ecosystems`](../lib/git/pkgs/ecosystems.rb) translates between them:
+
+| bibliothecary | OSV | purl |
+|---------------|-----|------|
+| rubygems | RubyGems | gem |
+| npm | npm | npm |
+| pypi | PyPI | pypi |
+| cargo | crates.io | cargo |
+| go | Go | golang |
+| maven | Maven | maven |
+| nuget | NuGet | nuget |
+| packagist | Packagist | composer |
+| hex | Hex | hex |
+| pub | Pub | pub |
+
+Only these ecosystems support vulnerability scanning. Others (Docker, Actions, etc.) are tracked for dependency history but have no OSV coverage.
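One way to express the table above is a frozen lookup hash. The data is copied from the table; the constant and method names are hypothetical and are not taken from `ecosystems.rb`.

```ruby
# Data from the table above; names of the constant and helpers are assumptions.
ECOSYSTEM_NAMES = {
  "rubygems"  => { osv: "RubyGems",  purl: "gem" },
  "npm"       => { osv: "npm",       purl: "npm" },
  "pypi"      => { osv: "PyPI",      purl: "pypi" },
  "cargo"     => { osv: "crates.io", purl: "cargo" },
  "go"        => { osv: "Go",        purl: "golang" },
  "maven"     => { osv: "Maven",     purl: "maven" },
  "nuget"     => { osv: "NuGet",     purl: "nuget" },
  "packagist" => { osv: "Packagist", purl: "composer" },
  "hex"       => { osv: "Hex",       purl: "hex" },
  "pub"       => { osv: "Pub",       purl: "pub" }
}.freeze

# Returns the OSV ecosystem name, or nil when there is no OSV coverage.
def osv_ecosystem(bibliothecary_name)
  ECOSYSTEM_NAMES.dig(bibliothecary_name.to_s.downcase, :osv)
end

def supports_vulnerability_scanning?(bibliothecary_name)
  !osv_ecosystem(bibliothecary_name).nil?
end
```

Returning `nil` for unmapped ecosystems lets a caller skip Docker or Actions dependencies without special cases.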
+
+### Version Matching
+
+[`VulnerabilityPackage#affects_version?`](../lib/git/pkgs/models/vulnerability_package.rb) uses the [vers](https://github.com/package-url/vers) gem to check if a version falls within an affected range. OSV returns ranges like `>=1.0.0 <2.0.0` or `<4.17.21`. The vers gem handles semver comparison across different ecosystems.
+
+Version ranges can have multiple OR conditions separated by `||`. Each condition is checked independently: `<1.0 || >=2.0 <3.0` means "affected if below 1.0 OR between 2.0 and 3.0".
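The OR/AND structure of such ranges can be illustrated without the vers gem. This sketch swaps in `Gem::Version` from the Ruby standard distribution for comparisons, so it is not ecosystem-aware and is not how `affects_version?` is implemented. The helper names are hypothetical, and it assumes operators are written directly against the version with no space, as in the examples above.

```ruby
require "rubygems" # Gem::Version; normally loaded by default

# Stand-in comparison using Gem::Version ordering (an assumption; the real
# code delegates to the vers gem).
def satisfies?(version, constraint)
  match = constraint.match(/\A(>=|<=|>|<|=)?(\S+)\z/)
  raise ArgumentError, "unparseable constraint: #{constraint}" unless match

  op = match[1] || "="
  op = "==" if op == "="
  Gem::Version.new(version).public_send(op, Gem::Version.new(match[2]))
end

# A range is a list of clauses joined by "||"; the version is affected when
# every constraint in at least one clause holds.
def affected?(version, range)
  range.split("||").any? do |clause|
    constraints = clause.split
    !constraints.empty? && constraints.all? { |c| satisfies?(version, c) }
  end
end
```

For the range `<1.0 || >=2.0 <3.0`, this treats `0.9` and `2.5` as affected and `1.5` and `3.0` as unaffected.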
+
+### Caching
+
+Vulnerability data is cached in the database to avoid repeated API calls. Each package in the `packages` table has a `vulns_synced_at` timestamp. Packages are refreshed if their data is more than 24 hours old. The `vulns sync --refresh` command forces a full refresh.
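The staleness rule can be sketched as a small predicate. The helper name `needs_sync?`, its keyword arguments, and the use of a plain `Time` value for `vulns_synced_at` are assumptions for illustration; only the 24-hour window and the forced refresh come from the text.

```ruby
STALE_AFTER_SECONDS = 24 * 60 * 60 # 24 hours, per the text

# synced_at is a Time, or nil when the package has never been synced.
# force models the effect of `vulns sync --refresh`.
def needs_sync?(synced_at, now: Time.now, force: false)
  return true if force || synced_at.nil?

  (now - synced_at) > STALE_AFTER_SECONDS
end
```

Injecting `now` keeps the check deterministic in tests.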
+
+When scanning, git-pkgs:
+
+1. Gets dependencies at the target commit (from snapshots or by parsing manifests)
+2. Filters to ecosystems with OSV support
+3. Checks which packages need syncing (never synced or stale)
+4. Batch queries OSV for those packages
+5. Fetches full vulnerability details for any new CVEs found
+6. Matches version ranges against actual versions
+7. Excludes withdrawn vulnerabilities
+
## Models

-ActiveRecord models live in [`lib/git/pkgs/models/`](../lib/git/pkgs/models/). They're straightforward except for a few convenience methods:
+Sequel models live in [`lib/git/pkgs/models/`](../lib/git/pkgs/models/). They're straightforward except for a few convenience methods: