Add feature to detect if a package is unmaintained #111
Repository URL or None if not a GitHub package
"""
if package.name.startswith("github.com/"):
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
github.com/
Copilot Autofix
AI 2 months ago
The best way to fix this issue is to reliably parse the fully-qualified repository URL/host using standard parsing utilities and check that the host is exactly (or ends with) "github.com" before proceeding.
Steps:
- Parse package.name as a URL (using urllib.parse.urlparse).
- Check that the hostname component of the parsed URL is "github.com".
- If so, proceed to extract the owner and repo from the path.
- If not, return None.
You only need to edit the get_repository_url function in file src/it_depends/go.py, and add the required import (from urllib.parse import urlparse) if not present (the file currently imports request only from urllib).
@@ -16,6 +16,7 @@
 from subprocess import DEVNULL, CalledProcessError, check_call, check_output
 from tempfile import TemporaryDirectory
 from urllib import request
+from urllib.parse import urlparse
 from urllib.error import HTTPError, URLError
 
 if TYPE_CHECKING:
@@ -509,11 +510,14 @@
     Repository URL or None if not a GitHub package
 
 """
-if package.name.startswith("github.com/"):
-    # Extract owner/repo from path like github.com/owner/repo/subpath
-    parts = package.name.split("/")
-    if len(parts) >= 3:
-        owner = parts[1]
-        repo = parts[2]
+# Parse as URL or fallback to direct path check
+name = package.name
+parsed = urlparse(name if name.startswith("http") else f"https://{name}")
+if parsed.hostname == "github.com":
+    # Extract owner/repo from parsed path (format: /owner/repo[/...])
+    parts = parsed.path.strip("/").split("/")
+    if len(parts) >= 2:
+        owner = parts[0]
+        repo = parts[1]
         return f"https://github.com/{owner}/{repo}"
 return None
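For readers who want to try the suggested change outside the repository, here is a minimal standalone sketch of the same idea. The function name github_repo_from_module is hypothetical and not part of it-depends; it takes a plain string rather than a package object, and it assumes Go module paths arrive without a scheme.

```python
from __future__ import annotations

from urllib.parse import urlparse


def github_repo_from_module(name: str) -> str | None:
    """Return https://github.com/<owner>/<repo> for a GitHub-hosted path, else None.

    Hypothetical helper for illustration; not the project's actual function.
    """
    # Go module paths carry no scheme, so add one before parsing.
    has_scheme = name.startswith(("http://", "https://"))
    parsed = urlparse(name if has_scheme else f"https://{name}")

    # Compare the parsed hostname exactly instead of matching a string prefix.
    if parsed.hostname != "github.com":
        return None

    # The path looks like /owner/repo[/subpath...]; drop empty segments.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    return f"https://github.com/{owner}/{repo}"


if __name__ == "__main__":
    for candidate in (
        "github.com/owner/repo/subpath",
        "github.com.evil.example/owner/repo",
        "golang.org/x/net",
    ):
        print(candidate, "->", github_repo_from_module(candidate))
```

Only the first candidate resolves to a repository URL; the other two have a hostname other than github.com and yield None.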
# Try common keys for repository URLs
for key in ["Source", "Repository", "Homepage", "Code", "source"]:
    url = project_urls.get(key)
    if url and "github.com" in url:
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
github.com
Copilot Autofix
AI 2 months ago
To fix this issue, the code should properly parse the candidate URL with urllib.parse.urlparse and then check if its hostname is exactly github.com or is a valid subdomain of github.com. Specifically, replace the substring check with code that extracts the hostname and checks if it is github.com or ends with .github.com. To do this, import urlparse from urllib.parse at the top if not already present. Only return the URL if it passes this stricter host check.
Edit only within the get_repository_url method in src/it_depends/pip.py. Add the import if needed. The minimal change is to parse the URL in the if url ... block at line 263 and change the condition.
@@ -6,6 +6,7 @@
 import subprocess
 import sys
 from logging import getLogger
+from urllib.parse import urlparse
 from pathlib import Path
 from tempfile import TemporaryDirectory
 from typing import TYPE_CHECKING
@@ -260,8 +261,11 @@
     # Try common keys for repository URLs
     for key in ["Source", "Repository", "Homepage", "Code", "source"]:
         url = project_urls.get(key)
-        if url and "github.com" in url:
-            return url
+        if url:
+            parsed = urlparse(url)
+            host = parsed.hostname
+            if host == "github.com" or (host and host.endswith(".github.com")):
+                return url
     return None
 except requests.RequestException:
     return None
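To see why the substring check is flagged, it can help to compare both checks on a few URLs. The sketch below is illustrative only: is_github_url is a hypothetical helper, and the two non-GitHub URLs are made-up examples of hosts that merely contain the text "github.com".

```python
from __future__ import annotations

from urllib.parse import urlparse


def is_github_url(url: str) -> bool:
    """True only when the URL's host is github.com or a subdomain of it."""
    host = urlparse(url).hostname  # None when the URL has no network location
    return host is not None and (host == "github.com" or host.endswith(".github.com"))


if __name__ == "__main__":
    candidates = [
        "https://github.com/psf/requests",
        "https://evil.example/github.com/psf/requests",  # text appears in the path
        "https://github.com.evil.example/psf/requests",  # text appears as a host prefix
    ]
    for url in candidates:
        substring_ok = "github.com" in url
        hostname_ok = is_github_url(url)
        print(f"{url}: substring={substring_ok} hostname={hostname_ok}")
```

All three URLs pass the substring test, but only the first passes the hostname test.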
Thanks for tackling this - identifying unmaintained dependencies is a real security concern and a valuable addition to it-depends.
After reviewing the implementation, I have some suggestions for simplifying the approach. The current PR is ~700 lines across 12 files, but I think we can achieve the same goal with ~150 lines in 2-3 files by following the existing [...]
Key suggestions:
1. Follow the [...]
The vulnerability checking feature ([...]
# maintenance.py - mirrors audit.py structure
@dataclass
class MaintenanceInfo:
    repo_url: str | None
    last_commit: str | None
    days_since_update: int | None
    is_stale: bool
    error: str | None = None

def check_maintenance(packages, stale_days=365, github_token=None):
    """Enrich packages with maintenance info."""
    ...
2. Don't modify the resolvers
The PR adds [...]
This keeps the maintenance feature self-contained and doesn't expand the resolver interface.
3. Skip custom caching for now
The [...]
4. Skip parallelism for v1
Sequential processing is fine for typical dependency trees (<100 packages). The ThreadPoolExecutor + tqdm adds complexity that can be added later if needed.
5. Use a dataclass
The [...]
Summary
The goal is great - let's just slim down the implementation to match the existing patterns in the codebase. Happy to help with a revised approach if that would be useful.
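As a rough illustration of the dataclass suggested in the review, the sketch below fills in one way the staleness fields might be derived from a last-commit timestamp. The build_info helper, its now parameter, and the "older than stale_days means stale" rule are assumptions made for this example; the review only specifies the field names and the stale_days=365 default.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MaintenanceInfo:
    repo_url: str | None
    last_commit: str | None
    days_since_update: int | None
    is_stale: bool
    error: str | None = None


def build_info(
    repo_url: str | None,
    last_commit: str | None,
    stale_days: int = 365,
    now: datetime | None = None,
) -> MaintenanceInfo:
    """Derive staleness from an ISO 8601 commit timestamp (hypothetical helper)."""
    if last_commit is None:
        # Without a commit date nothing can be computed; record why.
        return MaintenanceInfo(repo_url, None, None, False, error="no commit date")
    now = now or datetime.now(timezone.utc)
    # GitHub-style timestamps end in "Z"; normalise so fromisoformat accepts them.
    committed = datetime.fromisoformat(last_commit.replace("Z", "+00:00"))
    days = (now - committed).days
    # Assumption: a package is stale once it is strictly older than the threshold.
    return MaintenanceInfo(repo_url, last_commit, days, days > stale_days)


if __name__ == "__main__":
    info = build_info(
        "https://github.com/psf/requests",
        "2023-05-22T14:30:00Z",
        now=datetime(2023, 9, 19, 14, 30, tzinfo=timezone.utc),
    )
    print(info)
```

Passing now explicitly keeps the calculation deterministic, which makes a helper like this easy to unit test without touching the network.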
Add Package Maintenance Status Checking to SBOM
Summary
This PR adds functionality to check when packages in the Software Bill of Materials (SBOM) were last maintained, helping users identify stale or unmaintained dependencies in their software supply chain.
Motivation
Unmaintained dependencies pose significant security and reliability risks. This feature enables users to:
Key Features
🔍 Maintenance Checking
⚡ Performance & Reliability
📊 Output Integration
maintenance field to each package with repository URL, last commit date, staleness flag, and days since update
Implementation Details
New Components
- maintenance.py (380 lines): Core logic with GitHubClient, GitHub URL parser, and parallel maintenance checking
- MaintenanceInfo class: Data structure for maintenance status with serialization support
- GitHubMetadataCache table: SQLite caching for GitHub API responses
Modified Components
- models.py: Extended Package class with maintenance_info field and update_maintenance_info() method
- config.py: Added 4 new CLI flags (--check-maintenance, --stale-threshold, --github-token, --maintenance-cache-ttl)
- _cli.py: Integrated maintenance check into main flow after vulnerability audit
- sbom.py: Extended CycloneDX output with maintenance properties
- get_repository_url() static methods
Architecture
Follows the same pattern as the existing audit.py vulnerability checking:
Usage Examples
Example Output
{
  "pip:requests": {
    "2.31.0": {
      "name": "requests",
      "version": "2.31.0",
      "source": "pip",
      "dependencies": {...},
      "vulnerabilities": [],
      "maintenance": {
        "repository_url": "https://github.com/psf/requests",
        "last_commit_date": "2023-05-22T14:30:00Z",
        "is_stale": false,
        "days_since_update": 120,
        "error": null
      }
    }
  }
}
Testing
Unit Tests (test/test_maintenance.py)
Manual Testing
--help
Documentation
Breaking Changes
None. This is a purely additive feature that:
--check-maintenance flag
Files Changed
New Files (2)
- src/it_depends/maintenance.py (380 lines)
- test/test_maintenance.py (330 lines)
Modified Files (10)
- src/it_depends/models.py (+59 lines)
- src/it_depends/config.py (+24 lines)
- src/it_depends/_cli.py (+10 lines)
- src/it_depends/db.py (+12 lines)
- src/it_depends/sbom.py (+36 lines)
- src/it_depends/npm.py (+20 lines)
- src/it_depends/pip.py (+28 lines)
- src/it_depends/cargo.py (+23 lines)
- src/it_depends/go.py (+20 lines)
- README.md (+55 lines)
Total Impact: ~700 lines of new code across 12 files
Checklist
Future Enhancements
Potential follow-up work (not included in this PR):