Releases: SemClone/mcp-semclone
1.6.2
v1.6.2 - Improved LLM/IDE Integration
Overview
This release enhances how mcp-semclone integrates with LLM-powered development environments (Cursor,
Windsurf, etc.) by improving tool recognition and selection guidance.
Problem Addressed
When users asked LLM-powered IDEs to "do compliance for this project", the LLM would often:
- Not recognize that mcp-semclone handles compliance tasks
- Attempt to install external tools like npm install license-checker or pip install scancode-toolkit
- Struggle to select the correct tool from the 14 available options
What's New
1. Trigger Keywords for Better Recognition
Added explicit keywords that help LLMs recognize when to use mcp-semclone:
- "compliance", "license compliance", "do compliance"
- "SBOM", "software bill of materials", "supply chain security"
- "can I ship this?", "license compatibility"
- "mobile app compliance", "SaaS compliance"
2. Clear Decision Tree for Tool Selection
Added IF-THEN logic for common scenarios:
- IF user says "do compliance" → use run_compliance_check()
- IF source code directory → use scan_directory()
- IF package archive (.jar, .whl, etc.) → use check_package()
- IF compiled binary (.apk, .exe, etc.) → use scan_binary()
- IF "can I use [license]?" → use validate_policy()
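The IF-THEN rules above can be read as a small dispatch function. The following is only an illustrative sketch: pick_tool, ARCHIVE_EXTS and BINARY_EXTS are hypothetical names, the extension sets contain just the examples named in the list, and the real selection is made by the LLM from the server instructions rather than by code like this. The extension checks run before the directory check so that any other path can be treated as a source directory.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension sets; the release notes only give
# ".jar, .whl, etc." and ".apk, .exe, etc." as examples.
ARCHIVE_EXTS = {".jar", ".whl"}
BINARY_EXTS = {".apk", ".exe"}


def pick_tool(request: str, target: str = "") -> Optional[str]:
    """Return the tool name the decision tree points to, or None if no rule matches."""
    text = request.lower()
    if "do compliance" in text:
        return "run_compliance_check"
    if "can i use" in text:
        return "validate_policy"
    suffix = Path(target).suffix.lower()
    if suffix in ARCHIVE_EXTS:
        return "check_package"
    if suffix in BINARY_EXTS:
        return "scan_binary"
    if target:
        # Any other path is assumed to be a source code directory.
        return "scan_directory"
    return None
```

For example, pick_tool("scan this", "dist/app.apk") selects scan_binary, while pick_tool("scan this", "./src") selects scan_directory.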
3. Condensed Instructions
Reduced instruction block by ~55% (384 → 175 lines) while maintaining critical information:
- Kept: Anti-patterns, key workflows, tool descriptions, constraints
- Condensed: License interpretation, binary scanning, workflows
- Removed: Verbose examples, redundant explanations
- Added: References to detailed docstrings
Impact
- Faster tool recognition for compliance queries in LLM-powered IDEs
- Reduced hallucination and incorrect tool selection
- Better first-time user experience
- Clearer guidance without reading full documentation
Files Changed
- mcp_semclone/server.py: Enhanced MCP server instructions
- pyproject.toml: Version bump to 1.6.2
- mcp_semclone/__init__.py: Version bump to 1.6.2
- CHANGELOG.md: Added v1.6.2 entry
Full Changelog
Full Changelog: v1.6.1...v1.6.2
1.6.1
[1.6.1] - 2025-01-13
Fixed
download_and_scan_package: Handle osslili Informational Output
Problem: The download_and_scan_package tool was failing with JSON parsing errors when osslili outputs informational messages before JSON output. osslili now prefixes output with messages like:
Processing local path: package.tar.gz
This caused json.loads() to fail with "Expecting value: line 1 column 1 (char 0)".
Root Cause: Line 2026 in server.py attempted to parse osslili stdout directly as JSON without stripping informational messages.
Solution: Added preprocessing to find the first { character and parse JSON from that position, effectively stripping any informational messages before the JSON payload.
Changes:
- mcp_semclone/server.py:
- Added informational message stripping before JSON parsing (lines 2026-2031)
- Finds first { in output and parses from there
- Preserves backward compatibility with osslili versions that don't output messages
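A minimal sketch of the preprocessing described above, assuming the tool output is available as a string. The function name parse_json_after_preamble and the "licenses" key are hypothetical, and this is not the actual code in server.py.

```python
import json


def parse_json_after_preamble(stdout: str) -> dict:
    """Parse the JSON object that starts at the first '{' in the output."""
    start = stdout.find("{")
    if start == -1:
        raise ValueError("no JSON object found in tool output")
    return json.loads(stdout[start:])


# Output with an informational line before the JSON payload.
noisy = 'Processing local path: package.tar.gz\n{"licenses": ["MIT"]}'
# Output from a version that prints only JSON.
clean = '{"licenses": ["MIT"]}'

noisy_result = parse_json_after_preamble(noisy)
clean_result = parse_json_after_preamble(clean)
```

Both inputs yield the same dictionary, which is what keeps older versions without the prefix working. One limitation of this approach: an informational message that itself contains a { character would still break parsing.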
Installation Note: When using pipx, ensure purl2src is installed with console scripts enabled:
pipx inject mcp-semclone purl2src --include-apps --force
1.6.0
[1.6.0] - 2025-01-13
Added
Maven Parent POM License Resolution + Source Header Detection
Problem:
Maven packages often don't declare licenses directly in their package POM - the license can be in:
- Source file headers (e.g., // Licensed under Apache-2.0)
- Parent POM (declared in parent but not in package POM)
When download_and_scan_package analyzed such packages, it would miss one or both of these sources.
Solution:
Enhanced Maven-specific license resolution to check ALL three sources and combine results:
How it works:
- Source file headers: osslili scans all source files for license headers → populates detected_licenses
- Package POM: upmex extracts metadata from package POM → populates declared_license (if present)
- Parent POM (Maven-specific): If no declared_license, automatically triggers upmex with --registry --api clearlydefined to query ClearlyDefined, which resolves parent POM licenses
- Combines results: Parent POM license added to detected_licenses if not already there
- Updates result with license_source: "parent_pom_via_clearlydefined"
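The combining step can be sketched as follows. This is an assumption-laden illustration, not the server's implementation: merge_parent_pom_license is a hypothetical helper, the parent POM license is passed in as a plain string instead of being fetched from ClearlyDefined, and the result layout follows the field names used in these notes.

```python
from typing import Optional


def merge_parent_pom_license(result: dict, parent_license: Optional[str]) -> dict:
    """Return a copy of result with the parent POM license merged in."""
    merged = dict(result)
    merged["detected_licenses"] = list(result.get("detected_licenses", []))
    merged["metadata"] = dict(result.get("metadata", {}))

    # Parent POM resolution only applies when the package POM declared nothing.
    if result.get("declared_license") or not parent_license:
        return merged

    merged["declared_license"] = parent_license
    if parent_license not in merged["detected_licenses"]:
        merged["detected_licenses"].append(parent_license)
    merged["metadata"]["license_source"] = "parent_pom_via_clearlydefined"
    return merged


# Source headers found MIT; the parent POM resolves to Apache-2.0.
combined = merge_parent_pom_license(
    {"declared_license": None, "detected_licenses": ["MIT"]},
    "Apache-2.0",
)
```

Here combined["detected_licenses"] holds both licenses, with the source header license first and the parent POM license appended.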
Examples:
Scenario 1: License only in parent POM
download_and_scan_package(purl="pkg:maven/org.example/library@1.0.0")
# Before (v1.5.8):
# declared_license: None
# detected_licenses: []
# After (v1.6.0):
# declared_license: "Apache-2.0" # From parent POM
# detected_licenses: ["Apache-2.0"]
# metadata.license_source: "parent_pom_via_clearlydefined"
Scenario 2: Licenses in BOTH source headers AND parent POM
download_and_scan_package(purl="pkg:maven/org.example/another@2.0.0")
# Result:
# declared_license: "Apache-2.0" # From parent POM
# detected_licenses: ["MIT", "Apache-2.0"] # MIT from source, Apache from parent
# scan_summary: "Deep scan completed. found 2 licenses. (includes parent POM license). ..."
Changes:
- mcp_semclone/server.py:
- Added detailed 3-source license detection comment (lines 2059-2068)
- Maven parent POM resolution with ClearlyDefined API integration
- Combines parent POM license with source header licenses
- Enhanced summary showing "(includes parent POM license)"
- Tool docstring: Documented Maven-specific behavior with all three sources
- tests/test_server.py:
- Added test_maven_parent_pom_resolution (parent POM only)
- Added test_maven_combined_source_and_parent_pom_licenses (both sources)
Impact:
- Maven packages now report licenses from ALL sources (source headers + parent POM)
- Source header licenses (MIT, BSD) combined with parent POM licenses (Apache-2.0)
- Automatic detection - no user configuration needed
- Transparent tracking with license_source metadata field
- Enhanced summary indicates when the parent POM was used
- Falls back gracefully if parent POM resolution fails
1.5.8
v1.5.8 - 2025-01-13
Fixed & Redesigned
Critical Bug + Complete Redesign: download_and_scan_package
Two critical issues fixed:
Problem 1 - Tool was completely broken (v1.5.7):
The download_and_scan_package tool returned JSON parsing errors:
"metadata_error": "the JSON object must be str, bytes or bytearray, not CompletedProcess"
"scan_error": "the JSON object must be str, bytes or bytearray, not CompletedProcess"
Root Cause:
The _run_tool() helper returns subprocess.CompletedProcess objects, but the code tried to parse them directly as JSON instead of using .stdout.
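The failure mode is easy to reproduce outside the server. The sketch below uses a child Python process as a stand-in for a real tool invocation (the actual _run_tool() helper is not shown in these notes) and contrasts parsing the CompletedProcess object with parsing its .stdout.

```python
import json
import subprocess
import sys

# Stand-in for a tool call: a child process that prints a JSON object.
completed = subprocess.run(
    [sys.executable, "-c", "import json; print(json.dumps({'license': 'MIT'}))"],
    capture_output=True,
    text=True,
    check=True,
)

# The v1.5.7 mistake: handing the whole CompletedProcess to json.loads().
try:
    json.loads(completed)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# The fix: parse the captured standard output instead.
parsed = json.loads(completed.stdout)
```

The TypeError message produced by the first call matches the metadata_error and scan_error strings quoted above.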
Problem 2 - Incorrect workflow (v1.5.7):
Original implementation tried to use upmex and osslili with PURLs directly, but these tools require local file paths.
NEW IMPLEMENTATION - Correct Multi-Method Workflow:
The tool now implements a robust 3-step fallback workflow:
- Primary: Use purl2notices to download and analyze (fastest, most comprehensive)
- Deep scan: If incomplete, use purl2src to get download URL → download artifact → run osslili for deep license scanning + upmex for metadata
- Online fallback: If still incomplete, use upmex --api clearlydefined for online metadata
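The three steps form an ordered fallback chain. The sketch below shows the general shape under stated assumptions: run_with_fallback and the stub methods are hypothetical, the method labels are invented for illustration, and "incomplete" is taken to mean that no declared_license came back, which the notes do not define precisely.

```python
from typing import Callable, List, Optional, Tuple

Method = Callable[[str], Optional[dict]]


def run_with_fallback(purl: str, methods: List[Tuple[str, Method]]) -> dict:
    """Try each method in order and stop at the first complete result."""
    attempted: List[str] = []
    for name, method in methods:
        attempted.append(name)
        try:
            result = method(purl)
        except Exception:
            continue  # a failed method simply hands over to the next one
        # Assumption: a result counts as complete once it has a declared license.
        if result and result.get("declared_license"):
            return {**result, "method_used": name, "methods_attempted": attempted}
    return {"purl": purl, "method_used": None, "methods_attempted": attempted}


def primary(purl: str) -> dict:
    return {"purl": purl}  # incomplete: no license found


def deep_scan(purl: str) -> dict:
    raise RuntimeError("download failed")


def online(purl: str) -> dict:
    return {"purl": purl, "declared_license": "MIT"}


outcome = run_with_fallback(
    "pkg:pypi/example@1.0.0",
    [("purl2notices", primary), ("deep_scan", deep_scan), ("online_fallback", online)],
)
```

In this run the first method returns an incomplete result and the second raises, so outcome reports the third method in method_used and all three in methods_attempted.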
New Dependencies:
- Added purl2src>=1.2.3 to translate PURLs to download URLs for Step 2
Impact:
- Tool now works correctly with proper multi-method fallback
- Uses the correct workflow: purl2notices → download+osslili+upmex → online APIs
- Returns method_used field showing which method succeeded
- Proper error handling with methods_attempted tracking
- JSON parsing fixed (uses .stdout correctly)
Testing:
- Added 5 comprehensive unit tests covering all workflows
- All 31 tests pass (26 existing + 5 new)
- Test coverage: primary workflow, deep scan, online fallback, error handling, file cleanup
Thanks:
User feedback identified the bugs and clarified the correct workflow design.
Full Changelog: https://github.com/SemClone/mcp-semclone/blob/main/CHANGELOG.md
1.5.7
What's Changed in v1.5.7
New Tool: download_and_scan_package - Comprehensive Package Source Analysis
FEATURE: Download package source from registries and perform deep scanning
Problem:
- Users didn't know we CAN download source code from PURLs
- LLMs said "I don't have a tool to download source code" when we do!
- Existing tools (check_package, generate_legal_notices_from_purls) can download but it wasn't explicit
- No single tool that orchestrates: download → extract metadata → scan licenses → find copyrights
Solution:
New download_and_scan_package(purl) tool that makes it CRYSTAL CLEAR we can download and analyze packages
What it does:
- Downloads actual package source from npm/PyPI/Maven/etc registries
- Extracts package to temporary directory
- Uses upmex to extract metadata (license, homepage, description)
- Uses osslili to perform deep license scanning of ALL source files
- Scans for copyright statements in source code
- Returns download location for manual inspection (optional)
When to use:
- Package metadata is incomplete (e.g., PyPI shows "UNKNOWN" license)
- Need to verify what's ACTUALLY in package files (not just package.json)
- Security auditing - inspect actual package contents before approval
- Find licenses embedded in source files that aren't in metadata
- Extract copyright statements from source code
Real-world example from user conversation:
User: "Can you check if duckdb@0.2.3 has license info in the source code?"
Before: "I don't have a tool to download source code"
After: download_and_scan_package("pkg:pypi/duckdb@0.2.3")
Result: {"declared_license": "UNKNOWN", "detected_licenses": ["CC0-1.0"], ...}
API:
# Basic usage - download and scan
download_and_scan_package(purl="pkg:pypi/duckdb@0.2.3")
# Keep downloaded files for manual inspection
result = download_and_scan_package(
purl="pkg:npm/suspicious-package@1.0.0",
keep_download=True
)
print(f"Inspect at: {result['download_path']}")
# Quick metadata only (no deep scan)
download_and_scan_package(
purl="pkg:pypi/requests@2.28.0",
scan_licenses=False
)
Returns:
- purl: Package URL analyzed
- download_path: Where files are (if keep_download=True)
- metadata: Package metadata from upmex
- declared_license: License from package metadata
- detected_licenses: Licenses found by scanning source files
- copyright_statements: Copyright statements extracted
- files_scanned: Number of files analyzed
- scan_summary: Human-readable summary
Why this matters:
- Makes capabilities EXPLICIT - LLMs know we can download source
- Single orchestrating tool - no need to chain multiple tools
- Comprehensive analysis - metadata + deep scanning + copyrights
- Real source verification - see what's actually in the package
1.5.6
What's Changed in v1.5.6
Split Legal Notices Generation into Two Clear Tools
CLARITY IMPROVEMENT: Separated source scanning from PURL downloads
Problem:
- v1.5.5 had one tool with two modes (path OR purls parameter)
- Confusing for LLMs to choose which parameter to use
- Not obvious which approach is faster/recommended
Solution:
Split generate_legal_notices into two distinct tools with clear purposes:
1. generate_legal_notices(path, ...) - PRIMARY TOOL (FAST)
- Default tool for most cases
- Scans source code directly (node_modules/, site-packages/)
- Detects all transitive dependencies automatically
- 10x faster than downloading from registries
- Required parameter: path (no optional-parameter confusion)
2. generate_legal_notices_from_purls(purls, ...) - SPECIAL CASES (SLOW)
- Use only when dependencies NOT installed locally
- Downloads packages from npm/PyPI/etc registries
- Required parameter: purls list
- Clear name indicates it's downloading from registries
Benefits:
- Clear separation of concerns: Each tool does one thing
- Better LLM guidance: Tool names indicate purpose and performance
- No parameter confusion: path vs purls is now two separate tools
- Self-documenting: Names make it obvious which to use
Updated Workflow Instructions:
- CRITICAL WORKFLOW RULES now lists two tools clearly
- Guidance on when to use each tool
- Emphasizes generate_legal_notices (path) as default
Breaking Changes
- generate_legal_notices(purls=[...]) no longer works
- Use generate_legal_notices_from_purls(purls=[...]) instead
- generate_legal_notices now requires the path parameter (not optional)
Migration Guide
# OLD (v1.5.5 - no longer works):
generate_legal_notices(purls=purl_list, output_file="NOTICE.txt")
# NEW (v1.5.6):
generate_legal_notices_from_purls(purls=purl_list, output_file="NOTICE.txt")
# RECOMMENDED (v1.5.6 - use this instead):
generate_legal_notices(path="/path/to/project", output_file="NOTICE.txt")
User Impact
- Clearer workflow: Know which tool to use by default
- 10x performance improvement: Fast source scanning vs slow downloads
- Better LLM guidance: Tool names are self-documenting
- Simpler API: Each tool has one clear purpose
1.5.4
What's Changed in v1.5.4
Server Instructions: Prevent External Tool Installation
Added prominent warning to prevent LLMs from installing external compliance tools:
- Added "IMPORTANT - ALL TOOLS ARE BUILT-IN" section at the top of server instructions
- Explicitly warns against installing: npm license-checker, scancode-toolkit, ngx, fossil, etc.
- Clarifies that all necessary tools (purl2notices, ossnotices, osslili, ospac, vulnq) are pre-installed
- Directs LLMs to use MCP-provided tools instead of trying to install external packages
Why this matters:
- Prevents LLMs from wasting time trying to install tools that are already available
- Avoids confusion about which tools to use (use MCP tools, not external CLIs)
- Reduces risk of LLMs using outdated or incorrect external tools
- Ensures consistent compliance scanning using the SEMCL.ONE toolchain
User Impact:
- Faster response times (no unnecessary tool installation attempts)
- More reliable results (always uses the correct, pre-installed tools)
- Clearer guidance for LLMs on how to perform compliance tasks
1.5.2
v1.5.2 - 2025-01-12
Fixed
Improved Workflow Instructions to Prevent Single-Package Detection Issues
Problem: Users reported that compliance checks generated notices for only 1 package, rather than all transitive dependencies (e.g., 1 package instead of 48 in node_modules/).
Root Cause: LLMs were bypassing scan_directory or not using ALL packages from the scan result. Some were manually extracting PURLs from package.json instead of using the comprehensive scan.
Changes:
- Enhanced server instructions with CRITICAL WORKFLOW RULES section
- Added explicit warnings in generate_legal_notices against manual PURL extraction
- Added diagnostic logging to warn when suspiciously few packages detected (3 packages or fewer)
- Improved examples showing WRONG vs RIGHT workflow approaches
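The diagnostic for suspiciously small results could look roughly like this. It is a sketch only: the logger name, the function warn_if_few_packages and the message wording are hypothetical; the threshold of 3 comes from the change list above.

```python
import logging

logger = logging.getLogger("compliance_sketch")

# From the change list: 3 packages or fewer is treated as suspicious.
SUSPICIOUS_PACKAGE_THRESHOLD = 3


def warn_if_few_packages(packages: list) -> bool:
    """Log a warning and return True when the package count looks too low."""
    if len(packages) <= SUSPICIOUS_PACKAGE_THRESHOLD:
        logger.warning(
            "Only %d package(s) detected; transitive dependencies may be missing. "
            "Was scan_directory run first?",
            len(packages),
        )
        return True
    return False
```

With this check, a notice built from a single PURL triggers the warning, while a 48-package result from a full node_modules/ scan passes silently.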
Impact:
- LLMs are now instructed to ALWAYS use scan_directory first
- Clear guidance that an npm project with one direct dependency can mean approximately 50 packages in node_modules
- Better visibility when the workflow is not followed correctly
Note: The underlying MCP server code and purl2notices scanning work correctly. This release only improves instructions and logging to prevent misunderstandings in the workflow.
What's Changed
- Improve workflow instructions to prevent single-package detection issues
- Bump version to 1.5.2 and update changelog
Full Changelog: v1.5.1...v1.5.2
1.5.1
v1.5.1 - Architecture Simplification
Changed
Architecture Simplification: purl2notices for Everything
scan_directory now uses purl2notices scan mode exclusively:
- REMOVED: osslili dependency for scan_directory (still used by check_package)
- REMOVED: src2purl dependency entirely (replaced by purl2notices)
- NEW: purl2notices scan mode handles all scanning in one pass:
- Detects ALL packages including transitive dependencies
- Extracts licenses from both project source and dependencies
- Extracts copyright statements automatically from source code
- No manual PURL extraction needed
Benefits
- 100% accurate package detection (vs 83-88% fuzzy matching from src2purl)
- Detects ALL transitive dependencies (e.g., 51 packages vs 8 fuzzy matches)
- No confusing fuzzy match results
- Automatic copyright extraction as a bonus feature
- Simpler architecture: one tool instead of two
For npm projects
- Scans entire node_modules/ directory (50+ packages)
- NOT just direct dependencies from package.json (1-2 packages)
- Includes all transitive dependencies automatically
Deprecated parameters in scan_directory
- identify_packages - now deprecated, purl2notices always detects packages
- check_licenses - now deprecated, purl2notices always scans licenses
- Parameters still accepted for backwards compatibility but have no effect
Updated tool descriptions
- scan_directory now documents that it detects ALL packages including transitive deps
- Clarified that for npm projects, this means entire node_modules/ not just package.json
- Added emphasis on automatic copyright extraction
- Updated workflow examples to reflect simplified approach
Dependencies
- Removed: src2purl>=1.3.4 (no longer used)
- Still kept: osslili>=1.5.7 and upmex>=1.6.7 (used by check_package for archives)
What's Changed
- v1.5.1: Simplify architecture - use purl2notices for comprehensive scanning by @oscarvalenzuelab in #21
- Fix purl2notices integration - use JSON format output
- Fix test mocks to match purl2notices JSON output format
Full Changelog: v1.4.0...v1.5.1
1.4.0
mcp-semclone v1.4.0
This release implements a universal compliance workflow and improves agent usability. It includes breaking changes (the removal of project-type-specific tools) and rolls up the enhancements from v1.3.6 and v1.3.7.
Breaking Changes
Removed generate_mobile_legal_summary (formerly generate_mobile_legal_notice)
Project-type-specific tools do not scale across different distribution types.
Migration paths:
- Use run_compliance_check for automated one-shot workflows
- Use generate_legal_notices for manual workflow orchestration
The generate_legal_notices tool was always the correct choice for complete legal documentation.
New Tool: run_compliance_check
Universal compliance workflow that works for any project type (mobile, desktop, SaaS, embedded, etc.).
Capabilities:
- Automatic workflow: scan → generate NOTICE.txt → validate policy → generate sbom.json → check vulnerabilities
- Returns APPROVED/REJECTED decision with risk level
- Generates NOTICE.txt and sbom.json artifacts
- Provides a complete report with actionable recommendations
- Uses default policy if none specified
- Distribution type is a parameter, not a separate workflow
Usage:
result = run_compliance_check(path, distribution_type="mobile")
Enhanced Tool Descriptions
All primary tools now include structured guidance:
scan_directory:
- Positioned as FIRST STEP in workflows
- WHEN TO USE and WHEN NOT TO USE sections
- WORKFLOW POSITION guidance
- Three complete workflow examples
generate_legal_notices:
- Positioned as a PRIMARY TOOL for legal documentation
- Emphasizes purl2notices backend for copyright extraction
- WHEN TO USE and WHEN NOT TO USE sections
- Three workflow examples: mobile app compliance, package analysis, batch compliance
validate_license_list:
- Positioned for quick license validation
- Clear return values: safe_for_distribution, app_store_compatible
- Complete workflow example
Documentation Updates
- Updated IDE integration guides for Cursor, Cline, and Kiro
- Updated mobile app compliance guide
- Updated configuration examples and autoApprove lists
- Removed all references to deleted tools
- Added migration guidance
Architecture Changes
Design principles:
- No project-type-specific tools
- Distribution type used only for policy validation context
- Default policy provided
- Single standardized workflow
- Scales without code changes
Standard workflow options:
Option 1 (Recommended):
run_compliance_check(path, distribution_type) → APPROVED/REJECTED + artifacts
Option 2 (Manual):
scan_directory → generate_legal_notices → validate_license_list → generate_sbom
From v1.3.7 (2025-11-10)
License Approval/Rejection Workflow:
- Enhanced validate_policy tool with approve/deny/review decision support
- Added context parameter for static_linking and dynamic_linking scenarios
- Returns structured decision output with action, severity, requirements, and remediation
- Added summary object with boolean flags: approved, blocked, requires_review
- Distribution-specific policy rules (GPL blocked for mobile, AGPL blocked for SaaS)
- Updated OSPAC dependency to >=1.2.3
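As a rough illustration of how the summary flags relate to the decision, the sketch below maps an action to the three booleans. The mapping is an assumption inferred from the names (approve/deny/review and approved/blocked/requires_review); summarize_decision is a hypothetical helper, and the notes do not state how validate_policy actually computes its summary.

```python
def summarize_decision(action: str) -> dict:
    """Map a policy action to summary flags (assumed one-to-one mapping)."""
    if action not in {"approve", "deny", "review"}:
        raise ValueError(f"unknown action: {action!r}")
    return {
        "approved": action == "approve",
        "blocked": action == "deny",
        "requires_review": action == "review",
    }


# Example: a license the policy denies for the chosen distribution type.
gpl_mobile_summary = summarize_decision("deny")
```

Under this assumption exactly one flag is true per decision, so a caller can branch on the summary without inspecting the full decision object.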
From v1.3.6 (2025-11-10)
Pipx Installation Support:
- Comprehensive pipx installation documentation
- Instructions for pipx inject to include all SEMCL.ONE tools
- Isolated environment prevents dependency conflicts
- All tools are accessible as both libraries and CLI commands
- Updated MCP configuration examples for pip and pipx
- Documentation for included tools: osslili, binarysniffer, src2purl, purl2notices, ospac, vulnq, upmex
Migration Example
# Before v1.4.0
scan_result = scan_directory(path)
notice = generate_mobile_legal_summary(project_name, licenses)
# After v1.4.0
result = run_compliance_check(path, distribution_type="mobile")
# Automatically generates NOTICE.txt and sbom.json
# Returns APPROVED/REJECTED decision
# Alternative: manual workflow
scan_result = scan_directory(path, identify_packages=True)
purls = [pkg["purl"] for pkg in scan_result["packages"]]
generate_legal_notices(purls, output_file="NOTICE.txt")
See https://github.com/SemClone/mcp-semclone/blob/main/CHANGELOG.md for complete details.