Commit 99c6837

Merge pull request #29 from SemClone/feature/maven-parent-pom-resolution
feat: Add Maven parent POM license resolution (v1.6.0)
2 parents 9a6f1d0 + 8a4f2d3 commit 99c6837

6 files changed: +261 -5 lines changed

CHANGELOG.md

Lines changed: 68 additions & 0 deletions
@@ -7,6 +7,74 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.6.0] - 2025-01-13
+
+### Added
+
+#### Maven Parent POM License Resolution + Source Header Detection
+
+**Problem:**
+Maven packages often don't declare licenses directly in their package POM - the license can be in:
+1. **Source file headers** (e.g., `// Licensed under Apache-2.0`)
+2. **Parent POM** (declared in parent but not in package POM)
+
+When `download_and_scan_package` analyzed such packages, it would miss one or both of these sources.
+
+**Solution:**
+Enhanced Maven-specific license resolution to check ALL three sources and combine results:
+
+**How it works:**
+1. **Source file headers**: osslili scans all source files for license headers → populates `detected_licenses`
+2. **Package POM**: upmex extracts metadata from package POM → populates `declared_license` (if present)
+3. **Parent POM** (Maven-specific): If no `declared_license`, automatically triggers upmex with `--registry --api clearlydefined` to query ClearlyDefined which resolves parent POM licenses
+4. **Combines results**: Parent POM license added to `detected_licenses` if not already there
+5. Updates result with `license_source: "parent_pom_via_clearlydefined"`
+
+**Examples:**
+
+**Scenario 1: License only in parent POM**
+```python
+download_and_scan_package(purl="pkg:maven/org.example/library@1.0.0")
+
+# Before (v1.5.8):
+# declared_license: None
+# detected_licenses: []
+
+# After (v1.6.0):
+# declared_license: "Apache-2.0"  # From parent POM
+# detected_licenses: ["Apache-2.0"]
+# metadata.license_source: "parent_pom_via_clearlydefined"
+```
+
+**Scenario 2: Licenses in BOTH source headers AND parent POM**
+```python
+download_and_scan_package(purl="pkg:maven/org.example/another@2.0.0")
+
+# Result:
+# declared_license: "Apache-2.0"  # From parent POM
+# detected_licenses: ["MIT", "Apache-2.0"]  # MIT from source, Apache from parent
+# scan_summary: "Deep scan completed. found 2 licenses. (includes parent POM license). ..."
+```
+
+**Changes:**
+- mcp_semclone/server.py:
+  * Added detailed 3-source license detection comment (lines 2059-2068)
+  * Maven parent POM resolution with ClearlyDefined API integration
+  * Combines parent POM license with source header licenses
+  * Enhanced summary showing "(includes parent POM license)"
+- Tool docstring: Documented Maven-specific behavior with all three sources
+- tests/test_server.py:
+  * Added test_maven_parent_pom_resolution (parent POM only)
+  * Added test_maven_combined_source_and_parent_pom_licenses (both sources)
+
+**Impact:**
+- ✅ Maven packages now report licenses from ALL sources (source headers + parent POM)
+- ✅ Source header licenses (MIT, BSD) combined with parent POM licenses (Apache-2.0)
+- ✅ Automatic detection - no user configuration needed
+- ✅ Transparent tracking with `license_source` metadata field
+- ✅ Enhanced summary indicates when parent POM was used
+- ✅ Falls back gracefully if parent POM resolution fails
+
 ## [1.5.8] - 2025-01-13
 
 ### Fixed & Redesigned
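
The merge step described in the 1.6.0 changelog entry (steps 3 to 5 of "How it works") can be sketched in isolation. This is a minimal illustration, not the code in `server.py`: the helper name `merge_parent_pom_license` is hypothetical, and the shape of the `result` dict is assumed from the field names the changelog mentions.

```python
from typing import Any, Dict, Optional


def merge_parent_pom_license(
    result: Dict[str, Any], parent_license: Optional[str]
) -> Dict[str, Any]:
    """Fold a parent POM license into a scan result (hypothetical helper)."""
    # Only fill the gap: a license declared in the package POM takes precedence.
    if result.get("declared_license") or not parent_license:
        return result

    result["declared_license"] = parent_license
    metadata = result.setdefault("metadata", {})
    metadata["license"] = parent_license
    metadata["license_source"] = "parent_pom_via_clearlydefined"

    # Keep licenses found in source headers and append the parent POM one once.
    detected = result.setdefault("detected_licenses", [])
    if parent_license not in detected:
        detected.append(parent_license)
    return result


# Scenario 2 from the changelog: MIT in source headers, Apache-2.0 in the parent POM.
scan = {"declared_license": None, "detected_licenses": ["MIT"], "metadata": {}}
merged = merge_parent_pom_license(scan, "Apache-2.0")
print(merged["detected_licenses"])  # ['MIT', 'Apache-2.0']
```

Appending rather than replacing is what lets the source header license and the parent POM license coexist in `detected_licenses`.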

examples/strands-agent-ollama/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 # Python 3.10+ required
 
 # MCP server with SEMCL.ONE compliance tools
-mcp-semclone>=1.5.8
+mcp-semclone>=1.6.0
 
 # MCP SDK for connecting to MCP servers
 mcp>=1.0.0

mcp_semclone/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 """MCP SEMCL.ONE - Model Context Protocol server for OSS compliance."""
 
-__version__ = "1.5.8"
+__version__ = "1.6.0"
 __author__ = "Oscar Valenzuela B."
 __email__ = "oscar.valenzuela.b@gmail.com"

mcp_semclone/server.py

Lines changed: 51 additions & 2 deletions
@@ -1841,14 +1841,16 @@ async def download_and_scan_package(
     **Workflow (tries methods in order until sufficient data is collected):**
     1. **Primary**: Use purl2notices to download and analyze (fastest, most comprehensive)
     2. **Deep scan**: If incomplete, use purl2src to get download URL → download artifact → run osslili for deep license scanning + upmex for metadata
+       - **Maven-specific**: If license still missing for Maven packages, uses upmex with --registry --api clearlydefined to resolve parent POM licenses
     3. **Online fallback**: If still incomplete, use upmex --api clearlydefined/purldb for online metadata
 
     **What this tool does:**
     - Downloads the actual package source code from npm/PyPI/Maven/etc registries
     - Performs comprehensive license and copyright analysis
     - Extracts package metadata (name, version, homepage, description)
-    - Scans ALL source files for embedded licenses (not just package.json/setup.py)
+    - Scans ALL source files for embedded licenses (not just package.json/setup.py/pom.xml)
     - Returns copyright statements found in actual source code
+    - **Maven packages**: Automatically resolves parent POM licenses when not declared in package POM
 
     **When to use this tool:**
     - Package metadata is incomplete or missing (e.g., "UNKNOWN" license in PyPI)
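
The "tries methods in order until sufficient data is collected" workflow in this docstring follows a general fallback-chain pattern. The sketch below shows that pattern only; `scan_with_fallbacks`, `is_sufficient`, and the three stand-in callables are hypothetical names, and the real tool shells out to purl2notices, osslili, and upmex rather than calling Python functions like these.

```python
from typing import Any, Callable, Dict, List, Tuple

ScanResult = Dict[str, Any]


def is_sufficient(result: ScanResult) -> bool:
    """Treat a result as complete once it carries at least one license (assumed rule)."""
    return bool(result.get("declared_license") or result.get("detected_licenses"))


def scan_with_fallbacks(
    purl: str, methods: List[Tuple[str, Callable[[str], ScanResult]]]
) -> ScanResult:
    """Run each method in order and stop at the first sufficient result."""
    last: ScanResult = {"method_used": None, "detected_licenses": []}
    for name, method in methods:
        try:
            candidate = method(purl)
        except Exception:
            continue  # a failing tool should not abort the whole chain
        candidate["method_used"] = name
        last = candidate
        if is_sufficient(candidate):
            break
    return last


# Stand-in callables for the three stages named in the docstring.
def primary(purl: str) -> ScanResult:
    return {"detected_licenses": []}  # incomplete, so the chain continues


def deep_scan(purl: str) -> ScanResult:
    return {"detected_licenses": ["Apache-2.0"]}


def online_fallback(purl: str) -> ScanResult:
    return {"declared_license": "MIT"}


result = scan_with_fallbacks(
    "pkg:maven/org.example/library@1.0.0",
    [("primary", primary), ("deep_scan", deep_scan), ("online_fallback", online_fallback)],
)
print(result["method_used"])  # deep_scan
```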
@@ -2054,10 +2056,57 @@ async def download_and_scan_package(
                     result["declared_license"] = upmex_data["license"]
                 logger.info(f"upmex metadata extracted")
 
+            # MAVEN SPECIFIC: Check parent POM for declared license
+            # License can be in:
+            # 1. Source file headers (already checked by osslili → detected_licenses)
+            # 2. Package POM (already checked by upmex → declared_license)
+            # 3. Parent POM (need to check with --registry --api clearlydefined)
+            #
+            # We check parent POM if:
+            # - No declared_license found in package POM, OR
+            # - We have detected_licenses from source but no official declaration
+            if purl.startswith("pkg:maven/") and not result["declared_license"]:
+                logger.info(f"Maven package missing declared license (may have detected licenses from source), checking parent POM")
+                try:
+                    upmex_maven_result = _run_tool("upmex", [
+                        "extract",
+                        str(download_file),
+                        "--format", "json",
+                        "--registry",
+                        "--api", "clearlydefined"
+                    ])
+
+                    if upmex_maven_result.returncode == 0 and upmex_maven_result.stdout:
+                        maven_data = json.loads(upmex_maven_result.stdout)
+                        if maven_data.get("license"):
+                            result["declared_license"] = maven_data["license"]
+                            result["metadata"]["license"] = maven_data["license"]
+                            result["metadata"]["license_source"] = "parent_pom_via_clearlydefined"
+
+                            # Add to detected_licenses if not already there
+                            if maven_data["license"] not in result["detected_licenses"]:
+                                result["detected_licenses"].append(maven_data["license"])
+
+                            logger.info(f"Maven parent POM license found: {maven_data['license']}")
+                            if result["detected_licenses"]:
+                                logger.info(f"Combined with source header licenses: {result['detected_licenses']}")
+                except Exception as e:
+                    logger.warning(f"Maven parent POM resolution failed: {e}")
+
             # If we got data from deep scan, mark as successful
             if result["detected_licenses"] or result["metadata"]:
                 result["method_used"] = "deep_scan"
-                result["scan_summary"] = f"Deep scan completed. Downloaded and analyzed with osslili + upmex. Found {len(result['detected_licenses'])} licenses and {len(result['copyright_statements'])} copyrights."
+
+                # Build summary showing license sources
+                summary_parts = ["Deep scan completed"]
+                if result["detected_licenses"]:
+                    summary_parts.append(f"found {len(result['detected_licenses'])} licenses")
+                if result["metadata"].get("license_source") == "parent_pom_via_clearlydefined":
+                    summary_parts.append("(includes parent POM license)")
+                if result["copyright_statements"]:
+                    summary_parts.append(f"{len(result['copyright_statements'])} copyrights")
+
+                result["scan_summary"] = ". ".join(summary_parts) + "."
                 return result
 
         elif fallback_cmd:
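
To see the exact string the summary-building lines in this hunk produce, the same logic can be lifted into a standalone function. `build_scan_summary` is a hypothetical wrapper that does not exist in the commit; its body mirrors the added lines, and the `example` dict is made-up input.

```python
from typing import Any, Dict


def build_scan_summary(result: Dict[str, Any]) -> str:
    """Assemble the deep-scan summary the way the added lines do (hypothetical wrapper)."""
    summary_parts = ["Deep scan completed"]
    if result["detected_licenses"]:
        summary_parts.append(f"found {len(result['detected_licenses'])} licenses")
    if result["metadata"].get("license_source") == "parent_pom_via_clearlydefined":
        summary_parts.append("(includes parent POM license)")
    if result["copyright_statements"]:
        summary_parts.append(f"{len(result['copyright_statements'])} copyrights")
    # Parts are joined with ". ", so each one ends up as its own short sentence.
    return ". ".join(summary_parts) + "."


example = {
    "detected_licenses": ["MIT", "Apache-2.0"],
    "metadata": {"license_source": "parent_pom_via_clearlydefined"},
    "copyright_statements": ["Copyright 2024"],
}
print(build_scan_summary(example))
# Deep scan completed. found 2 licenses. (includes parent POM license). 1 copyrights.
```

Because the parenthetical is joined like any other part, it appears as a separate sentence, which is the form the changelog example and the `"parent pom" in result["scan_summary"].lower()` assertion rely on.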

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "mcp-semclone"
-version = "1.5.8"
+version = "1.6.0"
 description = "Model Context Protocol server for SEMCL.ONE OSS compliance toolchain"
 readme = "README.md"
 requires-python = ">=3.10"

tests/test_server.py

Lines changed: 139 additions & 0 deletions
@@ -727,3 +727,142 @@ async def test_keep_download(self):
 
             # Verify cleanup was NOT called (keep_download=True)
             mock_rmtree.assert_not_called()
+
+    @pytest.mark.asyncio
+    async def test_maven_parent_pom_resolution(self):
+        """Test Maven parent POM license resolution when package POM has no license."""
+        # Mock purl2src output for Maven package
+        purl2src_data = [{
+            "purl": "pkg:maven/org.example/library@1.0.0",
+            "download_url": "https://repo1.maven.org/maven2/org/example/library/1.0.0/library-1.0.0.jar",
+            "validated": True
+        }]
+
+        # Mock osslili output (no licenses found in JAR)
+        osslili_data = {
+            "components": []
+        }
+
+        # Mock upmex output (no license in package POM)
+        upmex_no_license = {
+            "name": "library",
+            "version": "1.0.0"
+            # No license field
+        }
+
+        # Mock upmex with --registry --api clearlydefined (finds parent POM license)
+        upmex_with_parent = {
+            "name": "library",
+            "version": "1.0.0",
+            "license": "Apache-2.0"  # Found in parent POM
+        }
+
+        with patch("mcp_semclone.server._run_tool") as mock_run, \
+             patch("urllib.request.urlretrieve") as mock_download, \
+             patch("tempfile.mkdtemp", return_value="/tmp/test"), \
+             patch("pathlib.Path.exists", return_value=True), \
+             patch("shutil.rmtree"):
+
+            call_count = {"upmex": 0}
+
+            def run_tool_side_effect(tool_name, args, *a, **kw):
+                if tool_name == "purl2notices":
+                    # purl2notices fails
+                    return MagicMock(returncode=1, stdout="", stderr="failed")
+                elif tool_name == "purl2src":
+                    return MagicMock(returncode=0, stdout=json.dumps(purl2src_data))
+                elif tool_name == "osslili":
+                    return MagicMock(returncode=0, stdout=json.dumps(osslili_data))
+                elif tool_name == "upmex":
+                    call_count["upmex"] += 1
+                    # First call: no license
+                    if call_count["upmex"] == 1:
+                        return MagicMock(returncode=0, stdout=json.dumps(upmex_no_license))
+                    # Second call with --registry --api: has license from parent POM
+                    elif "--registry" in args and "--api" in args:
+                        return MagicMock(returncode=0, stdout=json.dumps(upmex_with_parent))
+                return MagicMock(returncode=1, stdout="")
+
+            mock_run.side_effect = run_tool_side_effect
+
+            result = await server_module.download_and_scan_package(
+                purl="pkg:maven/org.example/library@1.0.0"
+            )
+
+            # Verify Maven parent POM resolution was triggered
+            assert result["declared_license"] == "Apache-2.0"
+            assert result["metadata"].get("license") == "Apache-2.0"
+            assert result["metadata"].get("license_source") == "parent_pom_via_clearlydefined"
+
+            # Verify upmex was called twice (once normal, once with --registry --api)
+            assert call_count["upmex"] == 2
+
+    @pytest.mark.asyncio
+    async def test_maven_combined_source_and_parent_pom_licenses(self):
+        """Test Maven package with licenses in both source headers AND parent POM."""
+        # Mock purl2src output
+        purl2src_data = [{
+            "purl": "pkg:maven/org.example/library@1.0.0",
+            "download_url": "https://repo1.maven.org/maven2/org/example/library/1.0.0/library-1.0.0.jar",
+            "validated": True
+        }]
+
+        # Mock osslili output (finds MIT in source file headers)
+        osslili_data = {
+            "components": [{
+                "licenses": [{"license": {"id": "MIT"}}],
+                "properties": [{"name": "copyright", "value": "Copyright 2024"}]
+            }]
+        }
+
+        # Mock upmex output (no license in package POM)
+        upmex_no_license = {
+            "name": "library",
+            "version": "1.0.0"
+        }
+
+        # Mock upmex with --registry --api (finds Apache-2.0 in parent POM)
+        upmex_with_parent = {
+            "name": "library",
+            "version": "1.0.0",
+            "license": "Apache-2.0"
+        }
+
+        with patch("mcp_semclone.server._run_tool") as mock_run, \
+             patch("urllib.request.urlretrieve"), \
+             patch("tempfile.mkdtemp", return_value="/tmp/test"), \
+             patch("pathlib.Path.exists", return_value=True), \
+             patch("shutil.rmtree"):
+
+            call_count = {"upmex": 0}
+
+            def run_tool_side_effect(tool_name, args, *a, **kw):
+                if tool_name == "purl2notices":
+                    return MagicMock(returncode=1, stdout="", stderr="failed")
+                elif tool_name == "purl2src":
+                    return MagicMock(returncode=0, stdout=json.dumps(purl2src_data))
+                elif tool_name == "osslili":
+                    return MagicMock(returncode=0, stdout=json.dumps(osslili_data))
+                elif tool_name == "upmex":
+                    call_count["upmex"] += 1
+                    if call_count["upmex"] == 1:
+                        return MagicMock(returncode=0, stdout=json.dumps(upmex_no_license))
+                    elif "--registry" in args and "--api" in args:
+                        return MagicMock(returncode=0, stdout=json.dumps(upmex_with_parent))
+                return MagicMock(returncode=1, stdout="")
+
+            mock_run.side_effect = run_tool_side_effect
+
+            result = await server_module.download_and_scan_package(
+                purl="pkg:maven/org.example/library@1.0.0"
+            )
+
+            # Verify BOTH licenses are present
+            assert result["declared_license"] == "Apache-2.0"  # From parent POM
+            assert "MIT" in result["detected_licenses"]  # From source headers
+            assert "Apache-2.0" in result["detected_licenses"]  # Added from parent POM
+            assert len(result["detected_licenses"]) == 2  # Both licenses
+            assert result["metadata"].get("license_source") == "parent_pom_via_clearlydefined"
+
+            # Verify summary mentions parent POM
+            assert "parent pom" in result["scan_summary"].lower()
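
Both new tests rely on a `side_effect` function that picks a canned response from the tool name and its flags. The pattern can be tried on its own, without the server module. This is a sketch under assumptions: `make_run_tool_mock` and the `upmex_registry` response key are hypothetical names, and the mock stands in for `_run_tool` without patching anything.

```python
import json
from unittest.mock import MagicMock


def make_run_tool_mock(responses):
    """Build a mock whose return value depends on the tool name and flags."""
    calls = {"upmex": 0}

    def side_effect(tool_name, args, *a, **kw):
        if tool_name == "upmex":
            calls["upmex"] += 1
            # The registry-backed call is distinguished by its extra flag.
            key = "upmex_registry" if "--registry" in args else "upmex"
            return MagicMock(returncode=0, stdout=json.dumps(responses[key]))
        if tool_name in responses:
            return MagicMock(returncode=0, stdout=json.dumps(responses[tool_name]))
        # Any tool without a canned response simulates a failed run.
        return MagicMock(returncode=1, stdout="", stderr="failed")

    return MagicMock(side_effect=side_effect), calls


run_tool, calls = make_run_tool_mock({
    "osslili": {"components": []},
    "upmex": {"name": "library", "version": "1.0.0"},
    "upmex_registry": {"name": "library", "version": "1.0.0", "license": "Apache-2.0"},
})

first = run_tool("upmex", ["extract", "library-1.0.0.jar", "--format", "json"])
second = run_tool(
    "upmex",
    ["extract", "library-1.0.0.jar", "--format", "json", "--registry", "--api", "clearlydefined"],
)
print(json.loads(second.stdout)["license"], calls["upmex"])  # Apache-2.0 2
```

Dispatching on the flags rather than only on the call count keeps the mock correct even if the order of the two upmex invocations changes.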
