feat: rewrite license-checker skill to use Fedora License Data#104
feat: rewrite license-checker skill to use Fedora License Data#104
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis pull request updates docs/data.json for the public entry "python-packaging-license-checker": the description now references Fedora License Data as the authoritative policy source and the output as a structured six-field verdict; allowed_tools changed from "WebFetch" to "Bash Skill WebFetch". It also substantially reworks helpers/skills/python-packaging-license-checker/SKILL.md: replaces SPDX-centric logic with a Fedora-data-driven, stepwise workflow (locate/clone source, read license, Fedora lookup with fallback, verdict refinement, compliance checks), adds Inputs, normalization/fallback behavior, a fallback table, failure modes, remediation guidance, Red Hat Vendor Agreements, and a Quick Reference. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Actionable Issues
🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can suggest fixes for GitHub Check annotations.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
helpers/skills/python-packaging-license-checker/SKILL.md (3)
43-45: GitLab artifacts URL is fragile.The URL
.../-/jobs/artifacts/main/raw/fedora-licenses.json?job=jsondepends on:
- A CI job named exactly
jsonexisting in the pipeline- That job producing the artifact at the expected path
If Fedora legal renames the job or restructures their CI, this breaks silently. Consider:
- Linking to a tagged release instead of
main- Adding a fallback URL to a mirrored/cached copy
- Documenting how to verify the URL is still valid
🔧 Add language identifier per markdownlint MD040
-``` +```text WebFetch: https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@helpers/skills/python-packaging-license-checker/SKILL.mdaround lines 43 -
45, The GitLab artifacts URL string
https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json
is fragile; update SKILL.md to point to a stable release/tag URL (replacemain
with a specific tag or release link), add a secondary fallback/mirror URL for
cached copies, and include a short note on how to verify the URL (e.g., check
the pipeline job name and artifact path) so consumers can detect breakages; also
wrap the WebFetch block with a language identifier (```text) to satisfy
markdownlint MD040.</details> --- `135-142`: **Add language identifier to code block.** Per markdownlint MD040, fenced code blocks should specify a language for syntax highlighting and tooling compatibility. <details> <summary>🔧 Proposed fix</summary> ```diff -``` +```text _License:_ [SPDX expression, or "non-SPDX: [raw string]" if unmappable] _Source verified:_ YES | NO | MISMATCH (PyPI: [X], repo: [Y]) _Verdict:_ ALLOWED | ALLOWED-WITH-CAVEAT | NEEDS-HUMAN-REVIEW | ESCALATE-TO-LEGAL | BLOCKED _Build compliance:_ OK | BUILD-COMPLIANCE-FLAG _Export compliance:_ OK | EXPORT-COMPLIANCE-FLAG _Notes:_ [Actionable guidance. Omit if verdict is ALLOWED and all flags are OK.] ``` ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@helpers/skills/python-packaging-license-checker/SKILL.mdaround lines 135 -
142, The fenced code block that starts with "License: [SPDX expression, or
"non-SPDX: [raw string]" if unmappable]" should include a language identifier
for markdownlint MD040; change the opening triple backticks fromtotext
so the block becomes a labeled text block (i.e., update the code fence
surrounding the lines starting with "License:", "Source verified:",
"Verdict:", etc., to use ```text).</details> --- `166-173`: **Consider adding WebFetch timeout/failure handling.** Error handling covers Fedora data unreachable and source repo inaccessible, but doesn't address WebFetch tool failures (network timeout, rate limiting, etc.). Currently it's implied by "Fedora data unreachable" but could be made explicit for clarity. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 166 - 173, Update the "## Error Handling" section to explicitly call out WebFetch failures by name: add a bullet for "WebFetch timeout/failure" that describes treating network timeouts, HTTP errors (429/5xx), and rate-limit responses as transient fetch failures, use the same fallback table for Fedora data, append a clear note that the license check used fallback data and that source verification is UNKNOWN if repo fetch failed via WebFetch, and instruct marking verdicts as NEEDS-HUMAN-REVIEW when WebFetch prevents reliable license/source determination; reference the existing bullets "Fedora data unreachable", "Source repo private/inaccessible", and the WebFetch tool so reviewers can find and apply the same fallback/annotation behavior. ``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In@helpers/skills/python-packaging-license-checker/SKILL.md:
- Around line 23-24: Update the SKILL.md entry for the upstream_org field to
explicitly define "primary country" (referencing the upstream_org symbol) used
in OFAC/sanctions checks: replace the ambiguous phrase with a precise rule such
as "primary country: country of incorporation or, if incorporation is not
applicable, the country of the principal executive office (headquarters)," and
add a short parenthetical guidance about which to prefer when both exist so Step
5 sanctions logic can consistently use that value.- Around line 34-36: The fuzzy mapping rule in "Normalise to SPDX ID" currently
hardcodes ambiguous mappings (e.g., "BSD" → BSD-3-Clause, "GPLv3" →
GPL-3.0-or-later) which can produce incorrect SPDX IDs; update the skill so that
ambiguous short forms are either flagged for manual/source verification in Step
3 instead of being auto-mapped, or explicitly document and justify the chosen
policy (e.g., "most conservative interpretation") in SKILL.md; locate the
normalization logic or mapping table referenced by the "Normalise to SPDX ID"
section and replace direct ambiguous mappings with a warning/flag return or add
a clear policy note beside each ambiguous example.
Nitpick comments:
In@helpers/skills/python-packaging-license-checker/SKILL.md:
- Around line 43-45: The GitLab artifacts URL string
https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json
is fragile; update SKILL.md to point to a stable release/tag URL (replacemain
with a specific tag or release link), add a secondary fallback/mirror URL for
cached copies, and include a short note on how to verify the URL (e.g., check
the pipeline job name and artifact path) so consumers can detect breakages; also
wrap the WebFetch block with a language identifier (```text) to satisfy
markdownlint MD040.- Around line 135-142: The fenced code block that starts with "License: [SPDX
expression, or "non-SPDX: [raw string]" if unmappable]" should include a
language identifier for markdownlint MD040; change the opening triple backticks
fromtotext so the block becomes a labeled text block (i.e., update the
code fence surrounding the lines starting with "License:", "Source
verified:", "Verdict:", etc., to use ```text).- Around line 166-173: Update the "## Error Handling" section to explicitly call
out WebFetch failures by name: add a bullet for "WebFetch timeout/failure" that
describes treating network timeouts, HTTP errors (429/5xx), and rate-limit
responses as transient fetch failures, use the same fallback table for Fedora
data, append a clear note that the license check used fallback data and that
source verification is UNKNOWN if repo fetch failed via WebFetch, and instruct
marking verdicts as NEEDS-HUMAN-REVIEW when WebFetch prevents reliable
license/source determination; reference the existing bullets "Fedora data
unreachable", "Source repo private/inaccessible", and the WebFetch tool so
reviewers can find and apply the same fallback/annotation behavior.</details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: Repository YAML (base), Central YAML (inherited), Organization UI (inherited) **Review profile**: CHILL **Plan**: Pro **Run ID**: `2a126445-739a-4ac4-82c0-6cb03491637b` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between fdcdc36681a7ffd11f6134a083feac2129da51f3 and d1b9a9b9672e94efb121c6930314ed6b43db4365. </details> <details> <summary>📒 Files selected for processing (2)</summary> * `docs/data.json` * `helpers/skills/python-packaging-license-checker/SKILL.md` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
helpers/skills/python-packaging-license-checker/SKILL.md (1)
33-36:⚠️ Potential issue | 🟠 MajorAmbiguous fuzzy mappings can produce incorrect SPDX IDs.
Line 34-Line 35 maps ambiguous short forms (
BSD,GPLv3) to specific SPDX variants. This can drive incorrect Fedora policy outcomes. Route ambiguous aliases to source verification instead of hard-mapping.Proposed fix
- "Apache 2" -> Apache-2.0, "GPL v2" -> GPL-2.0-only, "BSD" -> BSD-3-Clause, - "GPLv3" -> GPL-3.0-or-later, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only + "Apache 2" -> Apache-2.0, "GPL v2" -> GPL-2.0-only, + "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only + Ambiguous aliases requiring Step 3 verification: "BSD", "GPLv3".As per coding guidelines, prioritize bug-prone patterns and error-handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 33 - 36, The current normalization step that fuzzy-maps ambiguous short forms (e.g., "BSD", "GPLv3", "Apache 2", "GPL v2", "LGPLv2.1", "MIT License") to specific SPDX IDs should be changed to avoid hard-mapping ambiguous aliases; instead, treat unmappable or ambiguous short forms as "unmappable" and route them to source verification flow, setting "Source verified: NO" and skipping automatic SPDX assignment; update the mapping logic that performs "exact then fuzzy matching" to recognize ambiguous tokens (like "BSD" and "GPLv3") as ambiguous and fall back to the verification step rather than assigning a concrete SPDX identifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 42-44: The markdown in SKILL.md uses triple-fenced code blocks
without language tags which triggers MD040; update the fences around the
"WebFetch: https://gitlab.com/..." block and the license/template block to
include an explicit language (e.g., use ```text) and do the same for the other
occurrence referenced (lines ~134-141) so both fenced blocks are annotated with
a language tag; locate the fences by searching for the "WebFetch:" line and the
template starting with "_License:_", then replace the opening ``` with ```text
for each.
- Around line 27-31: The "Compound expressions" guidance for `AND` is incorrect:
instead of picking a single governing license, require that every conjunctive
component's license be acceptable (all must pass) and document each component's
license and its evaluation in Notes; update the "Compound expressions" section
(the `AND` handling text) to state that `AND` semantics enforce all components,
describe the evaluation steps (evaluate each conjunct against the importable
library API rule), and record per-component results rather than selecting one
license.
- Around line 71-73: The code directly WebFetches the attacker-controlled
source_repo_url to read LICENSE files, enabling SSRF; change this by
rejecting/normalizing unsafe URLs before fetching: only allow HTTPS and enforce
a hosts allowlist (e.g., github.com, gitlab.com, bitbucket.org) or perform a
safe git clone via a vetted library instead of arbitrary HTTP fetch;
additionally perform DNS resolution of source_repo_url and block any resolved
IPs in private/loopback/metadata ranges (RFC1918, 127.0.0.0/8, 169.254.169.254,
::1, fc00::/7), follow a short redirect limit, and set strict timeouts and
response size limits when reading the first 30 lines of LICENSE/LICENCE/ COPYING
files for SPDX detection (refer to WebFetch and source_repo_url and the LICENSE/
LICENSE.txt/ LICENSE.md/ COPYING file list).
- Around line 46-51: The code currently treats the Fedora `status` field as a
single string and implicitly accepts unknown values; update the logic that
processes entries matched by `spdx_abbrev` so it iterates the `status` array (do
not treat as scalar), validates every element against the six documented values
(`allowed`, `allowed-content`, `allowed-documentation`, `allowed-fonts`,
`allowed-firmware`, `not-allowed`), and if any element is unrecognized mark the
entry as an error requiring human review (do not silently accept or treat as
allowed); adjust the decision path that previously checked `status` to aggregate
results from the array (e.g., if any `not-allowed` exists the verdict is
BLOCKED; otherwise refine verdict per allowed-subtype), and add explicit error
handling/reporting for unknown statuses when reading `fedora-licenses.json` or
when matching `spdx_abbrev`.
- Around line 122-123: Update the export-compliance section to explicitly list
the sanctioned-source endpoints used (OFAC SDN API URL and EU Financial
Sanctions Database URL as provided) and log the exact source URL and retrieval
timestamp for each lookup; in the sanction-check logic (reference the _Export
compliance: EXPORT-COMPLIANCE-FLAG and _Verdict: NEEDS-HUMAN-REVIEW markers)
implement retry-with-timeout for HTTP fetches and, on persistent failure, set
EXPORT-COMPLIANCE-FLAG and set _Verdict to NEEDS-HUMAN-REVIEW as the defined
fallback; also add brief prose explaining why each endpoint is chosen and ensure
the skill records source URL and timestamp in its lookup output for
auditability.
---
Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 33-36: The current normalization step that fuzzy-maps ambiguous
short forms (e.g., "BSD", "GPLv3", "Apache 2", "GPL v2", "LGPLv2.1", "MIT
License") to specific SPDX IDs should be changed to avoid hard-mapping ambiguous
aliases; instead, treat unmappable or ambiguous short forms as "unmappable" and
route them to source verification flow, setting "Source verified: NO" and
skipping automatic SPDX assignment; update the mapping logic that performs
"exact then fuzzy matching" to recognize ambiguous tokens (like "BSD" and
"GPLv3") as ambiguous and fall back to the verification step rather than
assigning a concrete SPDX identifier.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 0bf1a0b3-adf5-4c9d-a4d0-6d64214779b2
📒 Files selected for processing (2)
docs/data.jsonhelpers/skills/python-packaging-license-checker/SKILL.md
c7e8791 to
aa57fd6
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/data.json (1)
145-150:⚠️ Potential issue | 🟠 MajorRegenerate plugin registry metadata to fix CI drift.
Line 145 was edited, but the registry is generated and now mismatches plugin metadata (lint already fails). Run
make updateand commit the regenerateddocs/data.jsonso this entry stays source-of-truth aligned (including tool metadata).
As per coding guidelines, prioritize bug-prone patterns and error handling gaps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/data.json` around lines 145 - 150, The plugin registry metadata is out of sync; regenerate and commit the canonical registry so docs/data.json matches the source-of-truth: run the repository's generation step (e.g., `make update`) to rebuild docs/data.json, verify the entry with id "python-packaging-license-checker" and its "allowed_tools" and other fields reflect current plugin metadata (including tool metadata), then commit the regenerated docs/data.json so CI/lint no longer fails.
♻️ Duplicate comments (7)
helpers/skills/python-packaging-license-checker/SKILL.md (7)
55-57:⚠️ Potential issue | 🟠 MajorAmbiguous fuzzy mappings can produce incorrect SPDX outcomes.
Lines 56-57 hard-map ambiguous labels (
BSD,GPLv3) to specific SPDX variants, which can materially alter policy decisions. Require explicit disambiguation or mark asNEEDS-HUMAN-REVIEWinstead of auto-resolving.As per coding guidelines, prioritize bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 55 - 57, The "Normalise to SPDX ID" fuzzy-mapping step currently auto-resolves ambiguous labels like "BSD" and "GPLv3" to specific SPDX variants; change this behavior so the normaliser detects ambiguous inputs (e.g., "BSD", "GPLv3", "GPL v2", any non-exact matches listed in the mapping block) and does not auto-select a variant; instead flag those cases as "NEEDS-HUMAN-REVIEW" (or emit a disambiguation request) and only auto-normalise exact matches (e.g., "Apache 2" -> Apache-2.0, "MIT License" -> MIT) to avoid incorrect SPDX outcomes.
21-21:⚠️ Potential issue | 🟡 Minor“Primary country” is ambiguous for sanctions screening input.
Line 21 / Line 144 rely on “primary country” without a deterministic definition. Define it explicitly (e.g., country of incorporation, else principal executive office) to avoid inconsistent flagging.
As per coding guidelines, prioritize bug-prone patterns and error handling gaps.
Also applies to: 144-145
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` at line 21, The README uses the ambiguous field "upstream_org: primary country"; update the SKILL.md descriptions for the upstream_org field (both occurrences) to explicitly define "primary country" as a deterministic value — e.g., "primary country: country of incorporation; if unavailable, use the country of the principal executive office or registered office" — and add a short note to prefer legal domicile over operational presence to support consistent sanctions screening and downstream flagging.
27-29:⚠️ Potential issue | 🟡 MinorFenced code blocks are missing language tags (markdownlint MD040).
Static analysis flags these fences; add explicit languages (e.g.,
text,bash) to satisfy lint and keep docs tooling-clean.As per coding guidelines, prioritize bug-prone patterns and error handling gaps.
Also applies to: 33-35, 77-79, 156-163
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 27 - 29, Update the fenced code blocks that currently show content like `Skill: python-packaging-source-finder` (and the other similar blocks at the noted locations) to include explicit language tags for markdownlint MD040 compliance; e.g., replace ``` with ```text or ```bash as appropriate for each block so the fences at the occurrences of the `Skill: python-packaging-source-finder` snippet and the other code blocks (around the other noted occurrences) have explicit language identifiers.
59-63:⚠️ Potential issue | 🟠 Major
ANDhandling under-enforces conjunctive obligations (CWE-840).Lines 60-61 still allow selecting a single “governing” component for
AND. Conjunctive expressions require all components to pass; if any component is not-allowed, verdict must beBLOCKED.As per coding guidelines, prioritize architectural issues and bug-prone patterns.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 59 - 63, The "Compound expressions" guidance currently allows choosing a single “governing” component for AND; change it so AND expressions require all conjuncts to be allowed—if any component is not-allowed the overall verdict must be BLOCKED (do not permit selecting one component as the governing license). Update the AND handling text in the "Compound expressions" section to state that each conjunct must be evaluated and all obligations enforced, document the split in Notes, and remove the wording about applying the license governing the importable library API for AND expressions so it cannot under-enforce conjunctive obligations.
23-35:⚠️ Potential issue | 🔴 CriticalCritical SSRF risk in repository acquisition flow (CWE-918).
Lines 25-35 instruct cloning a URL derived from package metadata without a trust boundary. Severity: Critical. Exploit scenario: a malicious package can point
source_repo_urlto internal/metadata endpoints and force network access during analysis.Remediation patch (policy text)
## Step 1: Locate and clone the source repository +Before cloning, validate `source_repo_url`: +- Scheme must be `https` +- Host must be allowlisted (e.g., `github.com`, `gitlab.com`, `bitbucket.org`) +- Reject localhost, IP literals, and private/link-local/metadata ranges +- Enforce redirect limit and request timeout + Use the python-packaging-source-finder skill to find the upstream repo URL:As per coding guidelines, prioritize security vulnerabilities first.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 23 - 35, The pipeline currently takes a source_repo_url from python-packaging-source-finder and passes it to git-shallow-clone without a trust boundary, introducing a critical SSRF risk; fix by validating and sanitizing source_repo_url before any network call in the flow: enforce allowed schemes (https, git+https), resolve and check the hostname against a configurable whitelist or compare it to the package registry host, block private/IP ranges (RFC1918, localhost, link-local), perform DNS resolution and reject mismatches, and require explicit operator approval or an internal proxy/sandbox for unknown hosts; apply this validation in the code that consumes python-packaging-source-finder output (reference symbol source_repo_url) and only call git-shallow-clone after these checks.
143-150:⚠️ Potential issue | 🟠 MajorExport-compliance decision path is non-deterministic without explicit data sources/fallback.
Lines 144-150 define sanctions checks but not concrete machine-readable endpoints, fetch behavior, or failure fallback. This creates inconsistent outcomes across runs and stale-data risk.
Define source endpoints, record retrieval timestamp/source URL in notes, and require: on persistent source failure =>
EXPORT-COMPLIANCE-FLAG+NEEDS-HUMAN-REVIEW.What are the official machine-readable endpoints and update behavior for OFAC sanctions data and the EU consolidated sanctions list, and what fallback is recommended when those sources are unavailable?As per coding guidelines, prioritize architectural issues and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 143 - 150, Update the Export compliance section to specify the exact machine-readable endpoints to check (e.g., OFAC SDN list JSON/XML URL and the EU Consolidated List CSV/JSON endpoint), state the fetch and caching behavior (fetch on run, cache with timestamp and TTL, record retrieval timestamp and source URL into Notes), and implement a clear fallback: if sources are unreachable after retries or cached data is stale, set "Export compliance: EXPORT-COMPLIANCE-FLAG" and append "NEEDS-HUMAN-REVIEW" to Notes; ensure Notes include the source URL(s) and retrieval timestamp to make decisions reproducible and auditable.
77-86:⚠️ Potential issue | 🟠 MajorFedora
statushandling is structurally incorrect and unsafe-by-default.Lines 82-86 treat
statusas scalarallowed|not-allowed; Fedora data uses a broader status model and this logic can silently misclassify entries. Treat unknown/unexpected statuses as no permission implied and route to human review.In Fedora fedora-license-data (fedora-licenses.json), what is the exact type and documented value set for the `license.status` field, and what behavior is recommended for unknown status values?As per coding guidelines, prioritize bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 77 - 86, The current logic that treats Fedora `license.status` as a simple scalar "allowed" | "not-allowed" is unsafe—update the decision logic that checks the `status` (the field named `status` in entries matched by `spdx_abbrev` from the fedora-licenses.json URL) so that only the documented explicit "allowed" value (case-insensitive) yields progression to Step 4 (verdict refinement); any other documented values or unknown/unexpected statuses must be treated as no permission implied and return Verdict: NEEDS-HUMAN-REVIEW (do not auto-BLOCK unless the status explicitly equals a documented blocking value and consult docs), and log/handle unknown statuses defensively so they do not silently misclassify licenses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 41-46: Replace the nondeterministic find + head pipeline that
searches for license files with a deterministic check that iterates the declared
priority list (LICENSE, LICENSE.txt, LICENSE.md, COPYING, COPYING.txt, LICENCE,
LICENCE.txt) and returns the first existing file; specifically, remove the use
of the wildcard find command and head in the SKILL.md example and instead
implement a loop that tests each filename under the given clone_path (using the
existing clone_path variable) with an existence check and echoes/returns the
first match, then breaks out to ensure the declared priority order is honored.
- Around line 69-70: The line that runs the unsafe removal command "rm -rf
<clone_parent_dir>" should be replaced with a guarded delete: ensure the
<clone_parent_dir> variable is non-empty, restrict deletions to an allowed path
pattern (e.g., only under /tmp or other explicit safe parent), and call rm with
the "--" separator; if the path is empty or does not match the safe pattern,
exit with an error and refuse deletion. Update the SKILL.md snippet to describe
these checks and the guarded delete flow around the "rm -rf <clone_parent_dir>"
operation.
---
Outside diff comments:
In `@docs/data.json`:
- Around line 145-150: The plugin registry metadata is out of sync; regenerate
and commit the canonical registry so docs/data.json matches the source-of-truth:
run the repository's generation step (e.g., `make update`) to rebuild
docs/data.json, verify the entry with id "python-packaging-license-checker" and
its "allowed_tools" and other fields reflect current plugin metadata (including
tool metadata), then commit the regenerated docs/data.json so CI/lint no longer
fails.
---
Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 55-57: The "Normalise to SPDX ID" fuzzy-mapping step currently
auto-resolves ambiguous labels like "BSD" and "GPLv3" to specific SPDX variants;
change this behavior so the normaliser detects ambiguous inputs (e.g., "BSD",
"GPLv3", "GPL v2", any non-exact matches listed in the mapping block) and does
not auto-select a variant; instead flag those cases as "NEEDS-HUMAN-REVIEW" (or
emit a disambiguation request) and only auto-normalise exact matches (e.g.,
"Apache 2" -> Apache-2.0, "MIT License" -> MIT) to avoid incorrect SPDX
outcomes.
- Line 21: The README uses the ambiguous field "upstream_org: primary country";
update the SKILL.md descriptions for the upstream_org field (both occurrences)
to explicitly define "primary country" as a deterministic value — e.g., "primary
country: country of incorporation; if unavailable, use the country of the
principal executive office or registered office" — and add a short note to
prefer legal domicile over operational presence to support consistent sanctions
screening and downstream flagging.
- Around line 27-29: Update the fenced code blocks that currently show content
like `Skill: python-packaging-source-finder` (and the other similar blocks at
the noted locations) to include explicit language tags for markdownlint MD040
compliance; e.g., replace ``` with ```text or ```bash as appropriate for each
block so the fences at the occurrences of the `Skill:
python-packaging-source-finder` snippet and the other code blocks (around the
other noted occurrences) have explicit language identifiers.
- Around line 59-63: The "Compound expressions" guidance currently allows
choosing a single “governing” component for AND; change it so AND expressions
require all conjuncts to be allowed—if any component is not-allowed the overall
verdict must be BLOCKED (do not permit selecting one component as the governing
license). Update the AND handling text in the "Compound expressions" section to
state that each conjunct must be evaluated and all obligations enforced,
document the split in Notes, and remove the wording about applying the license
governing the importable library API for AND expressions so it cannot
under-enforce conjunctive obligations.
- Around line 23-35: The pipeline currently takes a source_repo_url from
python-packaging-source-finder and passes it to git-shallow-clone without a
trust boundary, introducing a critical SSRF risk; fix by validating and
sanitizing source_repo_url before any network call in the flow: enforce allowed
schemes (https, git+https), resolve and check the hostname against a
configurable whitelist or compare it to the package registry host, block
private/IP ranges (RFC1918, localhost, link-local), perform DNS resolution and
reject mismatches, and require explicit operator approval or an internal
proxy/sandbox for unknown hosts; apply this validation in the code that consumes
python-packaging-source-finder output (reference symbol source_repo_url) and
only call git-shallow-clone after these checks.
- Around line 143-150: Update the Export compliance section to specify the exact
machine-readable endpoints to check (e.g., OFAC SDN list JSON/XML URL and the EU
Consolidated List CSV/JSON endpoint), state the fetch and caching behavior
(fetch on run, cache with timestamp and TTL, record retrieval timestamp and
source URL into Notes), and implement a clear fallback: if sources are
unreachable after retries or cached data is stale, set "Export compliance:
EXPORT-COMPLIANCE-FLAG" and append "NEEDS-HUMAN-REVIEW" to Notes; ensure Notes
include the source URL(s) and retrieval timestamp to make decisions reproducible
and auditable.
- Around line 77-86: The current logic that treats Fedora `license.status` as a
simple scalar "allowed" | "not-allowed" is unsafe—update the decision logic that
checks the `status` (the field named `status` in entries matched by
`spdx_abbrev` from the fedora-licenses.json URL) so that only the documented
explicit "allowed" value (case-insensitive) yields progression to Step 4
(verdict refinement); any other documented values or unknown/unexpected statuses
must be treated as no permission implied and return Verdict: NEEDS-HUMAN-REVIEW
(do not auto-BLOCK unless the status explicitly equals a documented blocking
value and consult docs), and log/handle unknown statuses defensively so they do
not silently misclassify licenses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 7e53148b-05cf-46f9-84ef-fe0f9d68b2a2
📒 Files selected for processing (2)
docs/data.jsonhelpers/skills/python-packaging-license-checker/SKILL.md
|
Tested: |
There was a problem hiding this comment.
♻️ Duplicate comments (7)
helpers/skills/python-packaging-license-checker/SKILL.md (7)
59-63:⚠️ Potential issue | 🟠 Major
ANDsemantics are under-enforced and can produce false ALLOWED verdicts.Lines 60-61 allow selecting a governing license for
AND. Conjunctive licensing requires all components to pass; onenot-allowedcomponent must block.Policy text fix
-- `AND`: split and evaluate each component. Apply the license governing the - importable library API (not a bundled CLI). Document the split in Notes. +- `AND`: split and evaluate each component. + - If any component is not-allowed -> Verdict: BLOCKED. + - If all components are allowed -> continue. + - If any component is unresolved -> Verdict: NEEDS-HUMAN-REVIEW. + Document per-component outcomes in Notes.As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 59 - 63, The "Compound expressions" section currently permits choosing a single governing license for `AND` clauses which is incorrect; change the wording so that for `AND` you must split the expression and require every component to be individually allowed (any single `not-allowed` component makes the whole verdict `not-allowed`), document the split in Notes, and remove/replace the phrase about selecting a governing license; keep the existing `OR` guidance (choose most permissive allowed or best individual verdict) intact.
69-71:⚠️ Potential issue | 🔴 CriticalCritical: unsafe recursive delete can remove unintended paths (CWE-73, CWE-20).
Line 70 executes
rm -rf <clone_parent_dir>without any guard. If the resolved value is empty/malformed or attacker-influenced, deletion can escape the intended temp workspace.Remediation patch
```bash -rm -rf <clone_parent_dir> +test -n "<clone_parent_dir>" || { echo "Empty clone path" >&2; exit 1; } +case "<clone_parent_dir>" in + /tmp/*) rm -rf -- "<clone_parent_dir>" ;; + *) echo "Refusing unsafe delete path: <clone_parent_dir>" >&2; exit 1 ;; +esac</details> As per coding guidelines, REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code). <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@helpers/skills/python-packaging-license-checker/SKILL.mdaround lines 69 -
71, Therm -rf <clone_parent_dir>invocation is unsafe; add input validation
and path restrictions before deletion: ensure the variable referenced by
<clone_parent_dir>is non-empty (e.g., test -n), reject or exit on
empty/malformed values, restrict deletions to an allowed temp-space pattern
(e.g., only allow paths under /tmp/*) and use a safe rm invocation with--to
avoid treating hyphen-prefixed names as options; update the step around therm -rf <clone_parent_dir>line in SKILL.md to perform these checks and fail fast
with an explanatory error if the path is refused.</details> --- `27-29`: _⚠️ Potential issue_ | _🟡 Minor_ **Fenced code blocks are missing language tags (MD040).** These blocks should declare a language (`text`/`bash`) to satisfy lint and keep docs consistently parseable. <details> <summary>Patch sketch</summary> ```diff -``` +```text Skill: python-packaging-source-finder-
+text
Skill: git-shallow-clone-``` +```text WebFetch: https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json-
+text
License: [SPDX expression, or "non-SPDX: [raw string]" if unmappable]
...As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.
Also applies to: 33-35, 77-79, 156-163
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 27 - 29, Multiple fenced code blocks in SKILL.md (e.g., the block containing "Skill: python-packaging-source-finder", "Skill: git-shallow-clone", and the "WebFetch: https://..." block) lack language tags; update each triple-backtick fence to include an explicit language (e.g., text or bash) so they satisfy MD040 and remain parseable—locate the blocks around the strings "Skill: python-packaging-source-finder", "Skill: git-shallow-clone", and the WebFetch URL and prefix their opening fences with ```text (or ```bash if command examples) consistently, and apply the same fix to the other instances noted (lines ~33-35, 77-79, 156-163).
82-87:⚠️ Potential issue | 🟠 MajorFedora
statushandling is modeled as scalar; this is schema-fragile.Lines 82-87 treat
statusas only"allowed"/"not-allowed"scalar values. This risks wrong verdicts when the data contains multiple status entries or unfamiliar values. Parse as a collection and fail closed on unknowns (NEEDS-HUMAN-REVIEW).#!/bin/bash set -euo pipefail URL='https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json' curl -fsSL "$URL" | jq -r ' .. | objects | select(has("status")) | .status | if type=="array" then .[] else . end ' | sort -uAs per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 82 - 87, Update the Fedora `status` handling to treat the `.status` field as a collection rather than a scalar: when computing the verdict in the logic that inspects the `status` field (the block that maps `"allowed"` / `"not-allowed"` to next steps and verdicts), normalize `.status` to an array, iterate all entries, and apply a fail-closed rule—if any entry is `"not-allowed"` => Verdict: BLOCKED, if at least one entry is `"allowed"` and no `"not-allowed"` entries => proceed to Step 4 (verdict refinement), and if any entry is unknown/unrecognized or the field is missing => Verdict: NEEDS-HUMAN-REVIEW; ensure all references to `status`, `allowed`, `not-allowed`, NEEDS-HUMAN-REVIEW, and Step 4 are updated accordingly.
41-46:⚠️ Potential issue | 🟠 MajorDeclared license priority is not actually enforced.
Line 45 uses
find ... | head -5, which can return files in traversal order, not the explicit priority in Lines 41-42. The-ochain also weakens the-maxdepthintent due to precedence, making selection bug-prone.Proposed deterministic fix
-```bash -find <clone_path> -maxdepth 1 -iname "license*" -o -iname "copying*" -o -iname "licence*" | head -5 -``` +```bash +for f in LICENSE LICENSE.txt LICENSE.md COPYING COPYING.txt LICENCE LICENCE.txt; do + if [ -f "<clone_path>/$f" ]; then + echo "<clone_path>/$f" + break + fi +done +```As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 41 - 46, The documented search using the `find ... | head -5` pipeline (the `find` invocation with `-iname "license*" -o -iname "copying*" -o -iname "licence*"` and `head -5`) does not enforce the declared priority and misuses `-maxdepth` due to `-o` precedence; replace it with a deterministic ordered check that iterates the explicit priority list (LICENSE, LICENSE.txt, LICENSE.md, COPYING, COPYING.txt, LICENCE, LICENCE.txt), tests each file with a simple existence check (e.g., `-f`) and echoes the first match then exits the loop — update SKILL.md to show this loop-based approach instead of the `find | head` pipeline.
143-150:⚠️ Potential issue | 🟠 MajorExport compliance step is non-deterministic without explicit data-source/failure rules.
Lines 144-145 name policy lists but do not define exact endpoints, retrieval timestamp requirements, or failure fallback. This makes outcomes inconsistent across runs.
Define source URLs, record fetch timestamp/source in Notes, and on persistent fetch failure set
_Export compliance: EXPORT-COMPLIANCE-FLAGplus_Verdict: NEEDS-HUMAN-REVIEW.As per coding guidelines, REVIEW PRIORITIES: 2. Architectural issues and anti-patterns.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 143 - 150, The "Export compliance" step must be made deterministic: update the SKILL.md's Export compliance section to list exact source URLs/endpoints for the US OFAC list, the EU Consolidated Sanctions List, and Red Hat geopolitical flags, require recording the fetch timestamp and the source URL in the Notes field (e.g., append "Source: <URL>; Fetched: <ISO8601 timestamp>"), and add an explicit failure policy that after N retries or persistent fetch failure you set "Export compliance: EXPORT-COMPLIANCE-FLAG" and append "_Verdict: NEEDS-HUMAN-REVIEW" to Notes; reference the existing identifiers "EXPORT-COMPLIANCE-FLAG", "Export compliance", "Notes", and "_Verdict: NEEDS-HUMAN-REVIEW" so reviewers and automation can find and enforce these rules.
56-58:⚠️ Potential issue | 🟠 MajorAmbiguous fuzzy mappings can misclassify legal outcome.
Lines 56-57 map ambiguous forms (
BSD,GPLv3) to specific SPDX variants, which can flip Fedora-policy decisions. Treat ambiguous tokens as unresolved and force source/manual verification instead of auto-mapping.As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 56 - 58, The mapping that auto-converts ambiguous license tokens ("BSD" -> BSD-3-Clause, "GPLv3" -> GPL-3.0-or-later) is unsafe; update the mapping logic (the list/constant containing the entries "Apache 2" -> Apache-2.0, "GPL v2" -> GPL-2.0-only, "BSD" -> BSD-3-Clause, "GPLv3" -> GPL-3.0-or-later, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only) to stop auto-mapping ambiguous tokens ("BSD" and "GPLv3") and instead mark them as unresolved/require_manual_verification (or emit a verification flag/exception) so downstream policy decisions must use human-confirmed SPDX identifiers; leave deterministic mappings (e.g., "Apache 2" -> Apache-2.0, "GPL v2" -> GPL-2.0-only, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 59-63: The "Compound expressions" section currently permits
choosing a single governing license for `AND` clauses which is incorrect; change
the wording so that for `AND` you must split the expression and require every
component to be individually allowed (any single `not-allowed` component makes
the whole verdict `not-allowed`), document the split in Notes, and
remove/replace the phrase about selecting a governing license; keep the existing
`OR` guidance (choose most permissive allowed or best individual verdict)
intact.
- Around line 69-71: The `rm -rf <clone_parent_dir>` invocation is unsafe; add
input validation and path restrictions before deletion: ensure the variable
referenced by `<clone_parent_dir>` is non-empty (e.g., test -n), reject or exit
on empty/malformed values, restrict deletions to an allowed temp-space pattern
(e.g., only allow paths under /tmp/*) and use a safe rm invocation with `--` to
avoid treating hyphen-prefixed names as options; update the step around the `rm
-rf <clone_parent_dir>` line in SKILL.md to perform these checks and fail fast
with an explanatory error if the path is refused.
- Around line 27-29: Multiple fenced code blocks in SKILL.md (e.g., the block
containing "Skill: python-packaging-source-finder", "Skill: git-shallow-clone",
and the "WebFetch: https://..." block) lack language tags; update each
triple-backtick fence to include an explicit language (e.g., text or bash) so
they satisfy MD040 and remain parseable—locate the blocks around the strings
"Skill: python-packaging-source-finder", "Skill: git-shallow-clone", and the
WebFetch URL and prefix their opening fences with ```text (or ```bash if command
examples) consistently, and apply the same fix to the other instances noted
(lines ~33-35, 77-79, 156-163).
- Around line 82-87: Update the Fedora `status` handling to treat the `.status`
field as a collection rather than a scalar: when computing the verdict in the
logic that inspects the `status` field (the block that maps `"allowed"` /
`"not-allowed"` to next steps and verdicts), normalize `.status` to an array,
iterate all entries, and apply a fail-closed rule—if any entry is
`"not-allowed"` => Verdict: BLOCKED, if at least one entry is `"allowed"` and no
`"not-allowed"` entries => proceed to Step 4 (verdict refinement), and if any
entry is unknown/unrecognized or the field is missing => Verdict:
NEEDS-HUMAN-REVIEW; ensure all references to `status`, `allowed`, `not-allowed`,
NEEDS-HUMAN-REVIEW, and Step 4 are updated accordingly.
- Around line 41-46: The documented search using the `find ... | head -5`
pipeline (the `find` invocation with `-iname "license*" -o -iname "copying*" -o
-iname "licence*"` and `head -5`) does not enforce the declared priority and
misuses `-maxdepth` due to `-o` precedence; replace it with a deterministic
ordered check that iterates the explicit priority list (LICENSE, LICENSE.txt,
LICENSE.md, COPYING, COPYING.txt, LICENCE, LICENCE.txt), tests each file with a
simple existence check (e.g., `-f`) and echoes the first match then exits the
loop — update SKILL.md to show this loop-based approach instead of the `find |
head` pipeline.
- Around line 143-150: The "Export compliance" step must be made deterministic:
update the SKILL.md's Export compliance section to list exact source
URLs/endpoints for the US OFAC list, the EU Consolidated Sanctions List, and Red
Hat geopolitical flags, require recording the fetch timestamp and the source URL
in the Notes field (e.g., append "Source: <URL>; Fetched: <ISO8601 timestamp>"),
and add an explicit failure policy that after N retries or persistent fetch
failure you set "Export compliance: EXPORT-COMPLIANCE-FLAG" and append
"_Verdict: NEEDS-HUMAN-REVIEW" to Notes; reference the existing identifiers
"EXPORT-COMPLIANCE-FLAG", "Export compliance", "Notes", and "_Verdict:
NEEDS-HUMAN-REVIEW" so reviewers and automation can find and enforce these
rules.
- Around line 56-58: The mapping that auto-converts ambiguous license tokens
("BSD" -> BSD-3-Clause, "GPLv3" -> GPL-3.0-or-later) is unsafe; update the
mapping logic (the list/constant containing the entries "Apache 2" ->
Apache-2.0, "GPL v2" -> GPL-2.0-only, "BSD" -> BSD-3-Clause, "GPLv3" ->
GPL-3.0-or-later, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only) to stop
auto-mapping ambiguous tokens ("BSD" and "GPLv3") and instead mark them as
unresolved/require_manual_verification (or emit a verification flag/exception)
so downstream policy decisions must use human-confirmed SPDX identifiers; leave
deterministic mappings (e.g., "Apache 2" -> Apache-2.0, "GPL v2" ->
GPL-2.0-only, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only) intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 5350d90d-db2b-483d-a770-a7e0608d5507
📒 Files selected for processing (2)
docs/data.jsonhelpers/skills/python-packaging-license-checker/SKILL.md
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
helpers/skills/python-packaging-license-checker/SKILL.md (1)
91-97:⚠️ Potential issue | 🟠 Major
statusdecision logic is underspecified; explicitly handle all documented Fedora status values.Current text only promotes
"allowed"and treats the rest generically. For deterministic policy behavior, enumerate known statuses and define explicit outcomes per status (especially non-code categories), while still failing closed on unknown values.In Fedora fedora-license-data (fedora-licenses.json), what status values are currently documented for license entries, and how should consumers handle unrecognized status strings?As per coding guidelines, prioritize bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 91 - 97, Update the "status" decision logic to explicitly enumerate Fedora's documented license statuses (e.g., "allowed", "not-allowed", "deprecated", "unencumbered", "discontinued", "unknown" — replace with the exact current set from fedora-licenses.json) and map each to a deterministic outcome instead of treating non-allowed values generically: normalize the `status` field to an array, apply fail-closed rules (any "not-allowed" -> Verdict: BLOCKED; all entries "allowed" and none "not-allowed" -> proceed to Step 4), map other documented statuses to defined outcomes (for example, "deprecated" and "discontinued" -> NEEDS-HUMAN-REVIEW or a specific decision you choose), and treat missing, empty, or truly unrecognized status strings -> Verdict: NEEDS-HUMAN-REVIEW; document the exact mapping in the SKILL.md so consumers of `status` have deterministic policy behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Line 199: The table row containing the license globs "GPL-*, LGPL-*" triggers
markdownlint MD049 due to the asterisks being interpreted as emphasis; update
the row in SKILL.md to escape the asterisks or wrap each glob in backticks
(e.g., `GPL-*`, `LGPL-*`) so the table cell reads a literal glob string instead
of emphasis; locate the table line that contains the exact text "GPL-*, LGPL-*"
and replace those tokens accordingly to satisfy markdownlint.
- Line 50: The current logic that "Read the first 30 lines of the found file and
map to an SPDX identifier" is too small and may miss valid SPDX cues; change the
license detection behavior (the routine that reads and maps file contents to
SPDX identifiers) to scan a larger bounded window or the entire file up to a
reasonable size cap (e.g., 64–256KB) instead of only the first 30 lines, and
ensure the mapping function returns a failure only after scanning the expanded
content; also add error handling for oversized files and timeouts so detection
remains safe.
- Around line 157-158: Replace the portal URLs for the US OFAC SDN List and the
EU Consolidated Sanctions in SKILL.md (the entries currently pointing to
sanctionssearch.ofac.treas.gov and the data.europa.eu dataset page) with the
official machine-readable endpoints:
https://sanctionslistservice.ofac.treas.gov/api/PublicationPreview/exports/SDN.XML
for OFAC and the EC FSF tokenized endpoints/RSS feed (e.g.,
https://webgate.ec.europa.eu/europeaid/fsd/fsf/public/rss) for the EU
consolidated list; additionally update the doc text around those entries (also
update the similar entries at the later block mentioned) to require per-check
metadata capture: source URL, ISO8601 retrieval timestamp, fetch outcome
(success/failure), and retry count so automated checks record auditable
retrieval info.
---
Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 91-97: Update the "status" decision logic to explicitly enumerate
Fedora's documented license statuses (e.g., "allowed", "not-allowed",
"deprecated", "unencumbered", "discontinued", "unknown" — replace with the exact
current set from fedora-licenses.json) and map each to a deterministic outcome
instead of treating non-allowed values generically: normalize the `status` field
to an array, apply fail-closed rules (any "not-allowed" -> Verdict: BLOCKED; all
entries "allowed" and none "not-allowed" -> proceed to Step 4), map other
documented statuses to defined outcomes (for example, "deprecated" and
"discontinued" -> NEEDS-HUMAN-REVIEW or a specific decision you choose), and
treat missing, empty, or truly unrecognized status strings -> Verdict:
NEEDS-HUMAN-REVIEW; document the exact mapping in the SKILL.md so consumers of
`status` have deterministic policy behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 0976c4a3-61b0-4e3c-8eba-cc18dba72b92
📒 Files selected for processing (2)
docs/data.jsonhelpers/skills/python-packaging-license-checker/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/data.json
Replace SPDX OSI-approval classification with Fedora License Data as the authoritative policy source for Red Hat redistribution decisions. License detection is now deterministic: the skill clones the source repo via python-packaging-source-finder and git-shallow-clone, then reads LICENSE files directly instead of relying on PyPI metadata. This addresses seven documented gaps: correct policy basis, AGPL-3.0 caveat handling, dual-license AND/OR parsing, source repo verification with mismatch detection, proprietary license escalation paths, build compliance flagging, and export compliance flagging. Output is a structured six-field verdict block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Emilien Macchi <emacchi@redhat.com>
Pull Request Description
Summary
Replace SPDX OSI-approval classification with Fedora License Data as
the authoritative policy source for Red Hat redistribution decisions.
License detection is now deterministic: the skill clones the source
repo via python-packaging-source-finder and git-shallow-clone, then
reads LICENSE files directly instead of relying on PyPI metadata.
This addresses seven documented gaps: correct policy basis, AGPL-3.0
caveat handling, dual-license AND/OR parsing, source repo verification
with mismatch detection, proprietary license escalation paths, build
compliance flagging, and export compliance flagging. Output is a
structured six-field verdict block.
Type of Contribution
Summary by CodeRabbit