Skip to content

feat: rewrite license-checker skill to use Fedora License Data#104

Merged
EmilienM merged 1 commit intomainfrom
license-rh
Mar 16, 2026
Merged

feat: rewrite license-checker skill to use Fedora License Data#104
EmilienM merged 1 commit intomainfrom
license-rh

Conversation

@EmilienM
Copy link
Contributor

@EmilienM EmilienM commented Mar 13, 2026

Pull Request Description

Summary

Replace SPDX OSI-approval classification with Fedora License Data as
the authoritative policy source for Red Hat redistribution decisions.
License detection is now deterministic: the skill clones the source
repo via python-packaging-source-finder and git-shallow-clone, then
reads LICENSE files directly instead of relying on PyPI metadata.

This addresses seven documented gaps: correct policy basis, AGPL-3.0
caveat handling, dual-license AND/OR parsing, source repo verification
with mismatch detection, proprietary license escalation paths, build
compliance flagging, and export compliance flagging. Output is a
structured six-field verdict block.

Type of Contribution

  • 📝 New command
  • 🎯 New skill
  • 🤖 New agent
  • 💎 New Gemini Gem
  • 📚 Documentation update
  • 🐛 Bug fix
  • 🔄 Refactoring/cleanup
  • 🏗️ Infrastructure/tooling
  • 📋 New category addition

Summary by CodeRabbit

  • Documentation
    • Revised Python packaging license checker docs to use Fedora License Data as the authoritative policy, replacing the SPDX-centric workflow with a policy-driven, multi-step process.
    • Added Inputs/Outputs, a standardized six-field verdict format with escalation guidance, clearer fallback behavior when live data is unavailable, updated failure modes, remediation guidance, Quick Reference scenarios, and vendor agreement notes.

@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This pull request updates docs/data.json for the public entry "python-packaging-license-checker": the description now references Fedora License Data as the authoritative policy source and the output as a structured six-field verdict; allowed_tools changed from "WebFetch" to "Bash Skill WebFetch". It also substantially reworks helpers/skills/python-packaging-license-checker/SKILL.md: replaces SPDX-centric logic with a Fedora-data-driven, stepwise workflow (locate/clone source, read license, Fedora lookup with fallback, verdict refinement, compliance checks), adds Inputs, normalization/fallback behavior, a fallback table, failure modes, remediation guidance, Red Hat Vendor Agreements, and a Quick Reference.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Actionable Issues

  • Specify Fedora WebFetch timeout, retry/backoff policy and exact failure conditions (HTTP status thresholds, retry counts). (Action)
  • Define repository access/auth: allowed auth methods, token handling, minimal scopes, secure secret storage and masking in logs. Map to CWE-522. (Action)
  • Require explicit Fedora endpoint(s), response schema validation, strict TLS verification and certificate pinning for fetches. Map to CWE-295. (Action)
  • Ensure Fedora data authenticity (signed responses or verified checksums) and validate fallback table integrity to prevent policy cache poisoning. Map to CWE-347. (Action)
  • Tighten input validation: normalize and strictly validate SPDX IDs and license strings; reject or sanitize unexpected values to avoid malformed decisions. Map to CWE-20. (Action)
  • Mitigate SSRF when fetching source_repo_url or other URLs: implement server-side allowlist, DNS/IP restrictions, and block private ranges. Map to CWE-918. (Action)
  • Specify deterministic escalation rules: exact recipients, SLA thresholds, and automated decision criteria for each failure mode to avoid ambiguous manual escalation. (Action)
  • Reduce log exposure: prohibit logging of full license texts, tokens, private repo URLs; define redaction rules and retention limits. Map to CWE-200. (Action)
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning Description covers summary and type of contribution but omits critical sections: Skill Details (name, use case, dependencies), Testing and Validation, Categorization, Documentation, and Dependencies/Requirements. Complete the Skill Details section with skill name, primary use case, and dependencies; add Testing and Validation section with required checks; confirm tool categorization and documentation updates.
Agentskills.Io Specification Compliance ⚠️ Warning Pull request violates agentskills.io specification compliance: (1) unescaped asterisks in markdown tables cause MD049 emphasis-style violations; (2) 30-line license file scan window misses SPDX identifiers; (3) non-deterministic portal URLs break audit trail requirements. Escape glob pattern asterisks in backticks, extend license scan window to 100-200 lines, replace portal URLs with official API endpoints (OFAC SDN.XML, EU Sanctions RSS) with timestamp/retry audit logging.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed Title directly and specifically describes the main change: rewriting the license-checker skill to use Fedora License Data instead of SPDX-based assessment.
Ethical Compliance ✅ Passed SKILL.md contains no instructions using real people's names to replicate style, voice, or persona. No violations of ETHICS.md guidelines detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can suggest fixes for GitHub Check annotations.

Configure the reviews.tools.github-checks setting to adjust the time to wait for GitHub Checks to complete.

@EmilienM EmilienM changed the title feat: Rewrite license-checker skill to use Fedora License Data feat: rewrite license-checker skill to use Fedora License Data Mar 13, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
helpers/skills/python-packaging-license-checker/SKILL.md (3)

43-45: GitLab artifacts URL is fragile.

The URL .../-/jobs/artifacts/main/raw/fedora-licenses.json?job=json depends on:

  1. A CI job named exactly json existing in the pipeline
  2. That job producing the artifact at the expected path

If Fedora legal renames the job or restructures their CI, this breaks silently. Consider:

  • Linking to a tagged release instead of main
  • Adding a fallback URL to a mirrored/cached copy
  • Documenting how to verify the URL is still valid
🔧 Add language identifier per markdownlint MD040
-```
+```text
 WebFetch: https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @helpers/skills/python-packaging-license-checker/SKILL.md around lines 43 -
45, The GitLab artifacts URL string
https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json
is fragile; update SKILL.md to point to a stable release/tag URL (replace main
with a specific tag or release link), add a secondary fallback/mirror URL for
cached copies, and include a short note on how to verify the URL (e.g., check
the pipeline job name and artifact path) so consumers can detect breakages; also
wrap the WebFetch block with a language identifier (```text) to satisfy
markdownlint MD040.


</details>

---

`135-142`: **Add language identifier to code block.**

Per markdownlint MD040, fenced code blocks should specify a language for syntax highlighting and tooling compatibility.

<details>
<summary>🔧 Proposed fix</summary>

```diff
-```
+```text
 _License:_ [SPDX expression, or "non-SPDX: [raw string]" if unmappable]
 _Source verified:_ YES | NO | MISMATCH (PyPI: [X], repo: [Y])
 _Verdict:_ ALLOWED | ALLOWED-WITH-CAVEAT | NEEDS-HUMAN-REVIEW | ESCALATE-TO-LEGAL | BLOCKED
 _Build compliance:_ OK | BUILD-COMPLIANCE-FLAG
 _Export compliance:_ OK | EXPORT-COMPLIANCE-FLAG
 _Notes:_ [Actionable guidance. Omit if verdict is ALLOWED and all flags are OK.]
 ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @helpers/skills/python-packaging-license-checker/SKILL.md around lines 135 -
142, The fenced code block that starts with "License: [SPDX expression, or
"non-SPDX: [raw string]" if unmappable]" should include a language identifier
for markdownlint MD040; change the opening triple backticks from totext
so the block becomes a labeled text block (i.e., update the code fence
surrounding the lines starting with "License:", "Source verified:",
"Verdict:", etc., to use ```text).


</details>

---

`166-173`: **Consider adding WebFetch timeout/failure handling.**

Error handling covers Fedora data unreachable and source repo inaccessible, but doesn't address WebFetch tool failures (network timeout, rate limiting, etc.). Currently it's implied by "Fedora data unreachable" but could be made explicit for clarity.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 166 -
173, Update the "## Error Handling" section to explicitly call out WebFetch
failures by name: add a bullet for "WebFetch timeout/failure" that describes
treating network timeouts, HTTP errors (429/5xx), and rate-limit responses as
transient fetch failures, use the same fallback table for Fedora data, append a
clear note that the license check used fallback data and that source
verification is UNKNOWN if repo fetch failed via WebFetch, and instruct marking
verdicts as NEEDS-HUMAN-REVIEW when WebFetch prevents reliable license/source
determination; reference the existing bullets "Fedora data unreachable", "Source
repo private/inaccessible", and the WebFetch tool so reviewers can find and
apply the same fallback/annotation behavior.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @helpers/skills/python-packaging-license-checker/SKILL.md:

  • Around line 23-24: Update the SKILL.md entry for the upstream_org field to
    explicitly define "primary country" (referencing the upstream_org symbol) used
    in OFAC/sanctions checks: replace the ambiguous phrase with a precise rule such
    as "primary country: country of incorporation or, if incorporation is not
    applicable, the country of the principal executive office (headquarters)," and
    add a short parenthetical guidance about which to prefer when both exist so Step
    5 sanctions logic can consistently use that value.
  • Around line 34-36: The fuzzy mapping rule in "Normalise to SPDX ID" currently
    hardcodes ambiguous mappings (e.g., "BSD" → BSD-3-Clause, "GPLv3" →
    GPL-3.0-or-later) which can produce incorrect SPDX IDs; update the skill so that
    ambiguous short forms are either flagged for manual/source verification in Step
    3 instead of being auto-mapped, or explicitly document and justify the chosen
    policy (e.g., "most conservative interpretation") in SKILL.md; locate the
    normalization logic or mapping table referenced by the "Normalise to SPDX ID"
    section and replace direct ambiguous mappings with a warning/flag return or add
    a clear policy note beside each ambiguous example.

Nitpick comments:
In @helpers/skills/python-packaging-license-checker/SKILL.md:

  • Around line 43-45: The GitLab artifacts URL string
    https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json
    is fragile; update SKILL.md to point to a stable release/tag URL (replace main
    with a specific tag or release link), add a secondary fallback/mirror URL for
    cached copies, and include a short note on how to verify the URL (e.g., check
    the pipeline job name and artifact path) so consumers can detect breakages; also
    wrap the WebFetch block with a language identifier (```text) to satisfy
    markdownlint MD040.
  • Around line 135-142: The fenced code block that starts with "License: [SPDX
    expression, or "non-SPDX: [raw string]" if unmappable]" should include a
    language identifier for markdownlint MD040; change the opening triple backticks
    from totext so the block becomes a labeled text block (i.e., update the
    code fence surrounding the lines starting with "License:", "Source
    verified:
    ", "Verdict:", etc., to use ```text).
  • Around line 166-173: Update the "## Error Handling" section to explicitly call
    out WebFetch failures by name: add a bullet for "WebFetch timeout/failure" that
    describes treating network timeouts, HTTP errors (429/5xx), and rate-limit
    responses as transient fetch failures, use the same fallback table for Fedora
    data, append a clear note that the license check used fallback data and that
    source verification is UNKNOWN if repo fetch failed via WebFetch, and instruct
    marking verdicts as NEEDS-HUMAN-REVIEW when WebFetch prevents reliable
    license/source determination; reference the existing bullets "Fedora data
    unreachable", "Source repo private/inaccessible", and the WebFetch tool so
    reviewers can find and apply the same fallback/annotation behavior.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `2a126445-739a-4ac4-82c0-6cb03491637b`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between fdcdc36681a7ffd11f6134a083feac2129da51f3 and d1b9a9b9672e94efb121c6930314ed6b43db4365.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `docs/data.json`
* `helpers/skills/python-packaging-license-checker/SKILL.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
helpers/skills/python-packaging-license-checker/SKILL.md (1)

33-36: ⚠️ Potential issue | 🟠 Major

Ambiguous fuzzy mappings can produce incorrect SPDX IDs.

Line 34-Line 35 maps ambiguous short forms (BSD, GPLv3) to specific SPDX variants. This can drive incorrect Fedora policy outcomes. Route ambiguous aliases to source verification instead of hard-mapping.

Proposed fix
- "Apache 2" -> Apache-2.0, "GPL v2" -> GPL-2.0-only, "BSD" -> BSD-3-Clause,
- "GPLv3" -> GPL-3.0-or-later, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only
+ "Apache 2" -> Apache-2.0, "GPL v2" -> GPL-2.0-only,
+ "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only
+ Ambiguous aliases requiring Step 3 verification: "BSD", "GPLv3".

As per coding guidelines, prioritize bug-prone patterns and error-handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 33 -
36, The current normalization step that fuzzy-maps ambiguous short forms (e.g.,
"BSD", "GPLv3", "Apache 2", "GPL v2", "LGPLv2.1", "MIT License") to specific
SPDX IDs should be changed to avoid hard-mapping ambiguous aliases; instead,
treat unmappable or ambiguous short forms as "unmappable" and route them to
source verification flow, setting "Source verified: NO" and skipping automatic
SPDX assignment; update the mapping logic that performs "exact then fuzzy
matching" to recognize ambiguous tokens (like "BSD" and "GPLv3") as ambiguous
and fall back to the verification step rather than assigning a concrete SPDX
identifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 42-44: The markdown in SKILL.md uses triple-fenced code blocks
without language tags which triggers MD040; update the fences around the
"WebFetch: https://gitlab.com/..." block and the license/template block to
include an explicit language (e.g., use ```text) and do the same for the other
occurrence referenced (lines ~134-141) so both fenced blocks are annotated with
a language tag; locate the fences by searching for the "WebFetch:" line and the
template starting with "_License:_", then replace the opening ``` with ```text
for each.
- Around line 27-31: The "Compound expressions" guidance for `AND` is incorrect:
instead of picking a single governing license, require that every conjunctive
component's license be acceptable (all must pass) and document each component's
license and its evaluation in Notes; update the "Compound expressions" section
(the `AND` handling text) to state that `AND` semantics enforce all components,
describe the evaluation steps (evaluate each conjunct against the importable
library API rule), and record per-component results rather than selecting one
license.
- Around line 71-73: The code directly WebFetches the attacker-controlled
source_repo_url to read LICENSE files, enabling SSRF; change this by
rejecting/normalizing unsafe URLs before fetching: only allow HTTPS and enforce
a hosts allowlist (e.g., github.com, gitlab.com, bitbucket.org) or perform a
safe git clone via a vetted library instead of arbitrary HTTP fetch;
additionally perform DNS resolution of source_repo_url and block any resolved
IPs in private/loopback/metadata ranges (RFC1918, 127.0.0.0/8, 169.254.169.254,
::1, fc00::/7), follow a short redirect limit, and set strict timeouts and
response size limits when reading the first 30 lines of LICENSE/LICENCE/ COPYING
files for SPDX detection (refer to WebFetch and source_repo_url and the LICENSE/
LICENSE.txt/ LICENSE.md/ COPYING file list).
- Around line 46-51: The code currently treats the Fedora `status` field as a
single string and implicitly accepts unknown values; update the logic that
processes entries matched by `spdx_abbrev` so it iterates the `status` array (do
not treat as scalar), validates every element against the six documented values
(`allowed`, `allowed-content`, `allowed-documentation`, `allowed-fonts`,
`allowed-firmware`, `not-allowed`), and if any element is unrecognized mark the
entry as an error requiring human review (do not silently accept or treat as
allowed); adjust the decision path that previously checked `status` to aggregate
results from the array (e.g., if any `not-allowed` exists the verdict is
BLOCKED; otherwise refine verdict per allowed-subtype), and add explicit error
handling/reporting for unknown statuses when reading `fedora-licenses.json` or
when matching `spdx_abbrev`.
- Around line 122-123: Update the export-compliance section to explicitly list
the sanctioned-source endpoints used (OFAC SDN API URL and EU Financial
Sanctions Database URL as provided) and log the exact source URL and retrieval
timestamp for each lookup; in the sanction-check logic (reference the _Export
compliance: EXPORT-COMPLIANCE-FLAG and _Verdict: NEEDS-HUMAN-REVIEW markers)
implement retry-with-timeout for HTTP fetches and, on persistent failure, set
EXPORT-COMPLIANCE-FLAG and set _Verdict to NEEDS-HUMAN-REVIEW as the defined
fallback; also add brief prose explaining why each endpoint is chosen and ensure
the skill records source URL and timestamp in its lookup output for
auditability.

---

Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 33-36: The current normalization step that fuzzy-maps ambiguous
short forms (e.g., "BSD", "GPLv3", "Apache 2", "GPL v2", "LGPLv2.1", "MIT
License") to specific SPDX IDs should be changed to avoid hard-mapping ambiguous
aliases; instead, treat unmappable or ambiguous short forms as "unmappable" and
route them to source verification flow, setting "Source verified: NO" and
skipping automatic SPDX assignment; update the mapping logic that performs
"exact then fuzzy matching" to recognize ambiguous tokens (like "BSD" and
"GPLv3") as ambiguous and fall back to the verification step rather than
assigning a concrete SPDX identifier.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 0bf1a0b3-adf5-4c9d-a4d0-6d64214779b2

📥 Commits

Reviewing files that changed from the base of the PR and between d1b9a9b and 5eeaa20.

📒 Files selected for processing (2)
  • docs/data.json
  • helpers/skills/python-packaging-license-checker/SKILL.md

@EmilienM EmilienM force-pushed the license-rh branch 2 times, most recently from c7e8791 to aa57fd6 Compare March 13, 2026 18:25
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/data.json (1)

145-150: ⚠️ Potential issue | 🟠 Major

Regenerate plugin registry metadata to fix CI drift.

Line 145 was edited, but the registry is generated and now mismatches plugin metadata (lint already fails). Run make update and commit the regenerated docs/data.json so this entry stays source-of-truth aligned (including tool metadata).
As per coding guidelines, prioritize bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/data.json` around lines 145 - 150, The plugin registry metadata is out
of sync; regenerate and commit the canonical registry so docs/data.json matches
the source-of-truth: run the repository's generation step (e.g., `make update`)
to rebuild docs/data.json, verify the entry with id
"python-packaging-license-checker" and its "allowed_tools" and other fields
reflect current plugin metadata (including tool metadata), then commit the
regenerated docs/data.json so CI/lint no longer fails.
♻️ Duplicate comments (7)
helpers/skills/python-packaging-license-checker/SKILL.md (7)

55-57: ⚠️ Potential issue | 🟠 Major

Ambiguous fuzzy mappings can produce incorrect SPDX outcomes.

Lines 56-57 hard-map ambiguous labels (BSD, GPLv3) to specific SPDX variants, which can materially alter policy decisions. Require explicit disambiguation or mark as NEEDS-HUMAN-REVIEW instead of auto-resolving.

As per coding guidelines, prioritize bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 55 -
57, The "Normalise to SPDX ID" fuzzy-mapping step currently auto-resolves
ambiguous labels like "BSD" and "GPLv3" to specific SPDX variants; change this
behavior so the normaliser detects ambiguous inputs (e.g., "BSD", "GPLv3", "GPL
v2", any non-exact matches listed in the mapping block) and does not auto-select
a variant; instead flag those cases as "NEEDS-HUMAN-REVIEW" (or emit a
disambiguation request) and only auto-normalise exact matches (e.g., "Apache 2"
-> Apache-2.0, "MIT License" -> MIT) to avoid incorrect SPDX outcomes.

21-21: ⚠️ Potential issue | 🟡 Minor

“Primary country” is ambiguous for sanctions screening input.

Line 21 / Line 144 rely on “primary country” without a deterministic definition. Define it explicitly (e.g., country of incorporation, else principal executive office) to avoid inconsistent flagging.

As per coding guidelines, prioritize bug-prone patterns and error handling gaps.

Also applies to: 144-145

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` at line 21, The
README uses the ambiguous field "upstream_org: primary country"; update the
SKILL.md descriptions for the upstream_org field (both occurrences) to
explicitly define "primary country" as a deterministic value — e.g., "primary
country: country of incorporation; if unavailable, use the country of the
principal executive office or registered office" — and add a short note to
prefer legal domicile over operational presence to support consistent sanctions
screening and downstream flagging.

27-29: ⚠️ Potential issue | 🟡 Minor

Fenced code blocks are missing language tags (markdownlint MD040).

Static analysis flags these fences; add explicit languages (e.g., text, bash) to satisfy lint and keep docs tooling-clean.

As per coding guidelines, prioritize bug-prone patterns and error handling gaps.

Also applies to: 33-35, 77-79, 156-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 27 -
29, Update the fenced code blocks that currently show content like `Skill:
python-packaging-source-finder` (and the other similar blocks at the noted
locations) to include explicit language tags for markdownlint MD040 compliance;
e.g., replace ``` with ```text or ```bash as appropriate for each block so the
fences at the occurrences of the `Skill: python-packaging-source-finder` snippet
and the other code blocks (around the other noted occurrences) have explicit
language identifiers.

59-63: ⚠️ Potential issue | 🟠 Major

AND handling under-enforces conjunctive obligations (CWE-840).

Lines 60-61 still allow selecting a single “governing” component for AND. Conjunctive expressions require all components to pass; if any component is not-allowed, verdict must be BLOCKED.

As per coding guidelines, prioritize architectural issues and bug-prone patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 59 -
63, The "Compound expressions" guidance currently allows choosing a single
“governing” component for AND; change it so AND expressions require all
conjuncts to be allowed—if any component is not-allowed the overall verdict must
be BLOCKED (do not permit selecting one component as the governing license).
Update the AND handling text in the "Compound expressions" section to state that
each conjunct must be evaluated and all obligations enforced, document the split
in Notes, and remove the wording about applying the license governing the
importable library API for AND expressions so it cannot under-enforce
conjunctive obligations.

23-35: ⚠️ Potential issue | 🔴 Critical

Critical SSRF risk in repository acquisition flow (CWE-918).

Lines 25-35 instruct cloning a URL derived from package metadata without a trust boundary. Severity: Critical. Exploit scenario: a malicious package can point source_repo_url to internal/metadata endpoints and force network access during analysis.

Remediation patch (policy text)
 ## Step 1: Locate and clone the source repository
+Before cloning, validate `source_repo_url`:
+- Scheme must be `https`
+- Host must be allowlisted (e.g., `github.com`, `gitlab.com`, `bitbucket.org`)
+- Reject localhost, IP literals, and private/link-local/metadata ranges
+- Enforce redirect limit and request timeout
+
 Use the python-packaging-source-finder skill to find the upstream repo URL:

As per coding guidelines, prioritize security vulnerabilities first.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 23 -
35, The pipeline currently takes a source_repo_url from
python-packaging-source-finder and passes it to git-shallow-clone without a
trust boundary, introducing a critical SSRF risk; fix by validating and
sanitizing source_repo_url before any network call in the flow: enforce allowed
schemes (https, git+https), resolve and check the hostname against a
configurable whitelist or compare it to the package registry host, block
private/IP ranges (RFC1918, localhost, link-local), perform DNS resolution and
reject mismatches, and require explicit operator approval or an internal
proxy/sandbox for unknown hosts; apply this validation in the code that consumes
python-packaging-source-finder output (reference symbol source_repo_url) and
only call git-shallow-clone after these checks.

143-150: ⚠️ Potential issue | 🟠 Major

Export-compliance decision path is non-deterministic without explicit data sources/fallback.

Lines 144-150 define sanctions checks but not concrete machine-readable endpoints, fetch behavior, or failure fallback. This creates inconsistent outcomes across runs and stale-data risk.

Define source endpoints, record retrieval timestamp/source URL in notes, and require: on persistent source failure => EXPORT-COMPLIANCE-FLAG + NEEDS-HUMAN-REVIEW.

What are the official machine-readable endpoints and update behavior for OFAC sanctions data and the EU consolidated sanctions list, and what fallback is recommended when those sources are unavailable?

As per coding guidelines, prioritize architectural issues and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 143 -
150, Update the Export compliance section to specify the exact machine-readable
endpoints to check (e.g., OFAC SDN list JSON/XML URL and the EU Consolidated
List CSV/JSON endpoint), state the fetch and caching behavior (fetch on run,
cache with timestamp and TTL, record retrieval timestamp and source URL into
Notes), and implement a clear fallback: if sources are unreachable after retries
or cached data is stale, set "Export compliance: EXPORT-COMPLIANCE-FLAG" and
append "NEEDS-HUMAN-REVIEW" to Notes; ensure Notes include the source URL(s) and
retrieval timestamp to make decisions reproducible and auditable.

77-86: ⚠️ Potential issue | 🟠 Major

Fedora status handling is structurally incorrect and unsafe-by-default.

Lines 82-86 treat status as scalar allowed|not-allowed; Fedora data uses a broader status model and this logic can silently misclassify entries. Treat unknown/unexpected statuses as no permission implied and route to human review.

In Fedora fedora-license-data (fedora-licenses.json), what is the exact type and documented value set for the `license.status` field, and what behavior is recommended for unknown status values?

As per coding guidelines, prioritize bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 77 -
86, The current logic that treats Fedora `license.status` as a simple scalar
"allowed" | "not-allowed" is unsafe—update the decision logic that checks the
`status` (the field named `status` in entries matched by `spdx_abbrev` from the
fedora-licenses.json URL) so that only the documented explicit "allowed" value
(case-insensitive) yields progression to Step 4 (verdict refinement); any other
documented values or unknown/unexpected statuses must be treated as no
permission implied and return Verdict: NEEDS-HUMAN-REVIEW (do not auto-BLOCK
unless the status explicitly equals a documented blocking value and consult
docs), and log/handle unknown statuses defensively so they do not silently
misclassify licenses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 41-46: Replace the nondeterministic find + head pipeline that
searches for license files with a deterministic check that iterates the declared
priority list (LICENSE, LICENSE.txt, LICENSE.md, COPYING, COPYING.txt, LICENCE,
LICENCE.txt) and returns the first existing file; specifically, remove the use
of the wildcard find command and head in the SKILL.md example and instead
implement a loop that tests each filename under the given clone_path (using the
existing clone_path variable) with an existence check and echoes/returns the
first match, then breaks out to ensure the declared priority order is honored.
- Around line 69-70: The line that runs the unsafe removal command "rm -rf
<clone_parent_dir>" should be replaced with a guarded delete: ensure the
<clone_parent_dir> variable is non-empty, restrict deletions to an allowed path
pattern (e.g., only under /tmp or other explicit safe parent), and call rm with
the "--" separator; if the path is empty or does not match the safe pattern,
exit with an error and refuse deletion. Update the SKILL.md snippet to describe
these checks and the guarded delete flow around the "rm -rf <clone_parent_dir>"
operation.

---

Outside diff comments:
In `@docs/data.json`:
- Around line 145-150: The plugin registry metadata is out of sync; regenerate
and commit the canonical registry so docs/data.json matches the source-of-truth:
run the repository's generation step (e.g., `make update`) to rebuild
docs/data.json, verify the entry with id "python-packaging-license-checker" and
its "allowed_tools" and other fields reflect current plugin metadata (including
tool metadata), then commit the regenerated docs/data.json so CI/lint no longer
fails.

---

Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 55-57: The "Normalise to SPDX ID" fuzzy-mapping step currently
auto-resolves ambiguous labels like "BSD" and "GPLv3" to specific SPDX variants;
change this behavior so the normaliser detects ambiguous inputs (e.g., "BSD",
"GPLv3", "GPL v2", any non-exact matches listed in the mapping block) and does
not auto-select a variant; instead flag those cases as "NEEDS-HUMAN-REVIEW" (or
emit a disambiguation request) and only auto-normalise exact matches (e.g.,
"Apache 2" -> Apache-2.0, "MIT License" -> MIT) to avoid incorrect SPDX
outcomes.
- Line 21: The README uses the ambiguous field "upstream_org: primary country";
update the SKILL.md descriptions for the upstream_org field (both occurrences)
to explicitly define "primary country" as a deterministic value — e.g., "primary
country: country of incorporation; if unavailable, use the country of the
principal executive office or registered office" — and add a short note to
prefer legal domicile over operational presence to support consistent sanctions
screening and downstream flagging.
- Around line 27-29: Update the fenced code blocks that currently show content
like `Skill: python-packaging-source-finder` (and the other similar blocks at
the noted locations) to include explicit language tags for markdownlint MD040
compliance; e.g., replace ``` with ```text or ```bash as appropriate for each
block so the fences at the occurrences of the `Skill:
python-packaging-source-finder` snippet and the other code blocks (around the
other noted occurrences) have explicit language identifiers.
- Around line 59-63: The "Compound expressions" guidance currently allows
choosing a single “governing” component for AND; change it so AND expressions
require all conjuncts to be allowed—if any component is not-allowed the overall
verdict must be BLOCKED (do not permit selecting one component as the governing
license). Update the AND handling text in the "Compound expressions" section to
state that each conjunct must be evaluated and all obligations enforced,
document the split in Notes, and remove the wording about applying the license
governing the importable library API for AND expressions so it cannot
under-enforce conjunctive obligations.
- Around line 23-35: The pipeline currently takes a source_repo_url from
python-packaging-source-finder and passes it to git-shallow-clone without a
trust boundary, introducing a critical SSRF risk; fix by validating and
sanitizing source_repo_url before any network call in the flow: enforce allowed
schemes (https, git+https), resolve and check the hostname against a
configurable whitelist or compare it to the package registry host, block
private/IP ranges (RFC1918, localhost, link-local), perform DNS resolution and
reject mismatches, and require explicit operator approval or an internal
proxy/sandbox for unknown hosts; apply this validation in the code that consumes
python-packaging-source-finder output (reference symbol source_repo_url) and
only call git-shallow-clone after these checks.
- Around line 143-150: Update the Export compliance section to specify the exact
machine-readable endpoints to check (e.g., OFAC SDN list JSON/XML URL and the EU
Consolidated List CSV/JSON endpoint), state the fetch and caching behavior
(fetch on run, cache with timestamp and TTL, record retrieval timestamp and
source URL into Notes), and implement a clear fallback: if sources are
unreachable after retries or cached data is stale, set "Export compliance:
EXPORT-COMPLIANCE-FLAG" and append "NEEDS-HUMAN-REVIEW" to Notes; ensure Notes
include the source URL(s) and retrieval timestamp to make decisions reproducible
and auditable.
- Around line 77-86: The current logic that treats Fedora `license.status` as a
simple scalar "allowed" | "not-allowed" is unsafe—update the decision logic that
checks the `status` (the field named `status` in entries matched by
`spdx_abbrev` from the fedora-licenses.json URL) so that only the documented
explicit "allowed" value (case-insensitive) yields progression to Step 4
(verdict refinement); any other documented values or unknown/unexpected statuses
must be treated as no permission implied and return Verdict: NEEDS-HUMAN-REVIEW
(do not auto-BLOCK unless the status explicitly equals a documented blocking
value and consult docs), and log/handle unknown statuses defensively so they do
not silently misclassify licenses.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 7e53148b-05cf-46f9-84ef-fe0f9d68b2a2

📥 Commits

Reviewing files that changed from the base of the PR and between 5eeaa20 and c7e8791.

📒 Files selected for processing (2)
  • docs/data.json
  • helpers/skills/python-packaging-license-checker/SKILL.md

@EmilienM
Copy link
Contributor Author

Tested:

podman run -it --rm \
    --pull newer \
    --userns=keep-id \
    -e CURSOR_API_KEY="${CURSOR_API_KEY}" \
    -v "$(pwd):$(pwd):z" \
    -w "$(pwd)" localhost/ai-helpers-cursor:latest "using python-packaging-license-checker skill, check license for pydeck"
I now have all the information needed to produce the final verdict. Let me apply the skill's rules:

- **License**: Apache-2.0 (from `bindings/pydeck/LICENSE.txt`, confirmed by `setup.cfg`)
- **Fedora status**: allowed
- **Repo root vs pydeck**: Root is MIT, pydeck subdirectory has its own Apache-2.0 LICENSE.txt. Since the pydeck package ships with Apache-2.0, that's the authoritative license for the distributed package.
- **Source verified**: The PyPI metadata doesn't specify an explicit license string (it's dynamic), but `setup.cfg` points to `LICENSE.txt` which is Apache-2.0. This is consistent.
- **Rule C** applies: allowed license, no caveats needed.
- **Build compliance**: Source is available (it's on GitHub with setup.cfg/pyproject.toml) -- OK
- **Export compliance**: vis.gl is a Linux Foundation project, US-based -- OK

Here is the result:

_License:_ Apache-2.0
_Source verified:_ YES
_Verdict:_ ALLOWED
_Build compliance:_ OK
_Export compliance:_ OK
_Notes:_ pydeck is a subdirectory of the visgl/deck.gl repository. The repo root uses MIT, but pydeck ships its own LICENSE.txt (Apache-2.0), referenced by setup.cfg license_file. Apache-2.0 is the authoritative license for the distributed Python package.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (7)
helpers/skills/python-packaging-license-checker/SKILL.md (7)

59-63: ⚠️ Potential issue | 🟠 Major

AND semantics are under-enforced and can produce false ALLOWED verdicts.

Lines 60-61 allow selecting a governing license for AND. Conjunctive licensing requires all components to pass; one not-allowed component must block.

Policy text fix
-- `AND`: split and evaluate each component. Apply the license governing the
-  importable library API (not a bundled CLI). Document the split in Notes.
+- `AND`: split and evaluate each component.
+  - If any component is not-allowed -> Verdict: BLOCKED.
+  - If all components are allowed -> continue.
+  - If any component is unresolved -> Verdict: NEEDS-HUMAN-REVIEW.
+  Document per-component outcomes in Notes.

As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 59 -
63, The "Compound expressions" section currently permits choosing a single
governing license for `AND` clauses which is incorrect; change the wording so
that for `AND` you must split the expression and require every component to be
individually allowed (any single `not-allowed` component makes the whole verdict
`not-allowed`), document the split in Notes, and remove/replace the phrase about
selecting a governing license; keep the existing `OR` guidance (choose most
permissive allowed or best individual verdict) intact.

69-71: ⚠️ Potential issue | 🔴 Critical

Critical: unsafe recursive delete can remove unintended paths (CWE-73, CWE-20).

Line 70 executes rm -rf <clone_parent_dir> without any guard. If the resolved value is empty/malformed or attacker-influenced, deletion can escape the intended temp workspace.

Remediation patch
 ```bash
-rm -rf <clone_parent_dir>
+test -n "<clone_parent_dir>" || { echo "Empty clone path" >&2; exit 1; }
+case "<clone_parent_dir>" in
+  /tmp/*) rm -rf -- "<clone_parent_dir>" ;;
+  *) echo "Refusing unsafe delete path: <clone_parent_dir>" >&2; exit 1 ;;
+esac
</details>

    
As per coding guidelines, REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code).

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @helpers/skills/python-packaging-license-checker/SKILL.md around lines 69 -
71, The rm -rf <clone_parent_dir> invocation is unsafe; add input validation
and path restrictions before deletion: ensure the variable referenced by
<clone_parent_dir> is non-empty (e.g., test -n), reject or exit on
empty/malformed values, restrict deletions to an allowed temp-space pattern
(e.g., only allow paths under /tmp/*) and use a safe rm invocation with -- to
avoid treating hyphen-prefixed names as options; update the step around the rm -rf <clone_parent_dir> line in SKILL.md to perform these checks and fail fast
with an explanatory error if the path is refused.


</details>

---

`27-29`: _⚠️ Potential issue_ | _🟡 Minor_

**Fenced code blocks are missing language tags (MD040).**

These blocks should declare a language (`text`/`bash`) to satisfy lint and keep docs consistently parseable.

<details>
<summary>Patch sketch</summary>

```diff
-```
+```text
 Skill: python-packaging-source-finder

- +text
Skill: git-shallow-clone


-```
+```text
WebFetch: https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json

- +text
License: [SPDX expression, or "non-SPDX: [raw string]" if unmappable]
...

As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

Also applies to: 33-35, 77-79, 156-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 27 -
29, Multiple fenced code blocks in SKILL.md (e.g., the block containing "Skill:
python-packaging-source-finder", "Skill: git-shallow-clone", and the "WebFetch:
https://..." block) lack language tags; update each triple-backtick fence to
include an explicit language (e.g., text or bash) so they satisfy MD040 and
remain parseable—locate the blocks around the strings "Skill:
python-packaging-source-finder", "Skill: git-shallow-clone", and the WebFetch
URL and prefix their opening fences with ```text (or ```bash if command
examples) consistently, and apply the same fix to the other instances noted
(lines ~33-35, 77-79, 156-163).

82-87: ⚠️ Potential issue | 🟠 Major

Fedora status handling is modeled as scalar; this is schema-fragile.

Lines 82-87 treat status as only "allowed"/"not-allowed" scalar values. This risks wrong verdicts when the data contains multiple status entries or unfamiliar values. Parse as a collection and fail closed on unknowns (NEEDS-HUMAN-REVIEW).

#!/bin/bash
set -euo pipefail
URL='https://gitlab.com/fedora/legal/fedora-license-data/-/jobs/artifacts/main/raw/fedora-licenses.json?job=json'
curl -fsSL "$URL" | jq -r '
  .. | objects | select(has("status")) | .status |
  if type=="array" then .[] else . end
' | sort -u

As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 82 -
87, Update the Fedora `status` handling to treat the `.status` field as a
collection rather than a scalar: when computing the verdict in the logic that
inspects the `status` field (the block that maps `"allowed"` / `"not-allowed"`
to next steps and verdicts), normalize `.status` to an array, iterate all
entries, and apply a fail-closed rule—if any entry is `"not-allowed"` =>
Verdict: BLOCKED, if at least one entry is `"allowed"` and no `"not-allowed"`
entries => proceed to Step 4 (verdict refinement), and if any entry is
unknown/unrecognized or the field is missing => Verdict: NEEDS-HUMAN-REVIEW;
ensure all references to `status`, `allowed`, `not-allowed`, NEEDS-HUMAN-REVIEW,
and Step 4 are updated accordingly.

41-46: ⚠️ Potential issue | 🟠 Major

Declared license priority is not actually enforced.

Line 45 uses find ... | head -5, which can return files in traversal order, not the explicit priority in Lines 41-42. The -o chain also weakens the -maxdepth intent due to precedence, making selection bug-prone.

Proposed deterministic fix
-```bash
-find <clone_path> -maxdepth 1 -iname "license*" -o -iname "copying*" -o -iname "licence*" | head -5
-```
+```bash
+for f in LICENSE LICENSE.txt LICENSE.md COPYING COPYING.txt LICENCE LICENCE.txt; do
+  if [ -f "<clone_path>/$f" ]; then
+    echo "<clone_path>/$f"
+    break
+  fi
+done
+```

As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 41 -
46, The documented search using the `find ... | head -5` pipeline (the `find`
invocation with `-iname "license*" -o -iname "copying*" -o -iname "licence*"`
and `head -5`) does not enforce the declared priority and misuses `-maxdepth`
due to `-o` precedence; replace it with a deterministic ordered check that
iterates the explicit priority list (LICENSE, LICENSE.txt, LICENSE.md, COPYING,
COPYING.txt, LICENCE, LICENCE.txt), tests each file with a simple existence
check (e.g., `-f`) and echoes the first match then exits the loop — update
SKILL.md to show this loop-based approach instead of the `find | head` pipeline.

143-150: ⚠️ Potential issue | 🟠 Major

Export compliance step is non-deterministic without explicit data-source/failure rules.

Lines 144-145 name policy lists but do not define exact endpoints, retrieval timestamp requirements, or failure fallback. This makes outcomes inconsistent across runs.

Define source URLs, record fetch timestamp/source in Notes, and on persistent fetch failure set _Export compliance: EXPORT-COMPLIANCE-FLAG plus _Verdict: NEEDS-HUMAN-REVIEW.

As per coding guidelines, REVIEW PRIORITIES: 2. Architectural issues and anti-patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 143 -
150, The "Export compliance" step must be made deterministic: update the
SKILL.md's Export compliance section to list exact source URLs/endpoints for the
US OFAC list, the EU Consolidated Sanctions List, and Red Hat geopolitical
flags, require recording the fetch timestamp and the source URL in the Notes
field (e.g., append "Source: <URL>; Fetched: <ISO8601 timestamp>"), and add an
explicit failure policy that after N retries or persistent fetch failure you set
"Export compliance: EXPORT-COMPLIANCE-FLAG" and append "_Verdict:
NEEDS-HUMAN-REVIEW" to Notes; reference the existing identifiers
"EXPORT-COMPLIANCE-FLAG", "Export compliance", "Notes", and "_Verdict:
NEEDS-HUMAN-REVIEW" so reviewers and automation can find and enforce these
rules.

56-58: ⚠️ Potential issue | 🟠 Major

Ambiguous fuzzy mappings can misclassify legal outcome.

Lines 56-57 map ambiguous forms (BSD, GPLv3) to specific SPDX variants, which can flip Fedora-policy decisions. Treat ambiguous tokens as unresolved and force source/manual verification instead of auto-mapping.

As per coding guidelines, REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 56 -
58, The mapping that auto-converts ambiguous license tokens ("BSD" ->
BSD-3-Clause, "GPLv3" -> GPL-3.0-or-later) is unsafe; update the mapping logic
(the list/constant containing the entries "Apache 2" -> Apache-2.0, "GPL v2" ->
GPL-2.0-only, "BSD" -> BSD-3-Clause, "GPLv3" -> GPL-3.0-or-later, "MIT License"
-> MIT, "LGPLv2.1" -> LGPL-2.1-only) to stop auto-mapping ambiguous tokens
("BSD" and "GPLv3") and instead mark them as
unresolved/require_manual_verification (or emit a verification flag/exception)
so downstream policy decisions must use human-confirmed SPDX identifiers; leave
deterministic mappings (e.g., "Apache 2" -> Apache-2.0, "GPL v2" ->
GPL-2.0-only, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 59-63: The "Compound expressions" section currently permits
choosing a single governing license for `AND` clauses which is incorrect; change
the wording so that for `AND` you must split the expression and require every
component to be individually allowed (any single `not-allowed` component makes
the whole verdict `not-allowed`), document the split in Notes, and
remove/replace the phrase about selecting a governing license; keep the existing
`OR` guidance (choose most permissive allowed or best individual verdict)
intact.
- Around line 69-71: The `rm -rf <clone_parent_dir>` invocation is unsafe; add
input validation and path restrictions before deletion: ensure the variable
referenced by `<clone_parent_dir>` is non-empty (e.g., test -n), reject or exit
on empty/malformed values, restrict deletions to an allowed temp-space pattern
(e.g., only allow paths under /tmp/*) and use a safe rm invocation with `--` to
avoid treating hyphen-prefixed names as options; update the step around the `rm
-rf <clone_parent_dir>` line in SKILL.md to perform these checks and fail fast
with an explanatory error if the path is refused.
- Around line 27-29: Multiple fenced code blocks in SKILL.md (e.g., the block
containing "Skill: python-packaging-source-finder", "Skill: git-shallow-clone",
and the "WebFetch: https://..." block) lack language tags; update each
triple-backtick fence to include an explicit language (e.g., text or bash) so
they satisfy MD040 and remain parseable—locate the blocks around the strings
"Skill: python-packaging-source-finder", "Skill: git-shallow-clone", and the
WebFetch URL and prefix their opening fences with ```text (or ```bash if command
examples) consistently, and apply the same fix to the other instances noted
(lines ~33-35, 77-79, 156-163).
- Around line 82-87: Update the Fedora `status` handling to treat the `.status`
field as a collection rather than a scalar: when computing the verdict in the
logic that inspects the `status` field (the block that maps `"allowed"` /
`"not-allowed"` to next steps and verdicts), normalize `.status` to an array,
iterate all entries, and apply a fail-closed rule—if any entry is
`"not-allowed"` => Verdict: BLOCKED, if at least one entry is `"allowed"` and no
`"not-allowed"` entries => proceed to Step 4 (verdict refinement), and if any
entry is unknown/unrecognized or the field is missing => Verdict:
NEEDS-HUMAN-REVIEW; ensure all references to `status`, `allowed`, `not-allowed`,
NEEDS-HUMAN-REVIEW, and Step 4 are updated accordingly.
- Around line 41-46: The documented search using the `find ... | head -5`
pipeline (the `find` invocation with `-iname "license*" -o -iname "copying*" -o
-iname "licence*"` and `head -5`) does not enforce the declared priority and
misuses `-maxdepth` due to `-o` precedence; replace it with a deterministic
ordered check that iterates the explicit priority list (LICENSE, LICENSE.txt,
LICENSE.md, COPYING, COPYING.txt, LICENCE, LICENCE.txt), tests each file with a
simple existence check (e.g., `-f`) and echoes the first match then exits the
loop — update SKILL.md to show this loop-based approach instead of the `find |
head` pipeline.
- Around line 143-150: The "Export compliance" step must be made deterministic:
update the SKILL.md's Export compliance section to list exact source
URLs/endpoints for the US OFAC list, the EU Consolidated Sanctions List, and Red
Hat geopolitical flags, require recording the fetch timestamp and the source URL
in the Notes field (e.g., append "Source: <URL>; Fetched: <ISO8601 timestamp>"),
and add an explicit failure policy that after N retries or persistent fetch
failure you set "Export compliance: EXPORT-COMPLIANCE-FLAG" and append
"_Verdict: NEEDS-HUMAN-REVIEW" to Notes; reference the existing identifiers
"EXPORT-COMPLIANCE-FLAG", "Export compliance", "Notes", and "_Verdict:
NEEDS-HUMAN-REVIEW" so reviewers and automation can find and enforce these
rules.
- Around line 56-58: The mapping that auto-converts ambiguous license tokens
("BSD" -> BSD-3-Clause, "GPLv3" -> GPL-3.0-or-later) is unsafe; update the
mapping logic (the list/constant containing the entries "Apache 2" ->
Apache-2.0, "GPL v2" -> GPL-2.0-only, "BSD" -> BSD-3-Clause, "GPLv3" ->
GPL-3.0-or-later, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only) to stop
auto-mapping ambiguous tokens ("BSD" and "GPLv3") and instead mark them as
unresolved/require_manual_verification (or emit a verification flag/exception)
so downstream policy decisions must use human-confirmed SPDX identifiers; leave
deterministic mappings (e.g., "Apache 2" -> Apache-2.0, "GPL v2" ->
GPL-2.0-only, "MIT License" -> MIT, "LGPLv2.1" -> LGPL-2.1-only) intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 5350d90d-db2b-483d-a770-a7e0608d5507

📥 Commits

Reviewing files that changed from the base of the PR and between c7e8791 and aa57fd6.

📒 Files selected for processing (2)
  • docs/data.json
  • helpers/skills/python-packaging-license-checker/SKILL.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
helpers/skills/python-packaging-license-checker/SKILL.md (1)

91-97: ⚠️ Potential issue | 🟠 Major

status decision logic is underspecified; explicitly handle all documented Fedora status values.

Current text only promotes "allowed" and treats the rest generically. For deterministic policy behavior, enumerate known statuses and define explicit outcomes per status (especially non-code categories), while still failing closed on unknown values.

In Fedora fedora-license-data (fedora-licenses.json), what status values are currently documented for license entries, and how should consumers handle unrecognized status strings?

As per coding guidelines, prioritize bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers/skills/python-packaging-license-checker/SKILL.md` around lines 91 -
97, Update the "status" decision logic to explicitly enumerate Fedora's
documented license statuses (e.g., "allowed", "not-allowed", "deprecated",
"unencumbered", "discontinued", "unknown" — replace with the exact current set
from fedora-licenses.json) and map each to a deterministic outcome instead of
treating non-allowed values generically: normalize the `status` field to an
array, apply fail-closed rules (any "not-allowed" -> Verdict: BLOCKED; all
entries "allowed" and none "not-allowed" -> proceed to Step 4), map other
documented statuses to defined outcomes (for example, "deprecated" and
"discontinued" -> NEEDS-HUMAN-REVIEW or a specific decision you choose), and
treat missing, empty, or truly unrecognized status strings -> Verdict:
NEEDS-HUMAN-REVIEW; document the exact mapping in the SKILL.md so consumers of
`status` have deterministic policy behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Line 199: The table row containing the license globs "GPL-*, LGPL-*" triggers
markdownlint MD049 due to the asterisks being interpreted as emphasis; update
the row in SKILL.md to escape the asterisks or wrap each glob in backticks
(e.g., `GPL-*`, `LGPL-*`) so the table cell reads a literal glob string instead
of emphasis; locate the table line that contains the exact text "GPL-*, LGPL-*"
and replace those tokens accordingly to satisfy markdownlint.
- Line 50: The current logic that "Read the first 30 lines of the found file and
map to an SPDX identifier" is too small and may miss valid SPDX cues; change the
license detection behavior (the routine that reads and maps file contents to
SPDX identifiers) to scan a larger bounded window or the entire file up to a
reasonable size cap (e.g., 64–256KB) instead of only the first 30 lines, and
ensure the mapping function returns a failure only after scanning the expanded
content; also add error handling for oversized files and timeouts so detection
remains safe.
- Around line 157-158: Replace the portal URLs for the US OFAC SDN List and the
EU Consolidated Sanctions in SKILL.md (the entries currently pointing to
sanctionssearch.ofac.treas.gov and the data.europa.eu dataset page) with the
official machine-readable endpoints:
https://sanctionslistservice.ofac.treas.gov/api/PublicationPreview/exports/SDN.XML
for OFAC and the EC FSF tokenized endpoints/RSS feed (e.g.,
https://webgate.ec.europa.eu/europeaid/fsd/fsf/public/rss) for the EU
consolidated list; additionally update the doc text around those entries (also
update the similar entries at the later block mentioned) to require per-check
metadata capture: source URL, ISO8601 retrieval timestamp, fetch outcome
(success/failure), and retry count so automated checks record auditable
retrieval info.

---

Duplicate comments:
In `@helpers/skills/python-packaging-license-checker/SKILL.md`:
- Around line 91-97: Update the "status" decision logic to explicitly enumerate
Fedora's documented license statuses (e.g., "allowed", "not-allowed",
"deprecated", "unencumbered", "discontinued", "unknown" — replace with the exact
current set from fedora-licenses.json) and map each to a deterministic outcome
instead of treating non-allowed values generically: normalize the `status` field
to an array, apply fail-closed rules (any "not-allowed" -> Verdict: BLOCKED; all
entries "allowed" and none "not-allowed" -> proceed to Step 4), map other
documented statuses to defined outcomes (for example, "deprecated" and
"discontinued" -> NEEDS-HUMAN-REVIEW or a specific decision you choose), and
treat missing, empty, or truly unrecognized status strings -> Verdict:
NEEDS-HUMAN-REVIEW; document the exact mapping in the SKILL.md so consumers of
`status` have deterministic policy behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 0976c4a3-61b0-4e3c-8eba-cc18dba72b92

📥 Commits

Reviewing files that changed from the base of the PR and between aa57fd6 and 5195460.

📒 Files selected for processing (2)
  • docs/data.json
  • helpers/skills/python-packaging-license-checker/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/data.json

Replace SPDX OSI-approval classification with Fedora License Data as
the authoritative policy source for Red Hat redistribution decisions.
License detection is now deterministic: the skill clones the source
repo via python-packaging-source-finder and git-shallow-clone, then
reads LICENSE files directly instead of relying on PyPI metadata.

This addresses seven documented gaps: correct policy basis, AGPL-3.0
caveat handling, dual-license AND/OR parsing, source repo verification
with mismatch detection, proprietary license escalation paths, build
compliance flagging, and export compliance flagging. Output is a
structured six-field verdict block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Emilien Macchi <emacchi@redhat.com>
@EmilienM EmilienM merged commit 34baca7 into main Mar 16, 2026
2 checks passed
@EmilienM EmilienM deleted the license-rh branch March 16, 2026 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants