Skip to content

fix: SPDX license handling#111

Merged
dwhswenson merged 19 commits intomainfrom
fix/spdx
Mar 17, 2026
Merged

fix: SPDX license handling#111
dwhswenson merged 19 commits intomainfrom
fix/spdx

Conversation

@ethanholz
Copy link
Copy Markdown
Collaborator

This pull request introduces a comprehensive overhaul of how software licenses are handled, validated, and displayed throughout the codebase. The main focus is on adopting the SPDX standard for license identifiers, improving form validation, updating test cases, and ensuring consistency across the software catalog. The changes include the addition of a curated SPDX license list, stricter schema validation, and updates to both frontend components and test suites.

SPDX License Integration and Validation

  • Added spdx-license-ids as a dependency and introduced src/lib/spdx.ts to provide all SPDX license identifiers (ALL_SPDX_IDS) and a curated subset (COMMON_LICENSES) for use in forms and validation. [1] [2]
  • Updated the Zod schema in src/schemas.ts to require license arrays to use valid SPDX identifiers or LicenseRef- custom strings, unless a known non-SPDX sentinel ("Proprietary", "Non-AI") is present, in which case validation is skipped. [1] [2]
  • Replaced the previous hardcoded license list in the submission form (src/components/Form.svelte) with the curated SPDX subset (COMMON_LICENSES). [1] [2]

Catalog and Test Suite Updates

  • Updated all software YAML entries to use precise SPDX identifiers (e.g., LGPL-2.1-only, GPL-2.0-only, MIT), replacing ambiguous or abbreviated forms. [1] [2] [3] [4] [5] [6] [7] [8] [9]
  • Added a comprehensive test suite in src/test/lib/utils/spdx.test.ts to verify SPDX list integrity, curated subset correctness, and schema validation behavior for various license array scenarios.
  • Updated all affected tests and fixtures to use correct SPDX identifiers (e.g., BSD-3-Clause instead of BSD-3). [1] [2] [3] [4] [5]

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR standardizes license handling across the OMSF directory by adopting SPDX identifiers end-to-end (schema validation, form options, catalog YAML, and test fixtures), backed by a new SPDX helper module and tests.

Changes:

  • Introduces SPDX utilities (ALL_SPDX_IDS, curated COMMON_LICENSES) via spdx-license-ids and uses them throughout the app.
  • Tightens SoftwareSchemaObject license validation to accept SPDX IDs or LicenseRef-*, with opt-out sentinels (Proprietary, Non-AI).
  • Updates catalog YAML entries and tests to use canonical SPDX identifiers (e.g., BSD-3-Clause, LGPL-2.1-or-later, GPL-2.0-only).

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/lib/spdx.ts Adds SPDX ID list import + curated common-license subset for UI and validation.
src/schemas.ts Enforces SPDX/LicenseRef license validation and re-exports curated license list.
src/components/Form.svelte Switches the license picker from the legacy list to COMMON_LICENSES.
src/pages/llms-full.txt.ts Updates markdown output from singular license to plural licenses.
src/test/lib/utils/spdx.test.ts Adds tests for SPDX list integrity, curated subset, and schema behavior.
src/test/lib/utils/yamlRender.svelte.test.ts Updates fixtures to canonical SPDX identifiers (e.g., BSD-3-Clause).
src/test/lib/utils/tagNormalization.test.ts Updates expected normalized licenses to canonical SPDX IDs.
software/westpa.yaml Normalizes license entry formatting.
software/psi4.yaml Replaces ambiguous license value with a canonical SPDX ID.
software/packmol.yaml Normalizes license entry formatting.
software/openmm.yaml Replaces ambiguous license value with a canonical SPDX ID.
software/mdtraj.yaml Replaces ambiguous license value with a canonical SPDX ID.
software/mdanalysis.yaml Replaces ambiguous license value with a canonical SPDX ID.
software/lammps.yaml Replaces ambiguous license value with a canonical SPDX ID.
software/gromacs.yaml Replaces ambiguous license value with a canonical SPDX ID.
software/bioemu.yaml Normalizes license entry formatting.
package.json Adds spdx-license-ids dependency.
bun.lock Locks new dependency and reflects lockfile updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 7, 2026

This PR has been staged at https://20944b0f.omsf-directory.pages.dev.

It has been updated for commit 25660c2.

ethanholz and others added 2 commits March 6, 2026 23:09
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 7, 2026

@ethanholz I've opened a new pull request, #112, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 7, 2026

@ethanholz I've opened a new pull request, #113, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 5 commits March 7, 2026 06:12
…t.ts

Co-authored-by: ethanholz <6245717+ethanholz@users.noreply.github.com>
Co-authored-by: ethanholz <6245717+ethanholz@users.noreply.github.com>
Fix `[undefined](undefined)` output for optional `docs`/`link` fields in llms-full.txt
perf: precompute SPDX ID Set for O(1) license validation lookups
@ethanholz ethanholz requested a review from dwhswenson March 7, 2026 06:19
@ethanholz
Copy link
Copy Markdown
Collaborator Author

@dwhswenson if you could just take a look at this and let me know what you think, I would appreciate it!

Copy link
Copy Markdown
Member

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor issue around sentinels to opt out of SPDX validation. Ideally, we'd properly validate everything, but that's way more code that I think we want to maintain for this purpose. Instead, I suggest using a single sentinel "Custom", and then allowing non-SPDX identifiers like "Proprietary" or "Non-AI-BSD-3-Clause" to clarify.

src/schemas.ts Outdated
// SPDX validation. "Proprietary" covers closed-source software; "Non-AI"
// covers the BSD-3-Clause Non-AI variant used by e.g. ForceBalance, which
// adds an AI-training restriction not present in any SPDX identifier.
const NON_SPDX_SENTINELS = ["Proprietary", "Non-AI"] as const;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a bit of a hack that is contrary to SPDX in order to handle the ForceBalance non-AI license. It looks to me like the ForceBalance license is the NON-AI-BSD3 from non-ai-licenses/non-ai-licenses

I can't find an official SPDX entry for that, or for the non-AI exception, but I feel like we should either do something more in line with SPDX or have a more general thing like ["Custom", "Non-AI BSD-3-Clause"], where we don't validate on the "Custom" sentinel (rather than having 2 sentinels).

Ideally, we could easily use SPDX expressions, but that seems like way more code to build out than it is worth for our needs.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the SPDX way to do this is using a LicenseRef prefix on the license. See this discussion: CycloneDX/specification#713. This would realistically look like we just do LicenseRef-BSD-3-Clause-NonAI and when rendering we remove the LicenseRef- part. This would also apply to the Proprietary license as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work too. I guess the question is, which is easier for contributors and visitors to understand? I feel like LicenseRef- without the actual reference (URL and/or text as would be provided in an SBOM) is kind of odd. A sentinel tag like "Custom" or "Other" seems more human-friendly (and we could even omit it from presentation pretty easily, right?)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is that we transform Licenses that have the LicenseRef- tag to a Custom tag on the frontend. We need some way to show, there is a license here and it is not standard. The goal is that I do want to highlight the non-SPDX licenses when we can.

@ethanholz ethanholz requested a review from dwhswenson March 17, 2026 15:47
@ethanholz
Copy link
Copy Markdown
Collaborator Author

@dwhswenson I have updated to use the LicenseRef spec, can you double check that it looks okay? Would also like a review of the form for how we handle it.

Copy link
Copy Markdown
Member

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question and one change request (change is that the add button on the form should validate or possibly auto-add LicenseRef).

Comment on lines +22 to +24
if (license.startsWith("LicenseRef-")) {
return "Custom";
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question on this (not blocking): I originally thought the intent was to have the card display license include the text after LicenseRef-, but to have all such things be called "Custom" in the license filter. Currently it actually displays "Custom" -- is this intended?

For example, if I set the license as LicenceRef-Foo, I thought I would see "Foo" on the card, not "Custom" (or maybe Custom: Foo) but in the filters "Foo" would be shown when I clicked on "Custom".

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could make it so you would Custom and Foo.

description="Select SPDX licenses or add custom LicenseRef- identifiers"
required={true}
placeholder="Enter custom license..."
placeholder="Enter custom LicenseRef-..."
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the diff here, but the "Add" button for this should probably validate. I can add license "Foo" instead of "LicenseRef-Foo".

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I think this should really be more like, check if the license is a valid SPDX license. If it is, great! If not add LicenseRef.

@ethanholz ethanholz requested a review from dwhswenson March 17, 2026 17:36
Comment on lines +50 to +53
const canonicalSpdx = spdxCanonicalMap.get(value.toLowerCase());
if (canonicalSpdx) {
return canonicalSpdx;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something doesn't seem to be working here. Manual testing in the preview deployment (https://b3c6e846.omsf-directory.pages.dev/new/). My interpretation of the text (and, at a skim, of the code here) is that I should be able to type in an SPDX license ID (e.g., "MIT") and it would add that to the preview card. It does not. Works fine for any non-SPDX ID (adds with "LicenseRef-" prefix, e.g., I get "LicenseRef-MI" or "LicenseRef-MITE", for "MI" or "MITE", respectively), but for "MIT" nothing happens.

Copy link
Copy Markdown
Member

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that last commit fixed my remaining concern! Looks good; maybe someday we'll want the text box to auto-filter the license selection (so that people can type to see the correct standard SPDX expression), but that's way out of scope for this PR.

Merging!

@dwhswenson dwhswenson merged commit 47b0954 into main Mar 17, 2026
8 checks passed
@dwhswenson dwhswenson deleted the fix/spdx branch March 17, 2026 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants