Conversation
There was a problem hiding this comment.
Pull request overview
This PR standardizes license handling across the OMSF directory by adopting SPDX identifiers end-to-end (schema validation, form options, catalog YAML, and test fixtures), backed by a new SPDX helper module and tests.
Changes:
- Introduces SPDX utilities (
ALL_SPDX_IDS, curatedCOMMON_LICENSES) viaspdx-license-idsand uses them throughout the app. - Tightens
SoftwareSchemaObjectlicense validation to accept SPDX IDs orLicenseRef-*, with opt-out sentinels (Proprietary,Non-AI). - Updates catalog YAML entries and tests to use canonical SPDX identifiers (e.g.,
BSD-3-Clause,LGPL-2.1-or-later,GPL-2.0-only).
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/lib/spdx.ts | Adds SPDX ID list import + curated common-license subset for UI and validation. |
| src/schemas.ts | Enforces SPDX/LicenseRef license validation and re-exports curated license list. |
| src/components/Form.svelte | Switches the license picker from the legacy list to COMMON_LICENSES. |
| src/pages/llms-full.txt.ts | Updates markdown output from singular license to plural licenses. |
| src/test/lib/utils/spdx.test.ts | Adds tests for SPDX list integrity, curated subset, and schema behavior. |
| src/test/lib/utils/yamlRender.svelte.test.ts | Updates fixtures to canonical SPDX identifiers (e.g., BSD-3-Clause). |
| src/test/lib/utils/tagNormalization.test.ts | Updates expected normalized licenses to canonical SPDX IDs. |
| software/westpa.yaml | Normalizes license entry formatting. |
| software/psi4.yaml | Replaces ambiguous license value with a canonical SPDX ID. |
| software/packmol.yaml | Normalizes license entry formatting. |
| software/openmm.yaml | Replaces ambiguous license value with a canonical SPDX ID. |
| software/mdtraj.yaml | Replaces ambiguous license value with a canonical SPDX ID. |
| software/mdanalysis.yaml | Replaces ambiguous license value with a canonical SPDX ID. |
| software/lammps.yaml | Replaces ambiguous license value with a canonical SPDX ID. |
| software/gromacs.yaml | Replaces ambiguous license value with a canonical SPDX ID. |
| software/bioemu.yaml | Normalizes license entry formatting. |
| package.json | Adds spdx-license-ids dependency. |
| bun.lock | Locks new dependency and reflects lockfile updates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
This PR has been staged at https://20944b0f.omsf-directory.pages.dev. It has been updated for commit 25660c2. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@ethanholz I've opened a new pull request, #112, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
@ethanholz I've opened a new pull request, #113, to work on those changes. Once the pull request is ready, I'll request review from you. |
…t.ts Co-authored-by: ethanholz <6245717+ethanholz@users.noreply.github.com>
Co-authored-by: ethanholz <6245717+ethanholz@users.noreply.github.com>
Fix `[undefined](undefined)` output for optional `docs`/`link` fields in llms-full.txt
perf: precompute SPDX ID Set for O(1) license validation lookups
|
@dwhswenson if you could just take a look at this and let me know what you think, I would appreciate it! |
dwhswenson
left a comment
There was a problem hiding this comment.
Minor issue around sentinels to opt out of SPDX validation. Ideally, we'd properly validate everything, but that's way more code that I think we want to maintain for this purpose. Instead, I suggest using a single sentinel "Custom", and then allowing non-SPDX identifiers like "Proprietary" or "Non-AI-BSD-3-Clause" to clarify.
src/schemas.ts
Outdated
| // SPDX validation. "Proprietary" covers closed-source software; "Non-AI" | ||
| // covers the BSD-3-Clause Non-AI variant used by e.g. ForceBalance, which | ||
| // adds an AI-training restriction not present in any SPDX identifier. | ||
| const NON_SPDX_SENTINELS = ["Proprietary", "Non-AI"] as const; |
There was a problem hiding this comment.
This seems like a bit of a hack that is contrary to SPDX in order to handle the ForceBalance non-AI license. It looks to me like the ForceBalance license is the NON-AI-BSD3 from non-ai-licenses/non-ai-licenses
I can't find an official SPDX entry for that, or for the non-AI exception, but I feel like we should either do something more in line with SPDX or have a more general thing like ["Custom", "Non-AI BSD-3-Clause"], where we don't validate on the "Custom" sentinel (rather than having 2 sentinels).
Ideally, we could easily use SPDX expressions, but that seems like way more code to build out than it is worth for our needs.
There was a problem hiding this comment.
It seems like the SPDX way to do this is using a LicenseRef prefix on the license. See this discussion: CycloneDX/specification#713. This would realistically look like we just do LicenseRef-BSD-3-Clause-NonAI and when rendering we remove the LicenseRef- part. This would also apply to the Proprietary license as well.
There was a problem hiding this comment.
That could work too. I guess the question is, which is easier for contributors and visitors to understand? I feel like LicenseRef- without the actual reference (URL and/or text as would be provided in an SBOM) is kind of odd. A sentinel tag like "Custom" or "Other" seems more human-friendly (and we could even omit it from presentation pretty easily, right?)
There was a problem hiding this comment.
My thought is that we transform Licenses that have the LicenseRef- tag to a Custom tag on the frontend. We need some way to show, there is a license here and it is not standard. The goal is that I do want to highlight the non-SPDX licenses when we can.
|
@dwhswenson I have updated to use the LicenseRef spec, can you double check that it looks okay? Would also like a review of the form for how we handle it. |
dwhswenson
left a comment
There was a problem hiding this comment.
One question and one change request (change is that the add button on the form should validate or possibly auto-add LicenseRef).
src/components/Card.svelte
Outdated
| if (license.startsWith("LicenseRef-")) { | ||
| return "Custom"; | ||
| } |
There was a problem hiding this comment.
Question on this (not blocking): I originally thought the intent was to have the card display license include the text after LicenseRef-, but to have all such things be called "Custom" in the license filter. Currently it actually displays "Custom" -- is this intended?
For example, if I set the license as LicenceRef-Foo, I thought I would see "Foo" on the card, not "Custom" (or maybe Custom: Foo) but in the filters "Foo" would be shown when I clicked on "Custom".
There was a problem hiding this comment.
Maybe we could make it so you would Custom and Foo.
src/components/Form.svelte
Outdated
| description="Select SPDX licenses or add custom LicenseRef- identifiers" | ||
| required={true} | ||
| placeholder="Enter custom license..." | ||
| placeholder="Enter custom LicenseRef-..." |
There was a problem hiding this comment.
Not in the diff here, but the "Add" button for this should probably validate. I can add license "Foo" instead of "LicenseRef-Foo".
There was a problem hiding this comment.
Fair enough. I think this should really be more like, check if the license is a valid SPDX license. If it is, great! If not add LicenseRef.
| const canonicalSpdx = spdxCanonicalMap.get(value.toLowerCase()); | ||
| if (canonicalSpdx) { | ||
| return canonicalSpdx; | ||
| } |
There was a problem hiding this comment.
Something doesn't seem to be working here. Manual testing in the preview deployment (https://b3c6e846.omsf-directory.pages.dev/new/). My interpretation of the text (and, at a skim, of the code here) is that I should be able to type in an SPDX license ID (e.g., "MIT") and it would add that to the preview card. It does not. Works fine for any non-SPDX ID (adds with "LicenseRef-" prefix, e.g., I get "LicenseRef-MI" or "LicenseRef-MITE", for "MI" or "MITE", respectively), but for "MIT" nothing happens.
dwhswenson
left a comment
There was a problem hiding this comment.
Looks like that last commit fixed my remaining concern! Looks good; maybe someday we'll want the text box to auto-filter the license selection (so that people can type to see the correct standard SPDX expression), but that's way out of scope for this PR.
Merging!
This pull request introduces a comprehensive overhaul of how software licenses are handled, validated, and displayed throughout the codebase. The main focus is on adopting the SPDX standard for license identifiers, improving form validation, updating test cases, and ensuring consistency across the software catalog. The changes include the addition of a curated SPDX license list, stricter schema validation, and updates to both frontend components and test suites.
SPDX License Integration and Validation
spdx-license-idsas a dependency and introducedsrc/lib/spdx.tsto provide all SPDX license identifiers (ALL_SPDX_IDS) and a curated subset (COMMON_LICENSES) for use in forms and validation. [1] [2]src/schemas.tsto require license arrays to use valid SPDX identifiers orLicenseRef-custom strings, unless a known non-SPDX sentinel ("Proprietary", "Non-AI") is present, in which case validation is skipped. [1] [2]src/components/Form.svelte) with the curated SPDX subset (COMMON_LICENSES). [1] [2]Catalog and Test Suite Updates
LGPL-2.1-only,GPL-2.0-only,MIT), replacing ambiguous or abbreviated forms. [1] [2] [3] [4] [5] [6] [7] [8] [9]src/test/lib/utils/spdx.test.tsto verify SPDX list integrity, curated subset correctness, and schema validation behavior for various license array scenarios.BSD-3-Clauseinstead ofBSD-3). [1] [2] [3] [4] [5]