Invalid file information in SPDX documents #1240

@armintaenzertng

Note: This uses the new version of the SPDX generation introduced in #1233. The old version has the same errors, plus a few more that have already been fixed in the new version.

Describe the bug
SPDX outputs with file information have a number of validation issues:

  • some files don't have a checksum (possibly only empty files); as a workaround, the generation currently falls back to the SHA1 of the empty string so that the SpdxDocument can at least be generated
  • some files have invalid SPDX IDs such as SPDXRef-None-None or SPDXRef-v2"-None
  • some license references from LicenseInfoInFile are not present in the ExtractedLicensingInfo section
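
The checksum and license-reference items can be checked directly on the generated JSON. The following is a minimal sketch, not Tern's or pyspdxtools' code: it assumes the standard SPDX 2.3 JSON field names (`files`, `checksums`, `licenseInfoInFiles`, `hasExtractedLicensingInfos`), and the helper names `files_with_fallback_checksum` and `undefined_license_refs` are hypothetical.

```python
import hashlib
import re

# SHA1 of zero bytes: the fallback value the report describes for files
# that would otherwise have no checksum.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()

# Matches LicenseRef identifiers inside a license expression string.
LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")


def files_with_fallback_checksum(doc):
    """Return names of files with no checksum or with the empty-string SHA1."""
    hits = []
    for entry in doc.get("files", []):
        sums = entry.get("checksums") or []
        uses_fallback = any(
            c.get("algorithm") == "SHA1" and c.get("checksumValue") == EMPTY_SHA1
            for c in sums
        )
        if not sums or uses_fallback:
            hits.append(entry.get("fileName", "<unnamed>"))
    return hits


def undefined_license_refs(doc):
    """Return LicenseRef ids used by files but absent from the extracted licensing info."""
    defined = {e.get("licenseId") for e in doc.get("hasExtractedLicensingInfos", [])}
    used = set()
    for entry in doc.get("files", []):
        for expression in entry.get("licenseInfoInFiles", []):
            used.update(LICENSE_REF.findall(expression))
    return sorted(used - defined)
```

Both helpers take the dictionary obtained by loading output.json with `json.load`. A file that really is empty will also show up in the first list, so the result is a set of candidates to inspect rather than a list of confirmed errors.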

To Reproduce
I ran tern report -i golang:1.12-alpine -f spdxjson -sv 2.3 -o output.json to produce the output and then ran pyspdxtools -i output.json on it (note that validation takes a while because the SPDX document is large).
I'm not sure whether -x scancode is also required; I recall that the above command did not produce any file information in the past. In case there are problems reproducing this, I attached my output.json as output.txt (GitHub does not seem to support JSON attachments).

Error in terminal
Here are the validation issues:

Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-1c734cf. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1c734cf
Unrecognized license reference: LicenseRef-1b79b75. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1b79b75
Unrecognized license reference: LicenseRef-fa9fd02. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-fa9fd02
Unrecognized license reference: LicenseRef-39c3ee0. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-39c3ee0
Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-4ccf56f. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-4ccf56f
Unrecognized license reference: LicenseRef-45c771b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-45c771b
Unrecognized license reference: LicenseRef-ca2312b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-ca2312b
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
did not find the referenced spdx_id "SPDXRef-None-None" in the SPDX document
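
The last three messages describe two different problems: SPDXRef-v2"-None fails the character rule quoted in the message, while SPDXRef-None-None is well-formed but is referenced without being defined anywhere in the document. The sketch below illustrates both checks under the assumption that the document uses the standard SPDX 2.3 JSON field names (`SPDXID`, `packages`, `files`, `snippets`, `relationships`); the function names are hypothetical, and references into external documents are not handled.

```python
import re

# The rule quoted in the validation message: "SPDXRef-" followed by
# letters, numbers, "." and "-" only.
SPDX_ID = re.compile(r"SPDXRef-[A-Za-z0-9.-]+")


def is_valid_spdx_id(spdx_id):
    """Check the syntax of a single SPDX identifier."""
    return SPDX_ID.fullmatch(spdx_id) is not None


def dangling_references(doc):
    """Return ids used in relationships that no element in the document defines."""
    defined = {doc.get("SPDXID")}
    for key in ("packages", "files", "snippets"):
        defined.update(e.get("SPDXID") for e in doc.get(key, []))

    referenced = set()
    for rel in doc.get("relationships", []):
        referenced.add(rel.get("spdxElementId"))
        referenced.add(rel.get("relatedSpdxElement"))
    # NONE and NOASSERTION are allowed placeholders, not element ids.
    referenced -= {None, "NONE", "NOASSERTION"}

    return sorted(referenced - defined)
```

Both example IDs contain the literal text "None", which hints that a Python `None` value was formatted into the identifier, although this report does not confirm the cause.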

Expected behavior
Tern's generated SPDX documents with file information should be valid.

Environment you are running Tern on
Enter all that apply

  • tern at 047e1cb
  • Ubuntu 22.04
  • Python 3.10.6
