
Commit d3dd148

Accommodate updated Scancode attribute names
Scancode v31.0.0 includes changes[1] to JSON output attribute names that were causing KeyErrors during processing when Tern ran with Scancode. Scancode v32.0.0 also includes changes[2] to license_detection output that were similarly causing KeyErrors during parsing when Tern ran with Scancode.

This commit adds code that can accommodate the new attribute names in the newer versions of Scancode, as well as the older attribute names (in case we have users still using older Scancode versions). At some point in the future, it probably makes sense to revisit some of these changes and decide whether we want to continue supporting older versions of Scancode.

This commit also updates the README instructions for installing newer Scancode versions on M1/ARM hardware and fixes a small bug that was causing purl generation to fail when Scancode doesn't detect a package format.

[1] https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#v3100---2022-08-17
[2] https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#license-detection

Resolves #1202

Signed-off-by: Rose Judge <[email protected]>
1 parent 852af8c commit d3dd148

3 files changed: 55 additions and 14 deletions

README.md

Lines changed: 6 additions & 0 deletions
@@ -348,6 +348,8 @@ NOTE: Neither the Docker container nor the Vagrant image has any of the extensio
 ## Scancode<a name="scancode">
 [scancode-toolkit](https://github.com/nexB/scancode-toolkit) is a license analysis tool that "detects licenses, copyrights, package manifests and direct dependencies and more both in source code and binary files". Note that Scancode currently works on Python 3.6 to 3.9. Be sure to check what python version you are using below.
 
+**NOTE** Installation issues have been [reported](https://github.com/nexB/scancode-toolkit/issues/3205) on macOS on M1 and Linux on ARM for Scancode>=31.0.0. If you are wanting to run Tern + Scancode in either of these environments, you will need to install `scancode-toolkit-mini`.
+
 1. Install system dependencies for Scancode (refer to the [Scancode GitHub repo](https://github.com/nexB/scancode-toolkit) for instructions)
 
 2. Setup a python virtual environment
@@ -360,6 +362,10 @@ $ source bin/activate
 ```
 $ pip install tern scancode-toolkit
 ```
+<br> If you are using macOS on M1 or Linux on ARM, run:</br>
+```
+$ pip install tern scancode-toolkit-mini
+```
 4. Run tern with scancode
 ```
 $ tern report -x scancode -i golang:1.12-alpine
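
Because the README now names two possible distributions and the fix depends on which Scancode release is present, it can help to check what is installed when debugging. The following is a minimal sketch, assuming the distribution names match the pip commands in the hunk above; the helper `installed_scancode_version` is hypothetical and not part of Tern.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands in the README hunk above.
SCANCODE_DISTRIBUTIONS = ("scancode-toolkit", "scancode-toolkit-mini")


def installed_scancode_version(names=SCANCODE_DISTRIBUTIONS):
    """Return (distribution, version) for the first installed name, else None.

    Hypothetical helper for illustration; Tern does not ship this function.
    """
    for name in names:
        try:
            return name, version(name)
        except PackageNotFoundError:
            # This distribution is not installed; try the next candidate.
            continue
    return None


if __name__ == "__main__":
    found = installed_scancode_version()
    if found is None:
        print("No Scancode distribution found in this environment")
    else:
        print("{} {}".format(*found))
```

Run inside the same virtual environment that Tern uses, since the lookup only sees distributions installed for the current interpreter.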

tern/extensions/scancode/executor.py

Lines changed: 46 additions & 11 deletions
@@ -58,15 +58,38 @@ def get_scancode_file(file_dict):
         file_dict['name'], fspath, file_dict['date'], file_dict['file_type'])
     fd.short_file_type = get_file_type(file_dict)
     fd.add_checksums({'sha1': file_dict['sha1'], 'md5': file_dict['md5']})
-    if file_dict['licenses']:
-        fd.licenses = [li['short_name'] for li in file_dict['licenses']]
-    fd.license_expressions = file_dict['license_expressions']
+    try:
+        # For scancode versions < 32.0.0
+        if file_dict['licenses']:
+            fd.licenses = [li['short_name'] for li in file_dict['licenses']]
+        fd.license_expressions = file_dict['license_expressions']
+    except KeyError:
+        # License detection changed for scancode version >= 32.0
+        ## https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#license-detection
+        if file_dict['license_detections']:
+            fd.licenses = [li['license_expression'] for li in file_dict['license_detections']]
+        fd.license_expressions = file_dict['detected_license_expression']
+    ## Several of the scancode attribute names have changed. See:
+    # https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#important-api-changes-1
+    # The following try/except statements accommodate metadata from scancode versions
+    # prior to this scancode JSON output change as well as after the change was made.
     if file_dict['copyrights']:
-        fd.copyrights = [c['value'] for c in file_dict['copyrights']]
+        try:
+            # For scancode versions <=30.*
+            fd.copyrights = [c['value'] for c in file_dict['copyrights']]
+        except KeyError:
+            # Data structure fields changed in scancode >= 31.0.0
+            fd.copyrights = [c['copyright'] for c in file_dict['copyrights']]
     if file_dict['urls']:
         fd.urls = [u['url'] for u in file_dict['urls']]
-    fd.packages = file_dict['packages']
-    fd.authors = [a['value'] for a in file_dict['authors']]
+    try:
+        fd.packages = file_dict['packages']
+    except KeyError:
+        fd.packages = file_dict['package_data']
+    try:
+        fd.authors = [a['value'] for a in file_dict['authors']]
+    except KeyError:
+        fd.authors = [a['author'] for a in file_dict['authors']]
     if file_dict['scan_errors']:
         # for each scan error make a notice
         for err in file_dict['scan_errors']:
@@ -112,12 +135,18 @@ def get_scancode_package(package_dict):
     object with the results'''
     package = Package(package_dict['name'])
     package.version = package_dict['version']
-    package.pkg_license = filter_pkg_license(package_dict['declared_license'])
+    try:
+        package.pkg_license = filter_pkg_license(package_dict['declared_license'])
+        package.licenses = [package_dict['declared_license'],
+                            package_dict['license_expression']]
+    except KeyError:
+        ## https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#license-detection
+        package.pkg_license = filter_pkg_license(package_dict['extracted_license_statement'])
+        package.licenses = [li['license_expression'] for li in package_dict['license_detections']]
+        package.licenses.append(package_dict['extracted_license_statement'])
     package.copyright = package_dict['copyright']
     package.proj_url = package_dict['repository_homepage_url']
     package.download_url = package_dict['download_url']
-    package.licenses = [package_dict['declared_license'],
-                        package_dict['license_expression']]
     return package
 
 
@@ -160,8 +189,14 @@ def collect_layer_data(layer_obj):
         for f in data['files']:
             if f['type'] == 'file' and f['size'] != 0:
                 files.append(get_scancode_file(f))
-                for package in f['packages']:
-                    packages.append(get_scancode_package(package))
+                try:
+                    for package in f['packages']:
+                        packages.append(get_scancode_package(package))
+                except KeyError:
+                    # See comment in get_scancode_file() above about attribute name changes
+                    # in newer Scancode versions
+                    for package in f['package_data']:
+                        packages.append(get_scancode_package(package))
     return files, packages
 
 
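
The try/except pairs in the executor.py hunks above all follow one pattern: read the older key and, on `KeyError`, fall back to the renamed key. The sketch below expresses the same idea as a single lookup helper and exercises it on hand-made dictionaries. The names `first_key` and `copyright_statements` and the sample data are hypothetical; real Scancode output carries many more fields, and this is not how Tern implements the fallback.

```python
def first_key(mapping, *keys):
    """Return the value of the first key present in mapping.

    Hypothetical helper; raises KeyError if none of the keys exist,
    just as a plain dictionary lookup on a missing key would.
    """
    for key in keys:
        if key in mapping:
            return mapping[key]
    raise KeyError(keys)


def copyright_statements(file_dict):
    """Collect copyright strings from either the old or the new entry shape."""
    return [first_key(c, 'value', 'copyright') for c in file_dict['copyrights']]


# Hand-made samples shaped like the fields the diff reads; not real scan output.
old_style = {'copyrights': [{'value': 'Copyright (c) Example Corp'}],
             'packages': []}
new_style = {'copyrights': [{'copyright': 'Copyright (c) Example Corp'}],
             'package_data': []}

for sample in (old_style, new_style):
    print(copyright_statements(sample),
          first_key(sample, 'packages', 'package_data'))
```

One difference from the try/except form is worth keeping in mind: a membership test only reacts to the key being absent, whereas `except KeyError` also catches a `KeyError` raised anywhere inside the guarded block, including inside a called function.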
tern/formats/spdx/spdx_common.py

Lines changed: 3 additions & 3 deletions
@@ -241,10 +241,10 @@ def get_purl(package_obj):
             purl_namespace = package_obj.pkg_supplier.split(' ')[1].lower()
         else:
             purl_namespace = package_obj.pkg_supplier.split(' ')[0].lower()
-    # TODO- this might need adjusting for alpm. Currently can't test on M1
-    purl = PackageURL(purl_type, purl_namespace, package_obj.name.lower(), package_obj.version,
-                      qualifiers={'arch': package_obj.arch if package_obj.arch else ''})
     try:
+        # TODO- this might need adjusting for alpm. Currently can't test on M1
+        purl = PackageURL(purl_type, purl_namespace, package_obj.name.lower(), package_obj.version,
+                          qualifiers={'arch': package_obj.arch if package_obj.arch else ''})
         return purl.to_string()
     except ValueError:
         return ''
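
The hunk above moves the `PackageURL` construction inside the existing `try` block, so a `ValueError` raised while building the purl is handled the same way as one raised by `to_string()`. The sketch below shows the difference in control flow. It uses a stand-in class, `StrictPurl`, which is hypothetical and only imitates a constructor that rejects an empty package type; it is not the real packageurl implementation.

```python
class StrictPurl:
    """Stand-in for a purl class that validates in its constructor.

    Hypothetical; it only imitates a constructor that raises ValueError
    when no package type is given.
    """

    def __init__(self, purl_type, name, version):
        if not purl_type:
            raise ValueError('a purl type is required')
        self.purl_type, self.name, self.version = purl_type, name, version

    def to_string(self):
        return 'pkg:{}/{}@{}'.format(self.purl_type, self.name, self.version)


def purl_before(purl_type, name, version):
    """Old layout: construction sits outside the try block."""
    purl = StrictPurl(purl_type, name, version)  # a ValueError escapes here
    try:
        return purl.to_string()
    except ValueError:
        return ''


def purl_after(purl_type, name, version):
    """New layout: construction and serialization share one try block."""
    try:
        purl = StrictPurl(purl_type, name, version)
        return purl.to_string()
    except ValueError:
        return ''


print(purl_after('deb', 'zlib', '1.2.11'))
print(repr(purl_after('', 'zlib', '1.2.11')))
```

With an empty package type, `purl_before` propagates the exception to its caller while `purl_after` returns an empty string, which matches the fallback the surrounding `except ValueError` branch already provided.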
