Commit 1a1c540

Merge pull request #58 from AyanSinhaMahapatra/pre-release-tasks
Package scancode-analyzer
2 parents c7d9b5c + 5befd36 commit 1a1c540

20 files changed

+215
-159
lines changed

CHANGELOG.rst

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
1-
Release notes
2-
-------------
3-
### Version 0.0.0
1+
Changelog
2+
=========
43

5-
*xxxx-xx-xx* -- Initial release.
4+
v21.4.2
5+
-------
6+
7+
Initial release.

INSTALL.rst

Lines changed: 23 additions & 6 deletions
@@ -1,9 +1,15 @@
1-
Quickstart - Scancode Plugin
2-
----------------------------
1+
Installation
2+
============
33

4-
``scancode-results-analyzer`` can be installed as a scancode post-scan plugin.
4+
The installation methods install the `scancode-analyzer` post-scan plugin, installed
5+
with `scancode`, extending it to have the `--analyze-license-results` option.
56

6-
1. Clone the Repository and navigate to the ``scancode-results-analyzer`` directory.
7+
Install Plugin from Source
8+
--------------------------
9+
10+
``scancode-analyzer`` can be installed as a scancode post-scan plugin.
11+
12+
1. Clone the Repository and navigate to the ``scancode-analyzer`` directory.
713

814
2. Configure (Installs the requirements, and scancode-toolkit with the plugin)::
915

@@ -23,13 +29,24 @@ Quickstart - Scancode Plugin
2329

2430
6. OR, import a JSON scan result and run the plugin on that scan::
2531

26-
scancode --json-pp results.json --from-json tests/data/results-test/selective-before-rules-added/only_errors.json --analyze-license-results
32+
scancode --json-pp results.json --from-json path/to/scan_result.json --analyze-license-results
2733

2834
.. note::
2935

30-
`scancode-results-analyzer` has required CLI options, as these produce attributes
36+
`scancode-analyzer` has required CLI options, as these produce attributes
3137
essential to the analysis process. These are:
3238
`--license --info --license-text --is-license-text --classify`
3339
Even when loading from json, the scan generating these json files should have
3440
been run with these options for the analysis plugin to work.
3541

42+
43+
Install plugin via `pip`
44+
------------------------
45+
46+
1. Install all `scancode` `prerequisites`_ and create a `virtualenvironment`_.
47+
48+
2. Run `pip install scancode-analyzer` to install the latest version of Scancode Analyzer.
49+
50+
51+
.. _virtualenvironment: https://scancode-toolkit.readthedocs.io/en/latest/getting-started/install.html#installation-as-a-library-via-pip
52+
.. _prerequisites: https://scancode-toolkit.readthedocs.io/en/latest/getting-started/install.html#prerequisites
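
The note in INSTALL.rst lists the options a scan must be run with before the analysis plugin can work. A minimal sketch of assembling such a command from Python is given here. The helper name ``build_scan_command``, the argument order, and the example paths are assumptions for illustration and are not part of the commit; the helper only builds the argument list and does not run ``scancode``.

```python
import shlex

# Options the INSTALL notes list as required for the analysis plugin.
REQUIRED_OPTIONS = [
    "--license",
    "--info",
    "--license-text",
    "--is-license-text",
    "--classify",
]


def build_scan_command(codebase_path, output_path):
    """Return the argument list for a scan followed by license-result analysis.

    Hypothetical helper: it assembles arguments only; the ordering of the
    input path relative to the options is an assumption.
    """
    return [
        "scancode",
        *REQUIRED_OPTIONS,
        "--json-pp", output_path,
        codebase_path,
        "--analyze-license-results",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with a real codebase and output file.
    command = build_scan_command("path/to/codebase", "results.json")
    print(shlex.join(command))
```

Keeping the required options in one list makes it harder to forget one of them when the command is later passed to ``subprocess.run``.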

README.rst

Lines changed: 20 additions & 17 deletions
@@ -1,19 +1,22 @@
1-
scancode-results-analyzer
2-
=========================
1+
scancode-analyzer
2+
=================
33

4-
.. what-is-scancode-results-analyzer
4+
.. what-is-scancode-analyzer
55
6-
What is Scancode-Results-Analyzer
7-
---------------------------------
6+
What is Scancode-Analyzer
7+
-------------------------
88

9-
`ScanCode`_ detects licenses, copyrights, package manifests and direct dependencies and more both in source code and
10-
binary files.
9+
`ScanCode`_ detects licenses, copyrights, package manifests and direct dependencies and more both in
10+
source code and binary files.
1111

12-
ScanCode license detection is using multiple techniques to accurately detect licenses based on automatons, inverted
13-
indexes and multiple sequence alignments. The detection is not always accurate enough. The goal of this project is to
14-
improve the accuracy of license detection leveraging the ClearlyDefined and other datasets, where ScanCode is used
15-
to massively scan millions of packages. It would also be available as a `ScanCode`_ ``post-scan`` plugin to use it
16-
in scans directly, or in `scancode.io`_ pipelines.
12+
ScanCode license detection is using multiple techniques to accurately detect licenses based on
13+
automatons, inverted indexes and multiple sequence alignments. As the detection supports approximate
14+
matching, there's a lot of `unknown` detections, or multiple approximate matches.
15+
16+
The goal of this project is to improve the accuracy of license detection leveraging scancode scans,
17+
18+
It is a `ScanCode`_ ``post-scan`` plugin to use it in scans directly, and in future as
19+
`scancode.io`_ pipelines, with better issue review and reporting features.
1720

1821
This project aims to:
1922

@@ -22,7 +25,7 @@ This project aims to:
2225
- Add this as a `scancode`_ post-scan plugin
2326
- Add to pipelines in `scancode.io`_
2427
- Write reusable tools and models to assist in the semi-automated reviews of scan results.
25-
- It will also create new license detection rules semi-automatically to fix the detected anomalies
28+
- It will also suggest new license detection rules semi-automatically to fix the detected anomalies
2629

2730
.. _ScanCode: https://github.com/nexB/scancode-toolkit
2831
.. _scancode.io: https://github.com/nexB/scancode.io
@@ -37,12 +40,12 @@ Refer to the installation instructions on `INSTALL.rst`_
3740
Documentation
3841
-------------
3942

40-
Documentation: https://scancode-results-analyzer.readthedocs.io/en/latest/ [WIP]
43+
Documentation: https://scancode-analyzer.readthedocs.io/en/latest/
4144

4245
Project Board
4346
-------------
4447

45-
`Project Board`_ for ``scancode-results-analyzer`` : Analysing Scancode License Detection Results.
48+
`Project Board`_ for ``scancode-analyzer`` : Analysing Scancode License Detection Results.
4649

47-
.. _INSTALL.rst: https://github.com/nexB/scancode-results-analyzer/tree/master/INSTALL.rst
48-
.. _Project Board: https://github.com/nexB/scancode-results-analyzer/projects/1
50+
.. _INSTALL.rst: https://github.com/nexB/scancode-analyzer/tree/master/INSTALL.rst
51+
.. _Project Board: https://github.com/nexB/scancode-analyzer/projects/1

docs/source/analysis-use-case/suggesting-licenses.rst

Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ The steps are as follows:
5656
1. First from the list of `license expressions`, all the `license expressions` are sorted according
5757
to their occurrences.
5858

59-
2. Generic `license_expressions` like `unknown`, `warranty-disclaimer` are removed fro, this sorted
59+
2. Generic `license_expressions` like `unknown`, `warranty-disclaimer` are removed from this sorted
6060
list.
6161

6262
3. If there's only one `license_expression` with the most number of occurrences, then that is the
@@ -73,7 +73,7 @@ The steps are as follows:
7373
1. The boolean value denoting the license type, i.e. license text/notice/tag/reference is determined
7474
from their respective class of problem, which they are already divided into.
7575

76-
2. The ``ignorable`` attributes are added later by using scripts.
76+
2. The ``ignorable`` attributes could be added later by using scripts.
7777

7878
3. The possible license id (like ``mit``) is predicted as the license ID of the match with the
7979
longest ``match_coverage``. This has to be manually verified in most cases.
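
The steps visible in this hunk (sorting ``license_expressions`` by occurrence, dropping generic ones, and predicting the license id from the match with the highest ``match_coverage``) can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's implementation: the function names are hypothetical, the set of generic expressions contains only the two named in the text, and each match is assumed to be a dictionary with ``key`` and ``match_coverage`` entries.

```python
from collections import Counter

# Generic expressions named in the text; the real list may be longer (assumption).
GENERIC_EXPRESSIONS = {"unknown", "warranty-disclaimer"}


def rank_license_expressions(license_expressions):
    """Sort expressions by occurrence (most frequent first), dropping generic ones."""
    counts = Counter(license_expressions)
    return [
        (expression, count)
        for expression, count in counts.most_common()
        if expression not in GENERIC_EXPRESSIONS
    ]


def predict_license_id(matches):
    """Return the license key of the match with the highest match_coverage."""
    best = max(matches, key=lambda match: match["match_coverage"])
    return best["key"]


if __name__ == "__main__":
    ranked = rank_license_expressions(["mit", "unknown", "gpl-2.0", "mit"])
    print(ranked)
    print(predict_license_id([
        {"key": "mit", "match_coverage": 40.0},
        {"key": "apache-2.0", "match_coverage": 85.5},
    ]))
```

As the documentation itself notes, a prediction made this way still has to be verified manually in most cases.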

docs/source/api-and-outputs/json-output.rst

Lines changed: 48 additions & 31 deletions
@@ -1,13 +1,13 @@
11
JSON Output Format
22
==================
33

4-
`scancode-results-analyzer` is meant to be used as a post-scan Plugin for Scancode, where after
4+
`scancode-analyzer` is meant to be used as a post-scan Plugin for Scancode, where after
55
running a scan, the scan results are then analyzed for scan errors, and that information is
66
added to the scancode JSON results.
77

8-
Command Line Argument to use ``scancode-results-analyzer``: ``--analyze-license-results``
8+
Command Line Argument to use ``scancode-analyzer``: ``--analyze-license-results``
99

10-
Here's how example result-JSONs from `scancode-results-analyzer` could look like, post-analysis.
10+
Here's how example result-JSONs from `scancode-analyzer` could look like, post-analysis.
1111

1212
.. _license_detection_issues_result_json:
1313

@@ -23,13 +23,6 @@ for each resource in the codebase this list of dictionary will be added, where e
2323
is for each corresponding file-region :ref:`file_region`, having the results of the analysis for all
2424
the match(es) in that file-region.
2525

26-
.. note::
27-
28-
[WIP]
29-
There would also be a codebase-level dictionary added,
30-
1. With statistics on the license_detection issues.
31-
2. All the unique license detection issues and their occurrences.
32-
3. Header information.
3326

3427
.. code-block:: json
3528
@@ -110,6 +103,7 @@ a file-region, and containing analysis results for all the license matches in a
110103
"is_license_notice": true,
111104
"is_license_tag": false,
112105
"is_license_reference": false,
106+
"is_license_intro": false,
113107
"analysis_confidence": "high",
114108
"is_suggested_matched_text_complete": true
115109
},
@@ -159,6 +153,9 @@ location.
159153
"licenses": [
160154
{
161155
"key": "lgpl-2.0"
156+
},
157+
{
158+
"key": "gpl-3.0-plus"
162159
}
163160
],
164161
"licence_detection_issues": [
@@ -174,13 +171,19 @@ location.
174171
"is_license_notice": true,
175172
"is_license_tag": false,
176173
"is_license_reference": false,
174+
"is_license_intro": false,
177175
"analysis_confidence": "medium",
178176
"is_suggested_matched_text_complete": true
179177
},
180178
"suggested_license": {
181179
"license_expression": "lgpl-2.0-plus",
182180
"matched_text": " * licensed under the terms of the LGPL.... "
183-
}
181+
},
182+
"original_licenses": [
183+
{
184+
"key": "lgpl-2.0"
185+
}
186+
]
184187
},
185188
{
186189
"start_line": 54,
@@ -194,14 +197,19 @@ location.
194197
"is_license_notice": true,
195198
"is_license_tag": false,
196199
"is_license_reference": false,
200+
"is_license_intro": false,
197201
"analysis_confidence": "high",
198202
"is_suggested_matched_text_complete": true
199203
},
200204
"suggested_license": {
201205
"license_expression": "gpl-3.0-plus",
202206
"matched_text": "\"genshellopt is free software: you can redistribute it and/or modify it under \\\nthe terms of the GNU General Public License as published by the Free Software \\\nFoundation, either version 3 of the License, or (at your option) any later \\\nversion."
203207
},
204-
"original_licenses": []
208+
"original_licenses": [
209+
{
210+
"key": "gpl-3.0-plus"
211+
}
212+
]
205213
}
206214
]
207215
}
@@ -260,6 +268,7 @@ it is an empty list.
260268
"is_license_notice": true,
261269
"is_license_tag": false,
262270
"is_license_reference": false,
271+
"is_license_intro": false,
263272
"analysis_confidence": "medium",
264273
"is_suggested_matched_text_complete": true
265274
},
@@ -304,13 +313,19 @@ it is an empty list.
304313
"is_license_notice": true,
305314
"is_license_tag": false,
306315
"is_license_reference": false,
316+
"is_license_intro": false,
307317
"analysis_confidence": "medium",
308318
"is_suggested_matched_text_complete": true
309319
},
310320
"suggested_license": {
311321
"license_expression": "lgpl-2.0-plus",
312322
"matched_text": " * licensed under the terms of the LGPL. "
313-
}
323+
},
324+
"original_licenses": [
325+
{
326+
"key": "unknown"
327+
}
328+
]
314329
}
315330
]
316331
}
@@ -336,22 +351,24 @@ All Unique License Detection Issues
336351

337352
.. code-block:: json
338353
339-
"unique_license_detection_issues": [
340-
{
341-
"unique_identifier": 1,
342-
"files": [
343-
{
344-
"path": "1921-socat-2.0.0-error.h",
345-
"start_line": 3,
346-
"end_line": 3
354+
{
355+
"unique_license_detection_issues": [
356+
{
357+
"unique_identifier": 1,
358+
"files": [
359+
{
360+
"path": "1921-socat-2.0.0-error.h",
361+
"start_line": 3,
362+
"end_line": 3
363+
}
364+
],
365+
"license_detection_issue": {
366+
"issue_category": "imperfect-match-coverage",
367+
"issue_description": "The license detection is inconclusive with high confidence, because only a small part of the rule text is matched."
347368
}
348-
],
349-
"license_detection_issue": {
350-
"issue_category": "imperfect-match-coverage",
351-
"issue_description": "The license detection is inconclusive with high confidence, because only a small part of the rule text is matched."
352369
}
353-
}
354-
]
370+
]
371+
}
355372
356373
357374
Basic Statistics
@@ -395,7 +412,7 @@ BERT model versions used.
395412
396413
{
397414
"header": {
398-
"tool_name": "scancode-results-analyzer",
415+
"tool_name": "scancode-analyzer",
399416
"version": 0.1,
400417
"cases_version": 0.1,
401418
"ml_models": [
@@ -434,7 +451,7 @@ BERT model versions used.
434451
Related Issues
435452
--------------
436453
437-
- `nexB/scancode-results-analyzer#22 <https://github.com/nexB/scancode-results-analyzer/issues/22>`_
438-
- `nexB/scancode-results-analyzer#20 <https://github.com/nexB/scancode-results-analyzer/issues/20>`_
439-
- `nexB/scancode-results-analyzer#21 <https://github.com/nexB/scancode-results-analyzer/issues/21>`_
454+
- `nexB/scancode-analyzer#22 <https://github.com/nexB/scancode-analyzer/issues/22>`_
455+
- `nexB/scancode-analyzer#20 <https://github.com/nexB/scancode-analyzer/issues/20>`_
456+
- `nexB/scancode-analyzer#21 <https://github.com/nexB/scancode-analyzer/issues/21>`_
440457
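
A consumer of the documented output might group the reported locations by issue category. The sketch below does this for the ``unique_license_detection_issues`` structure shown in the updated example. It assumes that key sits at the top level of the loaded JSON, as in the documentation example; the function name ``locations_by_category`` is hypothetical.

```python
from collections import defaultdict


def locations_by_category(results):
    """Map each issue_category to the (path, start_line, end_line) tuples it covers."""
    grouped = defaultdict(list)
    for issue in results.get("unique_license_detection_issues", []):
        category = issue["license_detection_issue"]["issue_category"]
        for location in issue["files"]:
            grouped[category].append(
                (location["path"], location["start_line"], location["end_line"])
            )
    return dict(grouped)


# Sample data copied from the documentation example in the diff above.
SAMPLE = {
    "unique_license_detection_issues": [
        {
            "unique_identifier": 1,
            "files": [
                {
                    "path": "1921-socat-2.0.0-error.h",
                    "start_line": 3,
                    "end_line": 3,
                }
            ],
            "license_detection_issue": {
                "issue_category": "imperfect-match-coverage",
                "issue_description": (
                    "The license detection is inconclusive with high confidence, "
                    "because only a small part of the rule text is matched."
                ),
            },
        }
    ]
}

if __name__ == "__main__":
    for category, locations in locations_by_category(SAMPLE).items():
        print(category, locations)
```

With a real scan, ``results`` would come from ``json.load`` on the file written by ``--json-pp``.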

docs/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@
1717

1818
# -- Project information -----------------------------------------------------
1919

20-
project = 'scancode-results-analyzer'
21-
copyright = '2020, nexb'
20+
project = 'scancode-analyzer'
21+
copyright = '2021, nexb'
2222
author = 'nexb'
2323

2424
# -- General configuration ---------------------------------------------------

docs/source/how-analysis-is-performed/cases-incorrect-scans.rst

Lines changed: 4 additions & 0 deletions
@@ -134,6 +134,10 @@ All Issue Types
134134
- ``reference-false-positive``
135135
- A piece of code/text is incorrectly detected as a license.
136136

137+
* - ``intro``
138+
- ``intro-unknown-match``
139+
- A piece of common introduction to a license text/notice/reference is detected.
140+
137141
.. _case_lic_text:
138142

139143
License Texts

docs/source/how-analysis-is-performed/selecting-incorrect-unique.rst

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ Why we need to divide matches in a file into file-regions:
4444

4545
2. If there are multiple matches in a region, they need to be analyzed as a whole, as even if most
4646
matches have perfect ``score`` and ``match_coverage``, only one of them with an imperfect
47-
`match_coverage`` would mean there is a issue with that whole file-region. For example one
47+
``match_coverage`` would mean there is a issue with that whole file-region. For example one
4848
license notice can be matched to a notice rule with imperfect scores, and several small
4949
license reference rules.
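
The reasoning in this passage, that one imperfect match is enough to flag a whole file-region, can be expressed as a small check. This is a sketch under assumptions: the function name is hypothetical, matches are assumed to be dictionaries carrying ``match_coverage``, and treating anything below 100 as imperfect is an assumed threshold rather than the plugin's documented rule.

```python
# Assumed value for a perfect match; the plugin's own threshold may differ.
PERFECT_COVERAGE = 100


def region_has_issue(matches):
    """Return True if any match in the file-region has imperfect match_coverage."""
    return any(match["match_coverage"] < PERFECT_COVERAGE for match in matches)


if __name__ == "__main__":
    # One notice matched imperfectly next to two perfectly matched references.
    region = [
        {"match_coverage": 100},
        {"match_coverage": 62.5},
        {"match_coverage": 100},
    ]
    print(region_has_issue(region))
```

Evaluating the region as a unit, rather than each match on its own, is what lets a single weak notice match mark the surrounding reference matches for review as well.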
5050

docs/source/index.rst

Lines changed: 6 additions & 6 deletions
@@ -1,18 +1,18 @@
1-
.. scancode-results-analyzer documentation master file, created by
1+
.. scancode-analyzer documentation master file, created by
22
sphinx-quickstart on Fri Oct 30 21:27:08 2020.
33
You can adapt this file completely to your liking, but it should at least
44
contain the root `toctree` directive.
55
6-
Welcome to `scancode-results-analyzer` Documentation!
7-
=====================================================
6+
Welcome to `scancode-analyzer` Documentation!
7+
=============================================
88

99

1010
.. include:: ../../README.rst
11-
:start-after: what-is-scancode-results-analyzer
11+
:start-after: what-is-scancode-analyzer
1212
:end-before: from-github-links
1313

14-
Getting Started with `scancode-results-analyzer`
15-
------------------------------------------------
14+
Getting Started with `scancode-analyzer`
15+
----------------------------------------
1616

1717
.. toctree::
1818
:maxdepth: 3

scancode-analyzer.ABOUT

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
about_resource: .
2+
name: scancode-analyzer
3+
license_expression: apache-2.0
4+
copyright: Copyright (c) nexB Inc. and others.
5+
homepage_url: https://github.com/nexB/scancode-analyzer
6+
vcs_url: git+https://github.com/nexB/scancode-analyzer
7+
bug_tracking_url: https://github.com/nexB/scancode-analyzer/issues
