Commit a0d4d81

Merge pull request #2705 from nexB/release-preparation
Prepare new release
2 parents 6d2320c + fc33f14 commit a0d4d81

27 files changed, +4079 -3657 lines changed

CHANGELOG.rst

Lines changed: 154 additions & 26 deletions
@@ -1,24 +1,21 @@
 Changelog
 =========

-v21.x.x (next, future)
+31.0.0 (next, roadmap)
 -----------------------

+
 Important API changes:
 ~~~~~~~~~~~~~~~~~~~~~~~~

-- The data structure of the JSON output is now versioned and the next version
-is available with a new command line option. We are also documenting a new
-and clear API policy and backward compatibility policy.
-
 - The data structure of the JSON output has changed for copyrights, authors
 and holders: we now use proper names for attributes and not a generic "value".

 - The data structure of the JSON output has changed for licenses: we now
 return match details once for each matched license expression rather than
 once for each license in a matched expression. There is a new top-level
-"licenses" attributes that contains the data details for each detected
-licenses only once. This data can contain the reference license text
+"license_references" attribute that contains the data details for each
+detected license only once. This data can contain the reference license text
 as an option.

 - The data structure of the JSON output has changed for packages: we now
@@ -27,9 +24,9 @@ Important API changes:
 that contains each package instance that can be aggregating data from
 multiple manifests for a single package instance.

-- The data structure for HTML output has been changed to include emails and urls under the
-"infos" object. Now HTML template will output holders, authors, emails, and
-urls into separate tables like "licenses" and "copyrights".
+- The data structure for HTML output has been changed to include emails and
+urls under the "infos" object. The HTML template now outputs holders,
+authors, emails, and urls into separate tables like "licenses" and "copyrights".

 Copyright detection:
 ~~~~~~~~~~~~~~~~~~~~
@@ -39,45 +36,176 @@ Copyright detection:
 - Several copyright detection bugs have been fixed.


+License detection:
+~~~~~~~~~~~~~~~~~~~
+
+- There have been significant license detection rules and licenses updates:
+
+- XX new licenses have been added,
+- XX existing license metadata have been updated,
+- XXXX new license detection rules have been added, and
+- XXXX existing license rules have been updated.
+
+
+Package detection:
+~~~~~~~~~~~~~~~~~~
+
+- We now support new package manifest formats:
+- OpenWRT packages.
+- Yocto/BitBake .bb recipes.
+
+- We now support tracking the files of Package types.
+
+
+Outputs:
+~~~~~~~~
+
+- There is a new CycloneDX 1.2 output as XML and JSON.
+
+
+
+30.0.0 - 2021-09-23
+--------------------
+
+This is a major release with new features, and several bug fixes and
+improvements including major updates to the license detection.
+
+We have dropped calendar-based versions and have switched back to semver
+versioning. To ensure that there is no ambiguity, the new major version has been
+updated from 21 to 30. The primary reason is that calver was not helping
+integrators to track major version changes like semver does.
+
+We have also introduced a new JSON output format version based on semver to
+version the JSON output format data structure and have documented the new
+versioning approach.
+
+
 Package detection:
 ~~~~~~~~~~~~~~~~~~

-- Add support for OpenWRT packages.
-- Add support for Yocto/BitBake .bb recipes.
-- Add support to track installed files for each Package type.
+- Declared license detection for Debian packages, in both machine-readable and
+unstructured copyright files, has been significantly improved by tracking the
+start and end lines of each license match. This is not yet exposed outside of
+tests but has been essential to help improve detection.
+
 - Debian copyright license detection has been significantly improved with new
 license detection rules.

+- Support for Windows packages has been improved (in particular the detection
+of Windows packages in the Windows registry).
+
+- Support for Cocoapod packages has been significantly revamped and is now
+working as expected.
+
+- Support for PyPI packages has been refined, in particular package descriptions.
+
+
+
+Copyright detection:
+~~~~~~~~~~~~~~~~~~~~
+
+- The copyright detection accuracy has been improved and several bugs have been
+fixed.
+

 License detection:
 ~~~~~~~~~~~~~~~~~~~

-- There have been XXX new licenses added, YYY new license detection rules added
-and ZZZ updated license or rules.
+There have been some significant updates in license detection. We now track
+34,164 licenses and license notices:
+
+- 84 new licenses have been added,
+- 34 existing license metadata have been updated,
+- 2765 new license detection rules have been added, and
+- 2041 existing license rules have been updated.
+

 - Several license detection bugs have been fixed.

-- The SPDX license list 3.14 is now supported. We also include the version
-of the SPDX license list in the ScanCode JSON and SPDX outputs, as well as
-display it with the --version command line option.
+- The SPDX license list 3.14 is now supported and has been synced with the
+licensedb. We also include the version of the SPDX license list in the
+ScanCode YAML, JSON and the SPDX outputs, as well as display it with the
+"--version" command line option.

-- Unknown licenses have a new flag "is_unknown" to identify them
-beyond just the naming convention of having "unknown" as part of their name.
+- Unknown licenses have a new flag "is_unknown" in their metadata to identify
+them explicitly. Before that we were just relying on the naming convention of
+having "unknown" as part of a license key.

 - Rules that match at least one unknown license have a flag "has_unknown" set
-in the returned match results.
-
-- There is a new experimental command line option "--unknown-licenses" to
-detect unknown licenses and follow license references such as "See license in
-file COPYING". The actual data structure for this new option is evolving.
+and returned in the match results.

+- Experimental: License detection can now "follow" license mentions that
+reference another file such as "see license in COPYING" where we can relate
+this mention to the actual license detected in the COPYING file. Use the new
+"--unknown-licenses" command line option to test this new feature.
+This feature will evolve significantly in the next version(s).

-Many thanks to every contributors that made this possible and in particular:
+
+Outputs:
+~~~~~~~~
+
+- The SPDX output now has the mandatory ids attribute per the SPDX spec. We
+also support SPDX 2.2 and SPDX license list 3.14.
+
+
+Miscellaneous
+~~~~~~~~~~~~~~~
+
+- There is a new "--no-check-version" CLI option to scancode to bypass the
+live, remote outdated version check on PyPI.
+
+- The scan results and the CLI now display an outdated version warning when
+the installed ScanCode version is older than 90 days. This is to warn users
+that they are relying on outdated, likely buggy, insecure and inaccurate scan
+results and encourage them to update to a newer version. This check is made
+entirely locally based on date comparisons.
+
+- We again display the command line progress bar counters correctly.
+
+- A bug has been fixed in summarization.
+
+- Generated code detection has been improved with several new keywords.
+
+
+Thank you!
+~~~~~~~~~~~~
+
+Many thanks to the many contributors that made this release possible and in
+particular:

 - Akanksha Garg @akugarg
+- Armijn Hemel @armijnhemel
 - Ayan Sinha Mahapatra @AyanSinhaMahapatra
+- Bryan Sutula @sutula
+- Chin-Yeung Li @chinyeungli
+- Dennis Clark @DennisClark
+- dyh @yunhua-deng
+- Dr. Frank Heimes @FrankHeimes
+- gunaztar @gunaztar
+- Helio Chissini de Castro @heliocastro
+- Henrik Sandklef @hesa
+- Jiyeong Seok @dd-jy
+- John M. Horan @johnmhoran
 - Jono Yang @JonoYang
+- Joseph Heck @heckj
+- Luis Villa @tieguy
+- Konrad Weihmann @priv-kweihmann
+- mapelpapel @mapelpapel
+- Maximilian Huber @maxhbr
+- Michael Herzog @mjherzog
+- MMarwedel @MMarwedel
+- Mikko Murto @mmurto
+- Nishchith Shetty @inishchith
+- Peter Gardfjäll @petergardfjall
 - Philippe Ombredanne @pombredanne
+- Rainer Bieniek @rbieniek
+- Roshan Thomas @Thomshan
+- Sadhana @s4-2
+- Sarita Singh @itssingh
+- Siddhant Khare @Siddhant-K-code
+- Soim Kim @soimkim
+- Thorsten Godau @tgodau
+- Yunus Rahbar @yns88


 v21.8.4

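The Miscellaneous notes in the CHANGELOG diff above describe an outdated version warning that is computed locally from date comparisons once the installed version is older than 90 days. A minimal sketch of such a check follows. It assumes the release date is known to the installed code; the function and constant names are hypothetical and are not taken from ScanCode.

```python
from datetime import date, timedelta

# Assumption: the 90-day threshold comes from the changelog entry. The names
# MAX_AGE and is_outdated are illustrative only, not ScanCode's actual API.
MAX_AGE = timedelta(days=90)


def is_outdated(release_date, today=None, max_age=MAX_AGE):
    """Return True when release_date is more than max_age before today."""
    today = today or date.today()
    return (today - release_date) > max_age


# 30.0.0 is dated 2021-09-23 in the changelog.
release = date(2021, 9, 23)
print(is_outdated(release, today=date(2021, 12, 1)))  # 69 days after release
print(is_outdated(release, today=date(2022, 1, 1)))   # 100 days after release
```

Because the comparison uses only the local clock and a stored release date, a check of this kind needs no network access, which is consistent with the separate "--no-check-version" option covering the remote PyPI check.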
docs/source/misc/index.rst

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@
 faq
 support
 perf_report
+versioning

docs/source/misc/versioning.rst

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+.. _versioning:
+
+
+Versioning approach
+==========================
+
+ScanCode is composed of code and data (mostly license data used for license
+detection). In the past, we tried using calver for code versioning to also
+convey that the data contained in ScanCode was updated, but it proved to be not
+as clear and effective as planned, so we are switching back to semver, which is
+simpler and overall more useful for users. We also want to provide hints about
+JSON output data format changes.
+
+Therefore, this is our versioning approach starting with version 30.0.0:
+
+- ScanCode releases are versioned using semver as documented at
+https://semver.org using major.minor.patch versioning.
+
+- Significant changes to the data (license or copyright detection) are considered
+a major version change even if there are no code changes. The rationale is
+that in our case the data has the same impact as the code. Using outdated data
+is like using old code and means that several licenses may not be detected
+correctly. Any data change triggers at least a minor version change.
+
+- We will signal separately to users with warning messages when ScanCode needs
+to be upgraded because its data and/or code are out of date.
+
+
+In addition to the main code version, we also maintain a secondary output data
+format version, also using semver but with two segments. The versioning approach
+is adapted for data this way:
+
+- The first segment --the major version-- is incremented when data attributes
+are removed, renamed, changed or moved (but not reordered) in the JSON
+output. Reordering the attributes of a JSON object is not considered as a
+change and does not trigger a version change.
+
+- The second segment --the minor version-- of the output format is incremented
+for an addition of attributes to the JSON output.
+
+- We store the output format version string in the JSON output object as the
+first attribute and display that also in the help.
+
+- This output format versioning applies only to the JSON, pretty-printed JSON,
+YAML and JSON lines formats. It does not apply to CSV and any other formats.
+For these other formats there is no versioning and no guaranteed format stability
+(or there may be some other rationale and convention for versioning like for
+SPDX).
+
+- The output format version is incremented when a new ScanCode tagged release
+is published.
+
+- We document in the CHANGELOG the output format changes in any new format version.
+
+- For any format version changes, we will provide documentation on the format
+and its updates using JSON examples and a comprehensive and updated data
+dictionary. See https://github.com/nexB/scancode-toolkit/issues/2008 for details.

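The output format rules in versioning.rst above (major bump when attributes are removed, renamed or moved; minor bump when attributes are added; no bump for reordering) can be sketched as a comparison of attribute paths. This is only an illustration under stated assumptions: the helper names are hypothetical, resetting the minor segment to 0 on a major bump follows the usual semver convention rather than anything stated in the document, and a "changed" attribute whose name stays the same (for example a type change) is not detected by this name-based comparison.

```python
def flatten_keys(obj, prefix=""):
    """Return the set of dotted attribute paths found in a JSON-like object."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= flatten_keys(value, path)
    elif isinstance(obj, list):
        # List items share the path of the attribute that holds the list.
        for item in obj:
            paths |= flatten_keys(item, prefix)
    return paths


def next_format_version(version, old_output, new_output):
    """Return the next two-segment version for a change in output attributes."""
    major, minor = (int(part) for part in version.split("."))
    old_paths = flatten_keys(old_output)
    new_paths = flatten_keys(new_output)
    if old_paths - new_paths:
        # An attribute path disappeared: removed, renamed or moved.
        # Assumption: the minor segment resets to 0, as in semver.
        return f"{major + 1}.0"
    if new_paths - old_paths:
        # Only additions: bump the minor segment.
        return f"{major}.{minor + 1}"
    # Same attribute paths; sets ignore ordering, so reordering is not a change.
    return version


old = {"copyrights": [{"value": "Copyright (c) Example"}]}
renamed = {"copyrights": [{"copyright": "Copyright (c) Example"}]}
added = {"copyrights": [{"value": "Copyright (c) Example"}], "license_references": []}

print(next_format_version("1.0", old, renamed))  # rename: major bump
print(next_format_version("1.0", old, added))    # addition: minor bump
```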
src/cluecode/copyrights.py

Lines changed: 1 addition & 1 deletion
@@ -1591,7 +1591,7 @@ def from_node(
 (r'^[A-Z]+[.][A-Z][a-z]+[,]?$', 'NNP'),

 # proper noun with apostrophe ': D'Orleans, D'Arcy, T'so, Ts'o
-(r"^[A-Z][[a-z]?['][A-Z]?[a-z]+[,.]?$", 'NNP'),
+(r"^[A-Z][a-z]?['][A-Z]?[a-z]+[,.]?$", 'NNP'),

 # proper noun with apostrophe ': d'Itri
 (r"^[a-z]['][A-Z]?[a-z]+[,\.]?$", 'NNP'),

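The one-character change in the copyrights.py hunk above removes a stray "[" from a character class. In the old pattern, `[[a-z]` is a class that matches either a literal "[" or a lowercase letter, so a token with a bracket before the apostrophe could be tagged as a proper noun. The sketch below compares the two patterns exactly as they appear in the diff; the test tokens other than those in the code comment are made up for illustration.

```python
import re
import warnings

OLD_PATTERN = r"^[A-Z][[a-z]?['][A-Z]?[a-z]+[,.]?$"
NEW_PATTERN = r"^[A-Z][a-z]?['][A-Z]?[a-z]+[,.]?$"

with warnings.catch_warnings():
    # Recent Python versions may emit a FutureWarning about a possible nested
    # set for "[[" inside a character class; silence it for this comparison.
    warnings.simplefilter("ignore", FutureWarning)
    old_re = re.compile(OLD_PATTERN)
new_re = re.compile(NEW_PATTERN)

# The examples named in the source comment match both patterns.
for token in ("D'Orleans", "D'Arcy", "T'so", "Ts'o"):
    assert old_re.match(token) and new_re.match(token)

# Hypothetical token with a stray bracket: only the old pattern accepts it.
stray = "D['Arcy"
print(bool(old_re.match(stray)), bool(new_re.match(stray)))
```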