Commit e01ac70

Support equivalent words in license detection #4190
Handle similar words in license detection by allowing multiple "legalese words" to have the same token id. Regenerate the token ids accordingly.

Convert Index.tokens_by_tid to a computed property, available on demand, and convert it from a list to a dictionary. Ensure that all code relying on tokens_by_tid is updated as needed. All locations were used only for testing and debugging.

Deprecate all rules that are duplicated under this new regime, where tokens like "license" and "licence" are now treated as identical.

Update the test suite to test the detection of all deprecated licenses and rules as a sanity check. A deprecated rule with "relevance" set to 0 is not tested, as some rules are deprecated because they are false positives and should no longer be detected. Also improve the validation and loading of rule relevance, including the case of zero relevance.

Update ambiguous or conflicting rules as needed. In particular, ensure that all rules in the style of "MIT or GPL" without a GPL version are now reported consistently as "mit or gpl-1.0-plus".

Add new rules as needed to resolve failing tests and improve accuracy.

Improve deprecation support for rules and licenses, adding a new "replaced_by" list attribute that lists the new expressions that must be detected when scanning the deprecated license or rule text.

Reference: #4190
Signed-off-by: Philippe Ombredanne <[email protected]>
1 parent e830934 commit e01ac70
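The core idea in the commit message, several equivalent words sharing one token id with `tokens_by_tid` computed on demand as a dictionary, can be illustrated with a toy sketch. Everything below (`EQUIVALENT_WORDS`, `MiniIndex`, `build_dictionary`, `tokenize`) is hypothetical and is not ScanCode's actual implementation; in particular, the shape of the values in the real `tokens_by_tid` is not shown in this diff, and a list of strings per id is only an assumption.

```python
from collections import defaultdict

# Hypothetical groups of equivalent "legalese" words (not the real ScanCode data).
EQUIVALENT_WORDS = [
    ("license", "licence"),
    ("licensed", "licenced"),
    ("copyright",),
]


def build_dictionary(groups):
    """Return a {token string: token id} mapping where each group shares one id."""
    return {word: tid for tid, group in enumerate(groups) for word in group}


class MiniIndex:
    """Toy stand-in for an index; it only models the token dictionary."""

    def __init__(self, groups):
        self.dictionary = build_dictionary(groups)

    @property
    def tokens_by_tid(self):
        """
        Compute on demand a {token id: sorted list of token strings} mapping.
        A dict of lists is used because several strings can share one id.
        """
        by_tid = defaultdict(list)
        for word, tid in self.dictionary.items():
            by_tid[tid].append(word)
        return {tid: sorted(words) for tid, words in by_tid.items()}

    def tokenize(self, text):
        """Return the known token ids for ``text``, skipping unknown words."""
        words = text.lower().split()
        return [self.dictionary[w] for w in words if w in self.dictionary]


idx = MiniIndex(EQUIVALENT_WORDS)
```

With this sketch, `idx.tokenize("Licence copyright")` and `idx.tokenize("license copyright")` yield the same id sequence, which is why two rules that differ only in such spellings become duplicates of each other.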

File tree

1,945 files changed: +16176 -10594 lines changed

etc/scripts/licenses/buildrules.py

Lines changed: 2 additions & 2 deletions
@@ -149,7 +149,7 @@ def rule_exists(text):

 def all_rule_by_tokens():
     """
-    Return a mapping of {tuples of tokens: rule id}, with one item for each
+    Return a mapping of {(tuple of token id): rule id}, with one item for each
     existing and added rules. Used to avoid duplicates.
     """
     rule_tokens = {}
@@ -159,7 +159,7 @@ def all_rule_by_tokens():
         except Exception as e:
             rf = f" file://{rule.rule_file()}"
             raise Exception(
-                f"Failed to to get tokens from rule:: {rule.identifier}\n" f"{rf}"
+                f"Failed to get tokens from rule:: {rule.identifier}\n" f"{rf}"
             ) from e
     return rule_tokens
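The docstring above describes a mapping keyed by tuples of token ids and used to avoid duplicate rules. A minimal sketch of that technique follows, assuming a plain `{word: token id}` dictionary and whitespace tokenization; `find_duplicate_rules`, `dictionary`, `rules`, and the rule ids are hypothetical names for illustration, not the script's real helpers.

```python
# Hypothetical token dictionary: equivalent spellings share the same token id.
dictionary = {"mit": 0, "license": 1, "licence": 1}

# Hypothetical rules as {rule id: rule text}.
rules = {
    "mit_1.RULE": "MIT license",
    "mit_2.RULE": "MIT licence",
}


def find_duplicate_rules(rules, dictionary):
    """
    Return a tuple of (rule_tokens, duplicates) where rule_tokens is a
    {(tuple of token id): rule id} mapping and duplicates is a list of
    (duplicate rule id, first rule id seen with the same token ids).
    """
    rule_tokens = {}
    duplicates = []
    for rule_id, text in rules.items():
        # Key each rule by its sequence of token ids, ignoring unknown words.
        key = tuple(dictionary[w] for w in text.lower().split() if w in dictionary)
        existing = rule_tokens.get(key)
        if existing is not None:
            duplicates.append((rule_id, existing))
        else:
            rule_tokens[key] = rule_id
    return rule_tokens, duplicates


rule_tokens, duplicates = find_duplicate_rules(rules, dictionary)
```

Because "license" and "licence" share an id here, both texts produce the same key and the second rule is reported as a duplicate of the first. Keying on token strings instead would treat them as distinct.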

src/formattedcode/output_cyclonedx.py

Lines changed: 1 addition & 1 deletion
@@ -20,9 +20,9 @@
 from typing import List

 import attr
+from lxml import etree
 from commoncode.cliutils import OUTPUT_GROUP
 from commoncode.cliutils import PluggableCommandLineOption
-from lxml import etree
 from plugincode.output import OutputPlugin
 from plugincode.output import output_impl

src/licensedcode/cache.py

Lines changed: 1 addition & 2 deletions
@@ -276,7 +276,7 @@ def build_licensing(licenses_db=None):
     from licensedcode.models import load_licenses

     licenses_db = licenses_db or load_licenses()
-    return Licensing((LicenseSymbolLike(lic) for lic in licenses_db.values()))
+    return Licensing(symbols=(LicenseSymbolLike(lic) for lic in licenses_db.values()))


 def build_spdx_symbols(licenses_db=None):
@@ -316,7 +316,6 @@ def get_licenses_by_spdx_key(

     Optionally include deprecated if ``include_deprecated`` is True.

-
     Optionally make the keys lowercase if ``lowercase_keys`` is True.

     Optionally include the license "other_spdx_license_keys" if present and

src/licensedcode/data/licenses/agpl-3.0-bacula.LICENSE

Lines changed: 8 additions & 0 deletions
@@ -1,6 +1,14 @@
 ---
 key: agpl-3.0-bacula
 is_deprecated: yes
+replaced_by:
+    - bacula-exception
+    - bsd-simplified
+    - bsd-simplified
+    - bsd-simplified
+    - agpl-3.0-plus
+    - agpl-3.0-plus
+    - agpl-3.0
 short_name: AGPL 3.0 with Bacula exception
 name: AGPL 3.0 with Bacula exception
 category: Copyleft
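The commit message says a deprecated license or rule must now be detected as the expressions listed in `replaced_by`, except that deprecated rules with zero relevance are not tested. A hedged sketch of such a sanity check follows. `DeprecatedItem`, `check_replacement`, and `fake_detect` are hypothetical, the detector is a stand-in for a real scan, and the exact ordered comparison is an assumption (the repeated entries in the list above hint at it, but the diff does not confirm it).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DeprecatedItem:
    """Hypothetical model of a deprecated license or rule."""
    key: str
    text: str
    replaced_by: List[str] = field(default_factory=list)
    relevance: int = 100


def check_replacement(
    item: DeprecatedItem,
    detect: Callable[[str], List[str]],
) -> Optional[bool]:
    """
    Return None when the item is skipped (zero relevance), otherwise True if
    scanning the item text yields exactly the ``replaced_by`` expressions.
    """
    if item.relevance == 0:
        # Assumed to be a deprecated false positive: nothing should be detected.
        return None
    return detect(item.text) == item.replaced_by


def fake_detect(text: str) -> List[str]:
    """Stand-in detector for this example only; a real check would run a scan."""
    return ["zlib"] if "md5" in text.lower() else []


item = DeprecatedItem(key="aladdin-md5", text="Aladdin MD5 notice", replaced_by=["zlib"])
skipped = DeprecatedItem(key="some-false-positive", text="license", relevance=0)
```

Here `check_replacement(item, fake_detect)` returns `True` and `check_replacement(skipped, fake_detect)` returns `None`, mirroring the "tested unless relevance is 0" policy described in the commit message.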

src/licensedcode/data/licenses/agpl-3.0-linking-exception.LICENSE

Lines changed: 4 additions & 2 deletions
@@ -1,13 +1,15 @@
 ---
 key: agpl-3.0-linking-exception
+is_deprecated: yes
+replaced_by:
+    - linking-exception-agpl-3.0
 short_name: AGPL 3.0 linking exception
 name: AGPL 3.0 linking exception
 category: Copyleft Limited
 owner: Unspecified
-is_exception: yes
 homepage_url: http://mo.morsi.org/blog/2009/08/13/lesser_affero_gplv3/
 notes: renamed to linking-exception-agpl-3.0
-is_deprecated: yes
+is_exception: yes
 ---

 Additional permission under the GNU Affero GPL version 3 section 7:

src/licensedcode/data/licenses/agpl-3.0-openssl.LICENSE

Lines changed: 4 additions & 3 deletions
@@ -1,15 +1,16 @@
 ---
 key: agpl-3.0-openssl
+is_deprecated: yes
+replaced_by:
+    - openssl-exception-agpl-3.0
 short_name: AGPL 3.0 with OpenSSL exception
 name: AGPL 3.0 with OpenSSL exception
 category: Copyleft
 owner: MongoDB
-is_exception: yes
-is_deprecated: yes
 notes: replaced by openssl-exception-agpl-3.0
+is_exception: yes
 ---

-
 As a special exception, the copyright holders give permission to link the
 code of portions of this program with the OpenSSL library under certain
 conditions as described in each individual source file and distribute

src/licensedcode/data/licenses/aladdin-md5.LICENSE

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 ---
 key: aladdin-md5
 is_deprecated: yes
+replaced_by:
+    - zlib
 short_name: Aladdin MD5 License
 name: Aladdin MD5 License
 category: Permissive

src/licensedcode/data/licenses/aop-pd.LICENSE

Lines changed: 3 additions & 1 deletion
@@ -1,8 +1,10 @@
 ---
 key: aop-pd
+is_deprecated: yes
+replaced_by:
+    - cc-pd
 short_name: AOP-PD
 name: AOP Public Domain License
-is_deprecated: yes
 category: Public Domain
 owner: AOP Alliance Project
 ---

src/licensedcode/data/licenses/apache-2.0-linking-exception.LICENSE

Lines changed: 3 additions & 1 deletion
@@ -1,12 +1,14 @@
 ---
 key: apache-2.0-linking-exception
+is_deprecated: yes
+replaced_by:
+    - compuphase-linking-exception
 short_name: Apache 2.0 with Linking Exception
 name: Apache 2.0 with Linking Exception
 category: Permissive
 owner: compuphase
 homepage_url: https://github.com/compuphase/minIni/blob/master/LICENSE
 is_exception: yes
-is_deprecated: yes
 ---

 EXCEPTION TO THE APACHE 2.0 LICENSE

src/licensedcode/data/licenses/apache-2.0-runtime-library-exception.LICENSE

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,8 @@
 ---
 key: apache-2.0-runtime-library-exception
+is_deprecated: yes
+replaced_by:
+    - apple-runtime-library-exception
 short_name: Apache 2.0 with Runtime Library Exception
 name: Apache 2.0 with Runtime Library Exception
 category: Permissive
@@ -8,7 +11,6 @@ homepage_url: https://github.com/apple/swift/blob/master/LICENSE.txt#L205
 is_exception: yes
 other_urls:
     - https://swift.org/
-is_deprecated: yes
 ---

 ## Runtime Library Exception to the Apache 2.0 License: ##
