@@ -82,3 +82,62 @@ When scanning binaries, the line numbers are just a relative indication of where
82 82 a detection was found: there is no such thing as lines in a binary. The numbers
83 83 reported are based on the strings extracted from the binaries, typically broken
84 84 as new lines with each NULL character.
85+
86+
87+ How does ``--license-text`` work exactly in ScanCode?
88+ -------------------------------------------------------------
89+
90+ I have a question about how ``--license-text`` works exactly in ScanCode:
91+ Is the matched text included in the result exactly the lines of text
92+ from the input file that are covered by the ``start_line`` and ``end_line``
93+ fields of the result? That is, if I post-processed the input file and extracted
94+ lines ``start_line`` to ``end_line`` from it, would I get exactly the
95+ ``matched_text`` contents? Or is there some more "magic" involved when populating the
96+ ``matched_text`` field?
97+
98+ ScanCode does more than use the start and end lines: matching is based on
99+ words, not on lines of the scanned text,
100+ so a whole line may not always be matched.
101+
102+ For instance, with this command::
103+
104+     $ echo "Foo is a wonder piece of code. Licensed under the GPL. For support contact [email protected] " > tst
105+     $ scancode --license --license-text --license-text-diagnostics --yaml - tst
106+     ...
107+     license_detections:
108+       - license_expression: gpl-1.0-plus
109+         license_expression_spdx: GPL-1.0-or-later
110+         matches:
111+           - license_expression: gpl-1.0-plus
112+             license_expression_spdx: GPL-1.0-or-later
113+             from_file: tst
114+             start_line: 1
115+             end_line: 1
116+             matcher: 2-aho
117+             score: '100.0'
118+             matched_length: 4
119+             match_coverage: '100.0'
120+             rule_relevance: 100
121+             rule_identifier: gpl_85.RULE
122+             rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_85.RULE
123+             matched_text: Foo is a wonder piece of code. Licensed under the GPL.
124+               For support contact [email protected]
125+             matched_text_diagnostics: Licensed under the GPL.
126+     ...
127+
128+ then:
129+
130+ - ``matched_text`` is based on ``start_line`` and ``end_line``
131+ - ``matched_text_diagnostics`` is based on the exact matched words (and it includes "tagged" gaps or extra words)
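
The post-processing described in the question can be sketched in plain Python. This is a minimal illustration, not ScanCode code: ``extract_lines``, ``normalize``, the sample file content and the ``match`` dictionary are all hypothetical, and collapsing whitespace before comparing is an assumption made because the YAML output folds long values across lines.

```python
def extract_lines(text: str, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line (1-based, inclusive) of text."""
    lines = text.splitlines()
    return "\n".join(lines[start_line - 1:end_line])


def normalize(text: str) -> str:
    """Collapse whitespace runs before comparing (an assumption, see above)."""
    return " ".join(text.split())


# Hypothetical input file content, adapted from the example above.
scanned = (
    "Foo is a wonder piece of code. Licensed under the GPL. "
    "For support contact the authors.\n"
)

# Hypothetical match fields, shaped like the YAML output above.
match = {
    "start_line": 1,
    "end_line": 1,
    "matched_text": (
        "Foo is a wonder piece of code. Licensed under the GPL. "
        "For support contact the authors."
    ),
    "matched_text_diagnostics": "Licensed under the GPL.",
}

extracted = extract_lines(scanned, match["start_line"], match["end_line"])

# The whole-line extraction lines up with matched_text ...
same_as_matched_text = normalize(extracted) == normalize(match["matched_text"])

# ... while matched_text_diagnostics covers only part of that line.
diagnostics_is_narrower = (
    match["matched_text_diagnostics"] in extracted
    and len(match["matched_text_diagnostics"]) < len(extracted)
)

print(same_as_matched_text)
print(diagnostics_is_narrower)
```

With real scan results, the same check could be run against each entry under ``matches``. Whether the comparison holds byte for byte, without the whitespace normalization, is not something this sketch establishes.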
132+
133+
134+
135+
136+
137+
138+
139+
140+
141+
142+
143+