@@ -82,3 +82,53 @@ When scanning binaries, the line numbers are just a relative indication of where
8282a detection was found: there is no such thing as lines in a binary. The numbers
8383reported are based on the strings extracted from the binaries, typically broken
8484as new lines with each NULL character.
85+
86+
87+ How does ``--license-text`` work exactly in ScanCode?
88+ -------------------------------------------------------------
89+
90+ Is the matched text included in the results exactly the lines of the input file
91+ that are covered by the ``start_line`` and ``end_line`` fields of the result?
92+ That is, if I post-processed the input file and extracted lines ``start_line``
93+ to ``end_line`` from it, would I get exactly the ``matched_text`` contents? Or
94+ is there some more "magic" involved when populating the ``matched_text``
95+ field?
96+
97+ ScanCode is a bit smarter than just a start and end line: matching is based on
98+ words, not lines, of the scanned text, so a whole line may not always be matched.
99+
100+ For instance, with this command::
101+
102+     $ echo "Foo is a wonder piece of code. Licensed under the GPL. " \
103+       "For support contact [email protected] " > tst
104+     $ scancode --license --license-text --license-text-diagnostics --yaml - tst
105+     ...
106+     license_detections:
107+       - license_expression: gpl-1.0-plus
108+         license_expression_spdx: GPL-1.0-or-later
109+         matches:
110+           - license_expression: gpl-1.0-plus
111+             license_expression_spdx: GPL-1.0-or-later
112+             from_file: tst
113+             start_line: 1
114+             end_line: 1
115+             matcher: 2-aho
116+             score: '100.0'
117+             matched_length: 4
118+             match_coverage: '100.0'
119+             rule_relevance: 100
120+             rule_identifier: gpl_85.RULE
121+             rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_85.RULE
122+             matched_text: Foo is a wonder piece of code. Licensed under the GPL.
123+               For support contact [email protected]
124+             matched_text_diagnostics: Licensed under the GPL.
125+     ...
126+
127+ then:
128+
129+ - ``matched_text`` is based on ``start_line`` and ``end_line``
130+ - ``matched_text_diagnostics`` is based on the exact matched words
131+
132+ Note that ``matched_text_diagnostics`` also includes "tagged" gaps or extra
133+ unmatched words highlighted between the matched words.
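
To make the line-based behaviour concrete, here is a minimal Python sketch of the post-processing step asked about in the question: pulling lines ``start_line`` to ``end_line`` out of a scanned text. The helper name ``extract_lines``, the sample text, and the contact address are hypothetical and are not part of ScanCode. The sketch only illustrates the line-range idea; whether ScanCode normalizes whitespace in ``matched_text`` is not covered here.

```python
def extract_lines(text: str, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line of text (1-based, inclusive)."""
    lines = text.splitlines()
    return "\n".join(lines[start_line - 1:end_line])


# Hypothetical sample resembling the scanned file in the example above;
# the contact address is made up for illustration.
sample = (
    "Foo is a wonder piece of code. Licensed under the GPL. "
    "For support contact someone@example.com\n"
    "An unrelated second line.\n"
)

# With start_line == end_line == 1, the whole first line comes back,
# not just the words that matched the license rule.
whole_line = extract_lines(sample, start_line=1, end_line=1)
print(whole_line)
```

The extracted range contains the full first line, including the words around "Licensed under the GPL." that are not part of the license notice. That is the difference between the line-based ``matched_text`` and the word-based ``matched_text_diagnostics`` described above.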
134+