Custom Pipelines
================

- A pipeline always inherits from the ``Pipeline`` base class :ref:`pipeline_base_class`
- It define steps using the ``steps`` class method.
+ - A pipeline is a **Python class** that lives in a Python module as a ``.py`` **file**.
+ - A pipeline class **always inherits** from the ``Pipeline`` base class
+   :ref:`pipeline_base_class`, or from another existing pipeline class, such as the
+   :ref:`built_in_pipelines`.
+ - It **defines steps** using the ``steps`` classmethod.
+
+ See :ref:`pipelines_concept` for more details.

Pipeline registration
---------------------

- Built-in pipelines are located in scanpipe/pipelines/ and registered during the
- ScanCode.io installation.
+ Built-in pipelines are located in the :guilabel:`scanpipe/pipelines/` directory and
+ are registered during the ScanCode.io installation.

- Custom pipelines can be added as python files in the TBD/ directory and will be
- automatically registered at runtime.
+ Custom pipelines can be added as Python ``.py`` files in the directories defined in
+ the :ref:`scancodeio_settings_pipelines_dirs` setting and will be automatically
+ registered at runtime.
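
To make the idea of directory-based registration concrete, here is a minimal sketch of how ``.py`` files in a list of directories could be loaded and scanned for pipeline classes. It is not ScanCode.io's actual implementation: the ``Pipeline`` class below is a stand-in, and ``discover_pipelines`` is a hypothetical helper name introduced only for illustration.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


class Pipeline:
    """Stand-in for the real ``Pipeline`` base class (assumption)."""


def discover_pipelines(directories):
    """Return {class_name: class} for Pipeline subclasses found in ``.py`` files.

    Hypothetical helper: it only illustrates the general discovery idea.
    """
    registry = {}
    for directory in directories:
        for path in sorted(Path(directory).glob("*.py")):
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            # Expose the stand-in base class so the loaded file can subclass it.
            module.Pipeline = Pipeline
            spec.loader.exec_module(module)
            for name, obj in inspect.getmembers(module, inspect.isclass):
                if issubclass(obj, Pipeline) and obj is not Pipeline:
                    registry[name] = obj
    return registry


# Usage: write a tiny pipeline file into a temporary directory and discover it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "my_pipeline.py").write_text("class MyPipeline(Pipeline):\n    pass\n")
    found = discover_pipelines([tmp])
```

In this sketch, registering a directory amounts to including it in the list passed to ``discover_pipelines``; any class in that directory that inherits from the base class ends up in the returned mapping.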

Create a Pipeline
-----------------

- Create a new Python file ``my_pipeline.py`` in the TBD/ directory.
+ Create a new Python file ``my_pipeline.py`` in a directory of your choice and make sure
+ that directory is registered in the :ref:`scancodeio_settings_pipelines_dirs` setting.

.. code-block:: python

@@ -41,7 +48,8 @@ Create a new Python file ``my_pipeline.py`` in the TBD/ directory.


.. tip::
-     Have a look in the scanpipe/pipelines/ directory for more pipeline examples.
+     Have a look in the :guilabel:`scanpipe/pipelines/` directory for more pipeline
+     examples.

Modify existing Pipelines
-------------------------
@@ -64,7 +72,7 @@ You may want to override existing steps, add new ones, and remove some.
                cls.run_scancode,
                cls.build_inventory_from_scan,

-                 # Commented-out as I'm not interested in a csv output
+                 # Commented-out as not interested in a csv output
                # cls.csv_output,

                # My extra steps
@@ -77,3 +85,64 @@ You may want to override existing steps, add new ones, and remove some.

        def extra_step2(self):
            pass
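
The override pattern above relies on ``steps`` returning a plain tuple of methods, so steps can be dropped or appended with ordinary tuple operations. The following self-contained sketch shows that mechanism with stand-in classes. The ``Pipeline`` base class, its ``execute`` runner, and ``BaseScan`` are assumptions made for illustration and do not reproduce ScanCode.io's real classes.

```python
class Pipeline:
    """Stand-in base class (assumption); the real one lives in ScanCode.io."""

    @classmethod
    def steps(cls):
        return ()

    def execute(self):
        # Hypothetical runner: call each step in declaration order
        # and record the names of the steps that ran.
        executed = []
        for step in self.steps():
            step(self)
            executed.append(step.__name__)
        return executed


class BaseScan(Pipeline):
    """Stand-in for an existing pipeline with three steps."""

    @classmethod
    def steps(cls):
        return (cls.collect, cls.scan, cls.csv_output)

    def collect(self):
        pass

    def scan(self):
        pass

    def csv_output(self):
        pass


class MyCustomScan(BaseScan):
    @classmethod
    def steps(cls):
        # Drop csv_output from the inherited steps, then append an extra step.
        inherited = tuple(
            step for step in BaseScan.steps() if step.__name__ != "csv_output"
        )
        return inherited + (cls.extra_step1,)

    def extra_step1(self):
        pass


executed = MyCustomScan().execute()
print(executed)
```

Filtering the inherited tuple is an alternative to copying the full step list and commenting entries out; which one reads better depends on how many steps change.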
+
+
+ Report step example
+ -------------------
+
+ The following custom pipeline is based on the built-in :ref:`pipeline_scan_codebase`
+ pipeline and adds an extra reporting step.
+
+ Add the following content to a Python file and register its directory in the
+ :ref:`scancodeio_settings_pipelines_dirs` setting.
+
+ .. code-block:: python
+
+     from collections import defaultdict
+
+     from jinja2 import Template
+
+     from scanpipe.pipelines.scan_codebase import ScanCodebase
+
+
+     class ScanAndReport(ScanCodebase):
+         """
+         Run the ScanCodebase built-in pipeline steps and generate a licenses report.
+         """
+
+         @classmethod
+         def steps(cls):
+             return ScanCodebase.steps() + (
+                 cls.report_licenses_with_resources,
+             )
+
+         # See https://jinja.palletsprojects.com/en/3.0.x/templates/ for documentation
+         report_template = """
+         {% for matched_text, paths in resources.items() -%}
+             {{ matched_text }}
+
+             {% for path in paths -%}
+                 {{ path }}
+             {% endfor %}
+
+         {% endfor %}
+         """
+
+         def report_licenses_with_resources(self):
+             """
+             Retrieve codebase resources filtered by license categories,
+             Generate a licenses report file from a template.
+             """
+             categories = ["Commercial", "Copyleft"]
+             resources = self.project.codebaseresources.licenses_categories(categories)
+
+             resources_by_licenses = defaultdict(list)
+             for resource in resources:
+                 for license_data in resource.licenses:
+                     matched_text = license_data.get("matched_text")
+                     resources_by_licenses[matched_text].append(resource.path)
+
+             template = Template(self.report_template, lstrip_blocks=True, trim_blocks=True)
+             report_stream = template.stream(resources=resources_by_licenses)
+             report_file = self.project.get_output_file_path("license-report", "txt")
+             report_stream.dump(str(report_file))
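
The grouping logic in the reporting step can be tried without a ScanCode.io project. The sketch below reproduces the ``defaultdict`` grouping on hand-written sample data and renders it with plain string joining in place of Jinja2. The sample resources and the ``render`` helper are hypothetical and only model the two attributes the step reads (``path`` and ``licenses``).

```python
from collections import defaultdict

# Hypothetical stand-ins for codebase resources: only the path and the
# license matches used by the report step are modelled here.
resources = [
    {"path": "src/a.c", "licenses": [{"matched_text": "GPL-2.0 notice"}]},
    {
        "path": "src/b.c",
        "licenses": [
            {"matched_text": "GPL-2.0 notice"},
            {"matched_text": "Proprietary notice"},
        ],
    },
]

# Same grouping as in the pipeline step: matched text -> list of paths.
resources_by_licenses = defaultdict(list)
for resource in resources:
    for license_data in resource["licenses"]:
        matched_text = license_data.get("matched_text")
        resources_by_licenses[matched_text].append(resource["path"])


def render(grouped):
    """Plain-text rendering used here instead of the Jinja2 template."""
    lines = []
    for matched_text, paths in grouped.items():
        lines.append(matched_text)
        lines.extend(f"  {path}" for path in paths)
        lines.append("")
    return "\n".join(lines)


report = render(resources_by_licenses)
print(report)
```

A resource with several license matches appears once under each matched text, which is the same shape the template loops over in the pipeline step.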