Plugin Architecture GSoC 2017 Final Report
Yash D. Saraf <[email protected]>
The purpose of this project was to create a decoupled plugin architecture for ScanCode that handles plugins at different stages of a scan and lets them be coupled at runtime. The stages were:
1. Format:
In this stage, the plugins are supposed to run after the scanning is done and after the post-scan plugins have been called. These plugins could be used for
- converting the scanned output to the given format (e.g. csv, json, etc.)
Here, a plugin needs to add an entry in the scancode_output_writers entry point in the following format
'<format> = <module>:<function>'
- <format> is the format name which will be used as the command line option name (e.g. csv or json).
- <module> is a Python module which implements the output hook specification.
- <function> is the function to which the scan output will be passed if this plugin is called.
The <format> name will be automatically added to the --format command line option and (if called) the scanned data will be passed to the plugin.
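As a rough illustration of the registration described above, here is a minimal sketch of what a csv format plugin might look like. Only the scancode_output_writers group name and the '<format> = <module>:<function>' layout come from this report; the module name my_plugins.csv_writer, the function write_csv and its (scanned_files, output_file) signature are assumptions made for the example, and the real output hook specification may pass different arguments.

```python
import csv
import io

# Hypothetical entry-point declaration, as it might appear in a plugin's setup.py.
# The module and function names are made up for illustration.
ENTRY_POINTS = {
    'scancode_output_writers': [
        'csv = my_plugins.csv_writer:write_csv',
    ],
}


def write_csv(scanned_files, output_file):
    """
    Write one row per scanned file to output_file.
    Assumed signature: the real output hook may receive other arguments.
    """
    scanned_files = list(scanned_files)
    # Collect every key seen across all files so that the header is stable.
    fieldnames = sorted({key for scanned in scanned_files for key in scanned})
    writer = csv.DictWriter(output_file, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    for scanned in scanned_files:
        # Keys missing from a file are written as empty cells.
        writer.writerow(scanned)


# Usage: write two made-up scan results to an in-memory buffer.
buffer = io.StringIO()
write_csv([{'path': 'a.py', 'type': 'file'}, {'path': 'docs'}], buffer)
print(buffer.getvalue())
```

With such an entry registered, csv would be the value a user selects through the --format option.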
2. Post-scan:
In this stage, the plugins are supposed to run after the scanning is done. Some uses for these plugins were
- summarization of scan outputs, e.g. a post-scan plugin that marks is_source as true for directories with ~90% of source files
- simplification of scan outputs, e.g. the --only-findings option, which returns only the files or directories with findings for the requested scans. Files and directories without findings are omitted (basic file information does not count as a finding). This option already existed; I just ported it to a post-scan plugin.
Here, a plugin needs to add an entry in the scancode_post_scan entry point in the following format
'<name> = <module>:<function>'
- <name> is the command line option name (e.g. only-findings).
- <module> is a Python module which implements the post_scan hook specification.
- <function> is the function to which the scanned files will be passed if this plugin is called.
The command line option for this plugin will be automatically created using the <function>'s docstring as its help text and (if called) the scanned files will be passed to the plugin.
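To make the post-scan contract above more concrete, the following is a minimal sketch of an only-findings style plugin. It is not the ported ScanCode implementation: the module name, the function name process_only_findings, the shape of the scanned file records and the set of keys treated as basic file information are all assumptions for this example.

```python
# Hypothetical entry-point declaration. Only the 'scancode_post_scan' group name
# and the '<name> = <module>:<function>' layout come from the report.
ENTRY_POINTS = {
    'scancode_post_scan': [
        'only-findings = my_plugins.only_findings:process_only_findings',
    ],
}

# Assumption: these keys are basic file information and do not count as findings.
BASIC_INFO_KEYS = frozenset(['path', 'type', 'size'])


def process_only_findings(scanned_files):
    """
    Only return files or directories with findings for the requested scans.
    Files and directories without findings are omitted.
    """
    for scanned in scanned_files:
        # A file has findings if any non-basic key holds a non-empty value.
        has_findings = any(
            value for key, value in scanned.items() if key not in BASIC_INFO_KEYS
        )
        if has_findings:
            yield scanned


# Usage with made-up scan results: only the first file has a finding.
files = [
    {'path': 'a.py', 'type': 'file', 'licenses': [{'key': 'mit'}]},
    {'path': 'b.txt', 'type': 'file', 'licenses': [], 'copyrights': []},
]
kept = list(process_only_findings(files))
print([scanned['path'] for scanned in kept])
```

The docstring of the function is what would become the help text of the generated --only-findings option.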
3. Pre-scan:
In this stage, the plugins are supposed to run before the scan starts. So the potential uses for these types of plugins were to
- ignore files based on a given pattern (glob)
- ignore files based on their info, i.e. size, type, etc.
- extract archives before scanning
Here, a plugin needs to add an entry in the scancode_pre_scan entry point in the following format
'<name> = <module>:<class>'
- <name> is the command line option name (e.g. ignore).
- <module> is a Python module which implements the pre_scan hook specification.
- <class> is the class which is instantiated and whose appropriate method is invoked if this plugin is called. It needs to extend the plugincode.pre_scan.PreScanPlugin class.
The command line option for this plugin will be automatically created using the <class>'s docstring as its help text.
Since there isn't a single spot where pre-scan plugins can be plugged in, more methods can be added to the PreScanPlugin class to represent different hooks; for example, a hook for adding or deleting a scan might be a method called process_scan.
If a plugin's option is passed by the user, then the <class> is instantiated with the user input and its appropriate hook methods are called.
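The class-based contract above can be sketched as follows. So that the example runs on its own, it defines a stand-in for plugincode.pre_scan.PreScanPlugin instead of importing the real class; the constructor argument, the process_resources hook name and the IgnorePlugin class are assumptions, not the actual API.

```python
import fnmatch


class PreScanPlugin(object):
    """
    Stand-in for plugincode.pre_scan.PreScanPlugin so that this sketch is
    self-contained. The constructor and the hook method are assumptions.
    """

    def __init__(self, user_input):
        # The value the user passed to the plugin's command line option.
        self.user_input = user_input

    def process_resources(self, paths):
        # Default hook: leave the list of paths untouched.
        return paths


class IgnorePlugin(PreScanPlugin):
    """
    Ignore files matching the given glob pattern.
    """

    def process_resources(self, paths):
        pattern = self.user_input
        return [path for path in paths if not fnmatch.fnmatch(path, pattern)]


# Hypothetical registration: 'ignore = my_plugins.ignore:IgnorePlugin'
plugin = IgnorePlugin('*.pyc')
print(plugin.process_resources(['a.py', 'a.pyc', 'src/b.pyc']))
```

Adding another hook, such as the process_scan method mentioned above, would mean adding another overridable method with a pass-through default to the base class.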
4. Scan (proper):
In this stage, the plugins are supposed to run before the scan starts and after the pre-scan plugins are called. These plugins would have been used for
- adding or deleting scans
- adding dependency scans (whose data could be used in other scans)
No development has been done for this stage, but it will be quite similar to pre-scan.
Group cli options in cli help
Here, the goal was to add command line options to pre-defined groups such that they are displayed in their respective groups when scancode -h or scancode --help is called. This helped to better visually represent the command line options and determine more easily what context they belong to.
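The effect of grouping can be illustrated with the standard library's argparse, which has built-in argument groups. This is only an analogy and not ScanCode's own CLI code; the group titles and the assignment of options to groups are assumptions for this example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='scancode')
    # Hypothetical groups: each option is listed under its group's title in --help.
    pre_scan = parser.add_argument_group('pre-scan')
    pre_scan.add_argument(
        '--ignore', metavar='PATTERN', help='Ignore files matching PATTERN.')
    post_scan = parser.add_argument_group('post-scan')
    post_scan.add_argument(
        '--only-findings', action='store_true',
        help='Only return files or directories with findings.')
    output = parser.add_argument_group('output')
    output.add_argument('--format', default='json', help='Set the output format.')
    return parser


# Render the help text that 'scancode --help' would show for this sketch.
help_text = build_parser().format_help()
print(help_text)
```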
Add a Resource class to hold all scanned info for a resource (Ongoing)
Here, the goal was to create a Resource class that holds all the scanned data for a resource (i.e. a file or a directory).
This class would eventually encapsulate the caching logic entirely. For now, it just holds the info and path of a resource.
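A minimal sketch of such a class, assuming that info is a plain mapping of scan results, might look like this. The constructor signature and the __repr__ are illustrative and not taken from the ScanCode code base.

```python
class Resource(object):
    """
    Hold the scanned data for one resource, i.e. a file or a directory.
    """

    def __init__(self, path, info=None):
        self.path = path
        # Assumption: info is a plain mapping of scan results for this path.
        self.info = dict(info or {})

    def __repr__(self):
        return 'Resource(path=%r, info=%r)' % (self.path, self.info)


# Usage with made-up data.
resource = Resource('src/a.py', {'type': 'file', 'size': 120})
print(resource)
```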
Future work:
- Pre-scan plugin for archive extraction
- Scan (proper) plugins
- More complex post-scan plugins
- Support plugins written in languages other than Python
Additionally, all my commits can be found here.
See http://nexb.com for more.