@@ -8,12 +8,13 @@ Project

A **project** encapsulates the analysis of software code:

- - it has a **workspace** which is a directory that contains the software code files under
-   analysis
- - it is related to one or more **code analysis pipelines** scripts to automate its analysis
- - it tracks ``Codebase Resources`` e.g. its **code files and directories**
- - it tracks ``Discovered Packages`` e.g. its the **system and application packages** origin and
-   license discovered in the codebase
+ - It has a **workspace**, which is a directory that contains the software code
+   files under analysis.
+ - It makes use of one or more **code analysis pipelines** scripts to automate
+   the code analysis process.
+ - It tracks ``Codebase Resources``, i.e. its **code files and directories**.
+ - It tracks ``Discovered Packages``, i.e. **system and application packages**
+   origin and license discovered in the codebase.

In the database, **a project is identified by its unique name**.

@@ -25,78 +26,82 @@ In the database, **a project is identified by its unique name**.
Project workspace
-----------------

- A project workspace is the root directory where **all the project files are stored**.
+ A project workspace is the root directory where **a project's files are stored**.

- The following directories exists under this workspace directory:
+ The following directories exist under the workspace directory:

- - :guilabel:`input/` contains all the original uploaded and input files used of the project.
-   For instance, it could be a codebase archive.
- - :guilabel:`codebase/` contains the files and directories (aka. resources) tracked as
-   CodebaseResource records in the database.
- - :guilabel:`output/` contains all output files created by the pipelines: reports,
-   scan results, etc.
- - :guilabel:`tmp/` is a scratch pad for temporary files generated during the pipelines runs.
+ - :guilabel:`input/` contains all uploaded files used as the input of a project,
+   such as a codebase archive.
+ - :guilabel:`codebase/` contains files and directories - i.e. resources -
+   tracked as CodebaseResource records in the database.
+ - :guilabel:`output/` contains any output files created by the pipelines,
+   including reports, scan results, etc.
+ - :guilabel:`tmp/` is a scratch pad for temporary files generated during
+   pipeline runs.

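As an illustration of the workspace layout described above, the following is a minimal sketch that creates the four directories under a root directory. The ``create_workspace`` function and the ``WORKSPACE_DIRS`` constant are hypothetical names used only for this example; they are not part of the ScanCode.io code base, which may create its workspace differently.

```python
import tempfile
from pathlib import Path

# Directory names taken from the list above.
WORKSPACE_DIRS = ("input", "codebase", "output", "tmp")


def create_workspace(root):
    """Create the workspace sub-directories under ``root`` (hypothetical helper)."""
    root = Path(root)
    for name in WORKSPACE_DIRS:
        # parents=True also creates ``root`` itself when it is missing.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        workspace = create_workspace(Path(base) / "my-project")
        print(sorted(path.name for path in workspace.iterdir()))
```

Calling the helper twice on the same root is harmless, because ``exist_ok=True`` skips directories that already exist.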
.. _pipelines_concept:

Pipelines
---------

- A pipeline is a Python script that contains a series of steps from start to end
- to execute in order to **perform a code analysis**.
+ A pipeline is a Python script that contains a series of steps, which are
+ executed sequentially to **perform a code analysis**.

- It usually starts from the uploaded input files, and may extract these then
- generates ``CodebaseResource`` records in the database accordingly.
+ It usually starts with the uploaded input files, which might need to be
+ extracted first. Then, it generates ``CodebaseResource`` records in the database
+ accordingly.

Those resources can then be **analyzed, scanned, and matched** as needed.
Analysis results and reports are eventually posted at the end of a pipeline run.

- All pipelines are located in the ``scanpipe.pipelines`` module.
- Each pipeline consist of a Python script including one subclass of the ``Pipeline`` class.
+ All :ref:`built_in_pipelines` are located in the ``scanpipe.pipelines`` module.
+ Each pipeline consists of a Python script and includes one subclass of the
+ ``Pipeline`` class.
Each step is a method of the ``Pipeline`` class.
- The execution order of the steps is declared through the ``steps`` class attribute
- which is a sequence of steps to execute.
+ The execution order of the steps is declared through the ``steps`` class
+ attribute.
+
+ .. tip::
+     Refer to :ref:`custom_pipelines` for adding pipelines to ScanCode.io.

.. note::
    One or more pipelines can be assigned to a project as a sequence.

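The relationship between a pipeline, its step methods, and the ``steps`` class attribute can be sketched as follows. This is a simplified illustration only: the ``Pipeline`` base class below is a hypothetical stand-in written for this example, not the real ``scanpipe.pipelines`` class, and the step names are invented.

```python
class Pipeline:
    """Hypothetical stand-in for a pipeline base class (not the scanpipe one)."""

    # Subclasses declare the execution order here.
    steps = ()

    def __init__(self):
        self.log = []

    def execute(self):
        # Run each declared step in order, passing the pipeline instance.
        for step in self.steps:
            step(self)
        return self.log


class ExamplePipeline(Pipeline):
    """Illustrative pipeline with three invented steps."""

    def extract_inputs(self):
        self.log.append("extract_inputs")

    def collect_resources(self):
        self.log.append("collect_resources")

    def scan_resources(self):
        self.log.append("scan_resources")

    # The sequence of steps to execute, declared as a class attribute.
    steps = (extract_inputs, collect_resources, scan_resources)


if __name__ == "__main__":
    print(ExamplePipeline().execute())
```

Declaring the order in a single attribute keeps the sequence visible at a glance and separate from the logic of each step.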
-
Codebase Resources
------------------

A project's ``Codebase Resources`` are records of its **code files and directories**.
``CodebaseResource`` is a database model and each record is identified by its path
under the project workspace.

- Some of the ``CodebaseResource`` interesting attributes are:
+ The following are some of the ``CodebaseResource`` attributes:

- - a **status** used to track the analysis status for this resource.
- - a **type** (such as file, directory or symlink)
- - various attributes to track detected **copyrights**, **license expressions**,
+ - A **status**, which is used to track the analysis status for this resource.
+ - A **type**, such as a file, a directory or a symlink.
+ - Various attributes to track detected **copyrights**, **license expressions**,
  **copyright holders**, and **related packages**.

.. note::
-     In general the attributes and their names are the same that are used in
-     `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_ for files.
-
+     Please note that `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_
+     uses the same attributes and attribute names for files.

Discovered Packages
-------------------

A project's ``Discovered Packages`` are records of the **system and application packages**
- discovered in its code.
+ discovered in the code under analysis.
``DiscoveredPackage`` is a database model and each record is identified by its ``Package URL``.
- ``Package URL`` is a grassroot efforts to create informative identifiers for software
- packages such as Debian, RPM, npm, Maven, or PyPI packages.
- See https://github.com/package-url for details.
+ ``Package URL`` is a fundamental effort to create informative identifiers for
+ software packages, such as Debian, RPM, npm, Maven, or PyPI packages.
+ See https://github.com/package-url for more details.

- Some of the ``DiscoveredPackage`` interesting attributes are:
+ The following are some of the ``DiscoveredPackage`` attributes:

- - type, name, version (all Package URL attributes)
- - homepage_url, download_url and other URLs
- - checksums (such as SHA1, MD5)
- - copyright, license_expression, declared_license
+ - A type, name, version (all Package URL attributes)
+ - A homepage_url, download_url, and other URLs
+ - Checksums, such as SHA1, MD5
+ - Copyright, license_expression, and declared_license

.. note::
-     In general the attributes and their names are the same that are used in
-     `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_ for packages.
+     Please note that `ScanCode-toolkit <https://github.com/nexB/scancode-toolkit>`_
+     uses the same attributes and attribute names for packages.
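
To show how the type, name, and version attributes combine into a ``Package URL`` identifier, here is a deliberately simplified sketch. The ``SimplePurl`` class is hypothetical and written only for this example; it ignores the namespace, qualifiers, subpath, and percent-encoding rules of the Package URL specification, and it is not the ``DiscoveredPackage`` model.

```python
from typing import NamedTuple, Optional


class SimplePurl(NamedTuple):
    """Hypothetical, simplified Package URL holder (type, name, version only)."""

    type: str
    name: str
    version: Optional[str] = None

    def to_string(self):
        # Basic form: pkg:<type>/<name>@<version>; the type is lowercased.
        purl = f"pkg:{self.type.lower()}/{self.name}"
        if self.version:
            purl += f"@{self.version}"
        return purl


if __name__ == "__main__":
    print(SimplePurl("npm", "foobar", "12.3.1").to_string())
```

For real use, a library that implements the full specification is a safer choice than this sketch.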