
Commit 98ab0c3

Merge branch 'main' into 1840-load-resource-details
2 parents f960c82 + eb8d4bb

File tree

7 files changed: +277 -35 lines changed


CHANGELOG.rst

Lines changed: 39 additions & 1 deletion
@@ -1,12 +1,50 @@
 Changelog
 =========
 
-v35.4.1 (unreleased)
+v35.4.1 (2025-10-24)
 --------------------
 
 - Add ability to download all output results formats as a zipfile for a given project.
   https://github.com/aboutcode-org/scancode.io/issues/1880
 
+- Add support for tagging inputs in the run management command
+  Add ability to skip the SQLite auto db in combined_run
+  Add documentation to leverage PostgreSQL service
+  https://github.com/aboutcode-org/scancode.io/pull/1916
+
+- Refine d2d pipeline for scala and kotlin.
+  https://github.com/aboutcode-org/scancode.io/issues/1898
+
+- Add utilities to create/init FederatedCode data repo.
+  https://github.com/aboutcode-org/scancode.io/issues/1896
+
+- Add a verify-project CLI management command.
+  https://github.com/aboutcode-org/scancode.io/issues/1903
+
+- Add support for multiple inputs in the run management command.
+  https://github.com/aboutcode-org/scancode.io/issues/1916
+
+- Add the django-htmx app to the stack.
+  https://github.com/aboutcode-org/scancode.io/issues/1917
+
+- Adjust the resource tree view table rendering.
+  https://github.com/aboutcode-org/scancode.io/issues/1840
+
+- Add ".." navigation option in table to navigate to parent resource.
+  https://github.com/aboutcode-org/scancode.io/issues/1869
+
+- Add ability to download all output results formats.
+  https://github.com/aboutcode-org/scancode.io/issues/1880
+
+- Update Java D2D Pipeline to Include Checksum Mapped Sources for Accurate Java Mapping.
+  https://github.com/aboutcode-org/scancode.io/issues/1870
+
+- Auto-detect pipeline from provided input.
+  https://github.com/aboutcode-org/scancode.io/issues/1883
+
+- Migrate SCA workflows verification to new verify-project management command.
+  https://github.com/aboutcode-org/scancode.io/issues/1902
+
 v35.4.0 (2025-09-30)
 --------------------
 

docs/quickstart.rst

Lines changed: 116 additions & 4 deletions
@@ -3,8 +3,8 @@
 QuickStart
 ==========
 
-Run a Scan (no installation required!)
---------------------------------------
+Run a Local Directory Scan (no installation required!)
+------------------------------------------------------
 
 The **fastest way** to get started and **scan a codebase** —
 **no installation needed** — is by using the latest
@@ -52,8 +52,120 @@ See the :ref:`RUN command <cli_run>` section for more details on this command.
 .. note::
     Not sure which pipeline to use? Check out :ref:`faq_which_pipeline`.
 
-Next Step: Local Installation
------------------------------
+Run a Remote Package Scan
+-------------------------
+
+Let's look at another example — this time scanning a **remote package archive** by
+providing its **download URL**:
+
+.. code-block:: bash
+
+    docker run --rm \
+        ghcr.io/aboutcode-org/scancode.io:latest \
+        run scan_single_package https://github.com/aboutcode-org/python-inspector/archive/refs/tags/v0.14.4.zip \
+        > results.json
+
+Let's break down what's happening here:
+
+- ``docker run --rm``
+  Runs a temporary container that is automatically removed after the scan completes.
+
+- ``ghcr.io/aboutcode-org/scancode.io:latest``
+  Uses the latest ScanCode.io image from GitHub Container Registry.
+
+- ``run scan_single_package <URL>``
+  Executes the ``scan_single_package`` pipeline, automatically fetching and analyzing
+  the package archive from the provided URL.
+
+- ``> results.json``
+  Writes the scan results to a local ``results.json`` file.
+
+Notice that the ``-v "$(pwd)":/codedrop`` option is **not required** in this case
+because the input is downloaded directly from the provided URL, rather than coming
+from your local filesystem.
+
+The result? A **complete scan of a remote package archive — no setup, one command!**
+
+Use PostgreSQL for Better Performance
+-------------------------------------
+
+By default, ScanCode.io uses a **temporary SQLite database** for simplicity.
+While this works well for quick scans, it has a few limitations — such as
+**no multiprocessing** and slower performance on large codebases.
+
+For improved speed and scalability, you can run your pipelines using a
+**PostgreSQL database** instead.
+
+Start a PostgreSQL Database Service
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+First, start a PostgreSQL container in the background:
+
+.. code-block:: bash
+
+    docker run -d \
+        --name scancodeio-run-db \
+        -e POSTGRES_DB=scancodeio \
+        -e POSTGRES_USER=scancodeio \
+        -e POSTGRES_PASSWORD=scancodeio \
+        -e POSTGRES_INITDB_ARGS="--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8" \
+        -v scancodeio_pgdata:/var/lib/postgresql/data \
+        -p 5432:5432 \
+        postgres:17
+
+This command starts a new PostgreSQL service named ``scancodeio-run-db`` and stores its
+data in a named Docker volume called ``scancodeio_pgdata``.
+
+.. note::
+    You can stop and remove the PostgreSQL service once you are done using:
+
+    .. code-block:: bash
+
+        docker rm -f scancodeio-run-db
+
+.. tip::
+    The named volume ``scancodeio_pgdata`` ensures that your database data
+    **persists across runs**.
+    You can remove it later with ``docker volume rm scancodeio_pgdata`` if needed.
+
+Run a Docker Image Analysis Using PostgreSQL
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once PostgreSQL is running, you can start a ScanCode.io pipeline
+using the same Docker image, connecting it to the PostgreSQL database container:
+
+.. code-block:: bash
+
+    docker run --rm \
+        --network host \
+        -e SCANCODEIO_NO_AUTO_DB=1 \
+        ghcr.io/aboutcode-org/scancode.io:latest \
+        run analyze_docker_image docker://alpine:3.22.1 \
+        > results.json
+
+Here’s what’s happening:
+
+- ``--network host``
+  Ensures the container can connect to the PostgreSQL service running on your host.
+
+- ``-e SCANCODEIO_NO_AUTO_DB=1``
+  Tells ScanCode.io **not** to create a temporary SQLite database, and instead use
+  the configured PostgreSQL connection defined in its default settings.
+
+- ``ghcr.io/aboutcode-org/scancode.io:latest``
+  Uses the latest ScanCode.io image from GitHub Container Registry.
+
+- ``run analyze_docker_image docker://alpine:3.22.1``
+  Runs the ``analyze_docker_image`` pipeline, scanning the given Docker image.
+
+- ``> results.json``
+  Saves the scan results to a local ``results.json`` file.
+
+The result? A **faster, multiprocessing-enabled scan** backed by PostgreSQL — ideal
+for large or complex analyses.
+
+Next Step: Installation
+-----------------------
 
 Install ScanCode.io, to **unlock all features**:

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "scancodeio"
-version = "35.4.0"
+version = "35.4.1"
 description = "Automate software composition analysis pipelines"
 readme = "README.rst"
 requires-python = ">=3.10,<3.14"

scancodeio/__init__.py

Lines changed: 10 additions & 5 deletions
@@ -28,7 +28,7 @@
 
 import git
 
-VERSION = "35.4.0"
+VERSION = "35.4.1"
 
 PROJECT_DIR = Path(__file__).resolve().parent
 ROOT_DIR = PROJECT_DIR.parent
@@ -106,6 +106,9 @@ def combined_run():
     configuration.
     It combines the creation, execution, and result retrieval of the project into a
     single process.
+
+    Set SCANCODEIO_NO_AUTO_DB=1 to use the database configuration from the settings
+    instead of SQLite.
     """
     from django.core.checks.security.base import SECRET_KEY_INSECURE_PREFIX
     from django.core.management import execute_from_command_line
@@ -114,10 +117,12 @@ def combined_run():
     os.environ.setdefault("DJANGO_SETTINGS_MODULE", "scancodeio.settings")
     secret_key = SECRET_KEY_INSECURE_PREFIX + get_random_secret_key()
     os.environ.setdefault("SECRET_KEY", secret_key)
-    os.environ.setdefault("SCANCODEIO_DB_ENGINE", "django.db.backends.sqlite3")
-    os.environ.setdefault("SCANCODEIO_DB_NAME", "scancodeio.sqlite3")
-    # Disable multiprocessing
-    os.environ.setdefault("SCANCODEIO_PROCESSES", "0")
+
+    # Default to SQLite unless SCANCODEIO_NO_AUTO_DB is provided
+    if not os.getenv("SCANCODEIO_NO_AUTO_DB"):
+        os.environ.setdefault("SCANCODEIO_DB_ENGINE", "django.db.backends.sqlite3")
+        os.environ.setdefault("SCANCODEIO_DB_NAME", "scancodeio.sqlite3")
+        os.environ.setdefault("SCANCODEIO_PROCESSES", "0")  # Disable multiprocessing
 
     sys.argv.insert(1, "run")
     execute_from_command_line(sys.argv)
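The conditional defaulting above is easy to misread, so here is a minimal standalone sketch of the same logic. The ``apply_db_defaults`` helper is hypothetical and operates on a plain dict; the real ``combined_run`` code works on ``os.environ`` directly:

```python
def apply_db_defaults(environ):
    """Mirror combined_run: default to SQLite unless SCANCODEIO_NO_AUTO_DB is set.

    Hypothetical standalone helper; the real code mutates os.environ.
    """
    if not environ.get("SCANCODEIO_NO_AUTO_DB"):
        # setdefault only fills in missing values, so an explicitly
        # provided SCANCODEIO_DB_ENGINE still wins over the SQLite default.
        environ.setdefault("SCANCODEIO_DB_ENGINE", "django.db.backends.sqlite3")
        environ.setdefault("SCANCODEIO_DB_NAME", "scancodeio.sqlite3")
        environ.setdefault("SCANCODEIO_PROCESSES", "0")  # Disable multiprocessing
    return environ


# Default: SQLite settings are filled in.
print(apply_db_defaults({})["SCANCODEIO_DB_ENGINE"])
# → django.db.backends.sqlite3

# Opt-out: nothing is touched, so the settings-defined database applies.
print("SCANCODEIO_DB_ENGINE" in apply_db_defaults({"SCANCODEIO_NO_AUTO_DB": "1"}))
# → False
```

This is why the diff wraps the three ``setdefault`` calls in one ``if`` rather than deleting them: existing deployments keep the SQLite behavior unless they opt out.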

scanpipe/management/commands/__init__.py

Lines changed: 13 additions & 10 deletions
@@ -284,20 +284,23 @@ def validate_pipelines(pipelines_data):
     return pipelines_data
 
 
-def extract_tag_from_input_files(input_files):
+def extract_tag_from_input_file(file_location):
     """
-    Add support for the ":tag" suffix in file location.
+    Parse a file location with optional tag suffix.
 
     For example: "/path/to/file.zip:tag"
     """
-    input_files_data = {}
-    for file in input_files:
-        if ":" in file:
-            key, value = file.split(":", maxsplit=1)
-            input_files_data.update({key: value})
-        else:
-            input_files_data.update({file: ""})
-    return input_files_data
+    if ":" in file_location:
+        cleaned_location, tag = file_location.split(":", maxsplit=1)
+        return cleaned_location, tag
+    return file_location, ""
+
+
+def extract_tag_from_input_files(input_files):
+    """Parse multiple file locations with optional tag suffixes."""
+    return dict(
+        extract_tag_from_input_file(file_location) for file_location in input_files
+    )
 
 
 def validate_input_files(input_files):
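The refactor above splits the old dict-building loop into a single-location parser plus a thin wrapper, which makes the tag handling easy to exercise in isolation. A sketch reproducing the two helpers from the diff:

```python
def extract_tag_from_input_file(file_location):
    """Parse a file location with an optional ":tag" suffix.

    For example: "/path/to/file.zip:tag"
    """
    if ":" in file_location:
        # Split on the first colon only: location on the left, tag on the right.
        cleaned_location, tag = file_location.split(":", maxsplit=1)
        return cleaned_location, tag
    return file_location, ""


def extract_tag_from_input_files(input_files):
    """Parse multiple file locations with optional tag suffixes."""
    return dict(
        extract_tag_from_input_file(file_location) for file_location in input_files
    )


print(extract_tag_from_input_file("inputs/from.zip:from"))
# → ('inputs/from.zip', 'from')
print(extract_tag_from_input_files(["a.zip:from", "b.zip"]))
# → {'a.zip': 'from', 'b.zip': ''}
```

Note that only the first ``:`` separates the location from the tag, so a tag value may itself contain colons.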

scanpipe/management/commands/run.py

Lines changed: 38 additions & 14 deletions
@@ -20,13 +20,15 @@
 # ScanCode.io is a free software code scanning tool from nexB Inc. and others.
 # Visit https://github.com/aboutcode-org/scancode.io for support and download.
 
+from collections import defaultdict
 from pathlib import Path
 
 from django.core.management import call_command
 from django.core.management.base import BaseCommand
 from django.core.management.base import CommandError
 from django.utils.crypto import get_random_string
 
+from scanpipe.management.commands import extract_tag_from_input_file
 from scanpipe.pipes.fetch import SCHEME_TO_FETCHER_MAPPING
 
 
@@ -42,12 +44,16 @@ def add_arguments(self, parser):
             help=(
                 "One or more pipeline to run. "
                 "The pipelines executed based on their given order. "
-                'Groups can be provided using the "pipeline_name:option1,option2"'
-                " syntax."
+                'Groups can be provided using the "pipeline_name:option1,option2" '
+                "syntax."
             ),
         )
         parser.add_argument(
-            "input_location", help="Input location: file, directory, and URL supported."
+            "input_location",
+            help=(
+                "Input location: file, directory, and URL supported. "
+                'Multiple values can be provided using the "input1,input2" syntax.'
+            ),
         )
         parser.add_argument("--project", required=False, help="Project name.")
         parser.add_argument(
@@ -68,22 +74,40 @@ def handle(self, *args, **options):
             "pipeline": pipelines,
             "execute": True,
             "verbosity": 0,
+            **self.get_input_options(input_location),
         }
 
-        if input_location.startswith(tuple(SCHEME_TO_FETCHER_MAPPING.keys())):
-            create_project_options["input_urls"] = [input_location]
-        else:
-            input_path = Path(input_location)
-            if not input_path.exists():
-                raise CommandError(f"{input_location} not found.")
-            if input_path.is_file():
-                create_project_options["input_files"] = [input_location]
-            else:
-                create_project_options["copy_codebase"] = input_location
-
         # Run the database migrations in case the database is not created or outdated.
         call_command("migrate", verbosity=0, interactive=False)
         # Create a project with proper inputs and execute the pipeline(s)
         call_command("create-project", project_name, **create_project_options)
         # Print the results for the specified format on stdout
         call_command("output", project=project_name, format=[output_format], print=True)
+
+    @staticmethod
+    def get_input_options(input_location):
+        """
+        Parse a comma-separated list of input locations and convert them into options
+        for the `create-project` command.
+        """
+        input_options = defaultdict(list)
+
+        for location in input_location.split(","):
+            if location.startswith(tuple(SCHEME_TO_FETCHER_MAPPING.keys())):
+                input_options["input_urls"].append(location)
+
+            else:
+                cleaned_location, _ = extract_tag_from_input_file(location)
+                input_path = Path(cleaned_location)
+                if not input_path.exists():
+                    raise CommandError(f"{location} not found.")
+                if input_path.is_file():
+                    input_options["input_files"].append(location)
+                else:
+                    if input_options["copy_codebase"]:
+                        raise CommandError(
+                            "Only one codebase directory can be provided as input."
+                        )
+                    input_options["copy_codebase"] = location
+
+        return input_options
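The new ``get_input_options`` dispatch can be sketched without the Django machinery: URLs go to ``input_urls``, existing files to ``input_files``, and at most one directory to ``copy_codebase``. In the sketch below, ``URL_SCHEMES`` is a hypothetical stand-in for the keys of ``SCHEME_TO_FETCHER_MAPPING``, and ``ValueError`` stands in for ``CommandError``:

```python
from collections import defaultdict
from pathlib import Path

# Assumption: a stand-in for the keys of SCHEME_TO_FETCHER_MAPPING.
URL_SCHEMES = ("http://", "https://", "docker://")


def extract_tag_from_input_file(file_location):
    """Split an optional ':tag' suffix, as in the commands helper above."""
    if ":" in file_location:
        cleaned_location, tag = file_location.split(":", maxsplit=1)
        return cleaned_location, tag
    return file_location, ""


def get_input_options(input_location):
    """Dispatch comma-separated inputs into create-project options (sketch)."""
    input_options = defaultdict(list)
    for location in input_location.split(","):
        if location.startswith(URL_SCHEMES):
            # Remote inputs are fetched later; no filesystem checks here.
            input_options["input_urls"].append(location)
        else:
            # Strip the tag suffix only for the existence check.
            cleaned_location, _ = extract_tag_from_input_file(location)
            input_path = Path(cleaned_location)
            if not input_path.exists():
                raise ValueError(f"{location} not found.")
            if input_path.is_file():
                input_options["input_files"].append(location)
            elif input_options["copy_codebase"]:
                raise ValueError("Only one codebase directory can be provided as input.")
            else:
                input_options["copy_codebase"] = location
    return input_options


opts = get_input_options("https://example.com/a.zip,docker://alpine:3.22.1")
print(dict(opts))
# → {'input_urls': ['https://example.com/a.zip', 'docker://alpine:3.22.1']}
```

Note that the tag suffix is stripped only for the existence check; the original tagged string is what gets recorded, so the downstream ``create-project`` command still sees the tag.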

0 commit comments
