
Commit 98ab0c3

Merge branch 'main' into 1840-load-resource-details
2 parents f960c82 + eb8d4bb

File tree

7 files changed: +277 -35 lines changed


CHANGELOG.rst

Lines changed: 39 additions & 1 deletion
@@ -1,12 +1,50 @@
 Changelog
 =========
 
-v35.4.1 (unreleased)
+v35.4.1 (2025-10-24)
 --------------------
 
 - Add ability to download all output results formats as a zipfile for a given project.
   https://github.com/aboutcode-org/scancode.io/issues/1880
 
+- Add support for tagging inputs in the run management command
+  Add ability to skip the SQLite auto db in combined_run
+  Add documentation to leverage PostgreSQL service
+  https://github.com/aboutcode-org/scancode.io/pull/1916
+
+- Refine d2d pipeline for scala and kotlin.
+  https://github.com/aboutcode-org/scancode.io/issues/1898
+
+- Add utilities to create/init FederatedCode data repo.
+  https://github.com/aboutcode-org/scancode.io/issues/1896
+
+- Add a verify-project CLI management command.
+  https://github.com/aboutcode-org/scancode.io/issues/1903
+
+- Add support for multiple inputs in the run management command.
+  https://github.com/aboutcode-org/scancode.io/issues/1916
+
+- Add the django-htmx app to the stack.
+  https://github.com/aboutcode-org/scancode.io/issues/1917
+
+- Adjust the resource tree view table rendering.
+  https://github.com/aboutcode-org/scancode.io/issues/1840
+
+- Add ".." navigation option in table to navigate to parent resource.
+  https://github.com/aboutcode-org/scancode.io/issues/1869
+
+- Add ability to download all output results formats.
+  https://github.com/aboutcode-org/scancode.io/issues/1880
+
+- Update Java D2D Pipeline to Include Checksum Mapped Sources for Accurate Java Mapping.
+  https://github.com/aboutcode-org/scancode.io/issues/1870
+
+- Auto-detect pipeline from provided input.
+  https://github.com/aboutcode-org/scancode.io/issues/1883
+
+- Migrate SCA workflows verification to new verify-project management command.
+  https://github.com/aboutcode-org/scancode.io/issues/1902
+
 v35.4.0 (2025-09-30)
 --------------------
 

docs/quickstart.rst

Lines changed: 116 additions & 4 deletions
@@ -3,8 +3,8 @@
 QuickStart
 ==========
 
-Run a Scan (no installation required!)
---------------------------------------
+Run a Local Directory Scan (no installation required!)
+------------------------------------------------------
 
 The **fastest way** to get started and **scan a codebase** —
 **no installation needed** — is by using the latest
@@ -52,8 +52,120 @@ See the :ref:`RUN command <cli_run>` section for more details on this command.
 .. note::
     Not sure which pipeline to use? Check out :ref:`faq_which_pipeline`.
 
-Next Step: Local Installation
------------------------------
+Run a Remote Package Scan
+-------------------------
+
+Let's look at another example — this time scanning a **remote package archive** by
+providing its **download URL**:
+
+.. code-block:: bash
+
+    docker run --rm \
+        ghcr.io/aboutcode-org/scancode.io:latest \
+        run scan_single_package https://github.com/aboutcode-org/python-inspector/archive/refs/tags/v0.14.4.zip \
+        > results.json
+
+Let's break down what's happening here:
+
+- ``docker run --rm``
+  Runs a temporary container that is automatically removed after the scan completes.
+
+- ``ghcr.io/aboutcode-org/scancode.io:latest``
+  Uses the latest ScanCode.io image from GitHub Container Registry.
+
+- ``run scan_single_package <URL>``
+  Executes the ``scan_single_package`` pipeline, automatically fetching and analyzing
+  the package archive from the provided URL.
+
+- ``> results.json``
+  Writes the scan results to a local ``results.json`` file.
+
+Notice that the ``-v "$(pwd)":/codedrop`` option is **not required** in this case
+because the input is downloaded directly from the provided URL, rather than coming
+from your local filesystem.
+
+The result? A **complete scan of a remote package archive — no setup, one command!**
+
+Use PostgreSQL for Better Performance
+-------------------------------------
+
+By default, ScanCode.io uses a **temporary SQLite database** for simplicity.
+While this works well for quick scans, it has a few limitations — such as
+**no multiprocessing** and slower performance on large codebases.
+
+For improved speed and scalability, you can run your pipelines using a
+**PostgreSQL database** instead.
+
+Start a PostgreSQL Database Service
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+First, start a PostgreSQL container in the background:
+
+.. code-block:: bash
+
+    docker run -d \
+        --name scancodeio-run-db \
+        -e POSTGRES_DB=scancodeio \
+        -e POSTGRES_USER=scancodeio \
+        -e POSTGRES_PASSWORD=scancodeio \
+        -e POSTGRES_INITDB_ARGS="--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8" \
+        -v scancodeio_pgdata:/var/lib/postgresql/data \
+        -p 5432:5432 \
+        postgres:17
+
+This command starts a new PostgreSQL service named ``scancodeio-run-db`` and stores its
+data in a named Docker volume called ``scancodeio_pgdata``.
+
+.. note::
+    You can stop and remove the PostgreSQL service once you are done using:
+
+    .. code-block:: bash
+
+        docker rm -f scancodeio-run-db
+
+.. tip::
+    The named volume ``scancodeio_pgdata`` ensures that your database data
+    **persists across runs**.
+    You can remove it later with ``docker volume rm scancodeio_pgdata`` if needed.
+
+Run a Docker Image Analysis Using PostgreSQL
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once PostgreSQL is running, you can start a ScanCode.io pipeline
+using the same Docker image, connecting it to the PostgreSQL database container:
+
+.. code-block:: bash
+
+    docker run --rm \
+        --network host \
+        -e SCANCODEIO_NO_AUTO_DB=1 \
+        ghcr.io/aboutcode-org/scancode.io:latest \
+        run analyze_docker_image docker://alpine:3.22.1 \
+        > results.json
+
+Here’s what’s happening:
+
+- ``--network host``
+  Ensures the container can connect to the PostgreSQL service running on your host.
+
+- ``-e SCANCODEIO_NO_AUTO_DB=1``
+  Tells ScanCode.io **not** to create a temporary SQLite database, and instead use
+  the configured PostgreSQL connection defined in its default settings.
+
+- ``ghcr.io/aboutcode-org/scancode.io:latest``
+  Uses the latest ScanCode.io image from GitHub Container Registry.
+
+- ``run analyze_docker_image docker://alpine:3.22.1``
+  Runs the ``analyze_docker_image`` pipeline, scanning the given Docker image.
+
+- ``> results.json``
+  Saves the scan results to a local ``results.json`` file.
+
+The result? A **faster, multiprocessing-enabled scan** backed by PostgreSQL — ideal
+for large or complex analyses.
+
+Next Step: Installation
+-----------------------
 
 Install ScanCode.io, to **unlock all features**:

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "scancodeio"
-version = "35.4.0"
+version = "35.4.1"
 description = "Automate software composition analysis pipelines"
 readme = "README.rst"
 requires-python = ">=3.10,<3.14"

scancodeio/__init__.py

Lines changed: 10 additions & 5 deletions
@@ -28,7 +28,7 @@
 
 import git
 
-VERSION = "35.4.0"
+VERSION = "35.4.1"
 
 PROJECT_DIR = Path(__file__).resolve().parent
 ROOT_DIR = PROJECT_DIR.parent
@@ -106,6 +106,9 @@ def combined_run():
     configuration.
     It combines the creation, execution, and result retrieval of the project into a
     single process.
+
+    Set SCANCODEIO_NO_AUTO_DB=1 to use the database configuration from the settings
+    instead of SQLite.
     """
     from django.core.checks.security.base import SECRET_KEY_INSECURE_PREFIX
     from django.core.management import execute_from_command_line
@@ -114,10 +117,12 @@ def combined_run():
     os.environ.setdefault("DJANGO_SETTINGS_MODULE", "scancodeio.settings")
     secret_key = SECRET_KEY_INSECURE_PREFIX + get_random_secret_key()
     os.environ.setdefault("SECRET_KEY", secret_key)
-    os.environ.setdefault("SCANCODEIO_DB_ENGINE", "django.db.backends.sqlite3")
-    os.environ.setdefault("SCANCODEIO_DB_NAME", "scancodeio.sqlite3")
-    # Disable multiprocessing
-    os.environ.setdefault("SCANCODEIO_PROCESSES", "0")
+
+    # Default to SQLite unless SCANCODEIO_NO_AUTO_DB is provided
+    if not os.getenv("SCANCODEIO_NO_AUTO_DB"):
+        os.environ.setdefault("SCANCODEIO_DB_ENGINE", "django.db.backends.sqlite3")
+        os.environ.setdefault("SCANCODEIO_DB_NAME", "scancodeio.sqlite3")
+        os.environ.setdefault("SCANCODEIO_PROCESSES", "0")  # Disable multiprocessing
 
     sys.argv.insert(1, "run")
     execute_from_command_line(sys.argv)
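The conditional defaulting above is easy to misread, so here is a minimal standalone sketch of the same logic. The ``apply_db_defaults`` helper is hypothetical and operates on a plain dict; the real ``combined_run`` code works on ``os.environ`` directly:

```python
def apply_db_defaults(environ):
    """Mirror combined_run: default to SQLite unless SCANCODEIO_NO_AUTO_DB is set.

    Hypothetical standalone helper; the real code mutates os.environ.
    """
    if not environ.get("SCANCODEIO_NO_AUTO_DB"):
        # setdefault only fills in missing values, so an explicitly
        # provided SCANCODEIO_DB_ENGINE still wins over the SQLite default.
        environ.setdefault("SCANCODEIO_DB_ENGINE", "django.db.backends.sqlite3")
        environ.setdefault("SCANCODEIO_DB_NAME", "scancodeio.sqlite3")
        environ.setdefault("SCANCODEIO_PROCESSES", "0")  # Disable multiprocessing
    return environ


# Default: SQLite settings are filled in.
print(apply_db_defaults({})["SCANCODEIO_DB_ENGINE"])
# → django.db.backends.sqlite3

# Opt-out: nothing is touched, so the settings-defined database applies.
print("SCANCODEIO_DB_ENGINE" in apply_db_defaults({"SCANCODEIO_NO_AUTO_DB": "1"}))
# → False
```

This is why the diff wraps the three ``setdefault`` calls in one ``if`` rather than deleting them: existing deployments keep the SQLite behavior unless they opt out.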

scanpipe/management/commands/__init__.py

Lines changed: 13 additions & 10 deletions
@@ -284,20 +284,23 @@ def validate_pipelines(pipelines_data):
     return pipelines_data
 
 
-def extract_tag_from_input_files(input_files):
+def extract_tag_from_input_file(file_location):
     """
-    Add support for the ":tag" suffix in file location.
+    Parse a file location with optional tag suffix.
 
     For example: "/path/to/file.zip:tag"
     """
-    input_files_data = {}
-    for file in input_files:
-        if ":" in file:
-            key, value = file.split(":", maxsplit=1)
-            input_files_data.update({key: value})
-        else:
-            input_files_data.update({file: ""})
-    return input_files_data
+    if ":" in file_location:
+        cleaned_location, tag = file_location.split(":", maxsplit=1)
+        return cleaned_location, tag
+    return file_location, ""
+
+
+def extract_tag_from_input_files(input_files):
+    """Parse multiple file locations with optional tag suffixes."""
+    return dict(
+        extract_tag_from_input_file(file_location) for file_location in input_files
+    )
 
 
 def validate_input_files(input_files):
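The refactor above splits the old dict-building loop into a single-location parser plus a thin wrapper, which makes the tag handling easy to exercise in isolation. A sketch reproducing the two helpers from the diff:

```python
def extract_tag_from_input_file(file_location):
    """Parse a file location with an optional ":tag" suffix.

    For example: "/path/to/file.zip:tag"
    """
    if ":" in file_location:
        # Split on the first colon only: location on the left, tag on the right.
        cleaned_location, tag = file_location.split(":", maxsplit=1)
        return cleaned_location, tag
    return file_location, ""


def extract_tag_from_input_files(input_files):
    """Parse multiple file locations with optional tag suffixes."""
    return dict(
        extract_tag_from_input_file(file_location) for file_location in input_files
    )


print(extract_tag_from_input_file("inputs/from.zip:from"))
# → ('inputs/from.zip', 'from')
print(extract_tag_from_input_files(["a.zip:from", "b.zip"]))
# → {'a.zip': 'from', 'b.zip': ''}
```

Note that only the first ``:`` separates the location from the tag, so a tag value may itself contain colons.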

scanpipe/management/commands/run.py

Lines changed: 38 additions & 14 deletions
@@ -20,13 +20,15 @@
 # ScanCode.io is a free software code scanning tool from nexB Inc. and others.
 # Visit https://github.com/aboutcode-org/scancode.io for support and download.
 
+from collections import defaultdict
 from pathlib import Path
 
 from django.core.management import call_command
 from django.core.management.base import BaseCommand
 from django.core.management.base import CommandError
 from django.utils.crypto import get_random_string
 
+from scanpipe.management.commands import extract_tag_from_input_file
 from scanpipe.pipes.fetch import SCHEME_TO_FETCHER_MAPPING
 
 
@@ -42,12 +44,16 @@ def add_arguments(self, parser):
             help=(
                 "One or more pipeline to run. "
                 "The pipelines executed based on their given order. "
-                'Groups can be provided using the "pipeline_name:option1,option2"'
-                " syntax."
+                'Groups can be provided using the "pipeline_name:option1,option2" '
+                "syntax."
             ),
         )
         parser.add_argument(
-            "input_location", help="Input location: file, directory, and URL supported."
+            "input_location",
+            help=(
+                "Input location: file, directory, and URL supported. "
+                'Multiple values can be provided using the "input1,input2" syntax.'
+            ),
         )
         parser.add_argument("--project", required=False, help="Project name.")
         parser.add_argument(
@@ -68,22 +74,40 @@ def handle(self, *args, **options):
             "pipeline": pipelines,
             "execute": True,
             "verbosity": 0,
+            **self.get_input_options(input_location),
         }
 
-        if input_location.startswith(tuple(SCHEME_TO_FETCHER_MAPPING.keys())):
-            create_project_options["input_urls"] = [input_location]
-        else:
-            input_path = Path(input_location)
-            if not input_path.exists():
-                raise CommandError(f"{input_location} not found.")
-            if input_path.is_file():
-                create_project_options["input_files"] = [input_location]
-            else:
-                create_project_options["copy_codebase"] = input_location
-
         # Run the database migrations in case the database is not created or outdated.
         call_command("migrate", verbosity=0, interactive=False)
         # Create a project with proper inputs and execute the pipeline(s)
         call_command("create-project", project_name, **create_project_options)
         # Print the results for the specified format on stdout
         call_command("output", project=project_name, format=[output_format], print=True)
+
+    @staticmethod
+    def get_input_options(input_location):
+        """
+        Parse a comma-separated list of input locations and convert them into options
+        for the `create-project` command.
+        """
+        input_options = defaultdict(list)
+
+        for location in input_location.split(","):
+            if location.startswith(tuple(SCHEME_TO_FETCHER_MAPPING.keys())):
+                input_options["input_urls"].append(location)
+
+            else:
+                cleaned_location, _ = extract_tag_from_input_file(location)
+                input_path = Path(cleaned_location)
+                if not input_path.exists():
+                    raise CommandError(f"{location} not found.")
+                if input_path.is_file():
+                    input_options["input_files"].append(location)
+                else:
+                    if input_options["copy_codebase"]:
+                        raise CommandError(
+                            "Only one codebase directory can be provided as input."
+                        )
+                    input_options["copy_codebase"] = location
+
+        return input_options
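The new ``get_input_options`` dispatch can be sketched without the Django machinery: URLs go to ``input_urls``, existing files to ``input_files``, and at most one directory to ``copy_codebase``. In the sketch below, ``URL_SCHEMES`` is a hypothetical stand-in for the keys of ``SCHEME_TO_FETCHER_MAPPING``, and ``ValueError`` stands in for ``CommandError``:

```python
from collections import defaultdict
from pathlib import Path

# Assumption: a stand-in for the keys of SCHEME_TO_FETCHER_MAPPING.
URL_SCHEMES = ("http://", "https://", "docker://")


def extract_tag_from_input_file(file_location):
    """Split an optional ':tag' suffix, as in the commands helper above."""
    if ":" in file_location:
        cleaned_location, tag = file_location.split(":", maxsplit=1)
        return cleaned_location, tag
    return file_location, ""


def get_input_options(input_location):
    """Dispatch comma-separated inputs into create-project options (sketch)."""
    input_options = defaultdict(list)
    for location in input_location.split(","):
        if location.startswith(URL_SCHEMES):
            # Remote inputs are fetched later; no filesystem checks here.
            input_options["input_urls"].append(location)
        else:
            # Strip the tag suffix only for the existence check.
            cleaned_location, _ = extract_tag_from_input_file(location)
            input_path = Path(cleaned_location)
            if not input_path.exists():
                raise ValueError(f"{location} not found.")
            if input_path.is_file():
                input_options["input_files"].append(location)
            elif input_options["copy_codebase"]:
                raise ValueError("Only one codebase directory can be provided as input.")
            else:
                input_options["copy_codebase"] = location
    return input_options


opts = get_input_options("https://example.com/a.zip,docker://alpine:3.22.1")
print(dict(opts))
# → {'input_urls': ['https://example.com/a.zip', 'docker://alpine:3.22.1']}
```

Note that the tag suffix is stripped only for the existence check; the original tagged string is what gets recorded, so the downstream ``create-project`` command still sees the tag.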

0 commit comments
