Conversation

@TG1999 (Contributor) commented Oct 20, 2025

def get_free_port():
    """Find a free host port for Postgres."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))

Check warning (Code scanning / CodeQL): Binding a socket to all network interfaces (Medium). '' binds a socket to all interfaces.

Copilot Autofix (AI, 4 days ago)

To fix this issue, you should replace the empty string '' used in s.bind(("", 0)) with the loopback interface IP '127.0.0.1'. This will cause the socket to bind only to localhost, reducing exposure by ensuring the port is only allocated on the local interface. The rest of the function can remain unchanged since the purpose is solely to obtain an available port number. The change should be made in the get_free_port function, specifically at line 96. No additional imports or method modifications are needed.


Suggested changeset 1: etc/scripts/run_d2d_scio.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/etc/scripts/run_d2d_scio.py b/etc/scripts/run_d2d_scio.py
--- a/etc/scripts/run_d2d_scio.py
+++ b/etc/scripts/run_d2d_scio.py
@@ -93,7 +93,7 @@
 def get_free_port():
     """Find a free host port for Postgres."""
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", 0))
+        s.bind(("127.0.0.1", 0))
         return s.getsockname()[1]
 
 
EOF


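As a reference, the patched helper would look like the sketch below (assuming the port is only needed for a local Postgres container; note that a free-port probe is inherently racy, since another process can claim the port between this call and the later docker run):

```python
import socket

def get_free_port():
    """Find a free host port by probing on the loopback interface only."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to 127.0.0.1 instead of "" keeps the short-lived probe
        # socket off external interfaces, addressing the CodeQL warning.
        s.bind(("127.0.0.1", 0))
        # Port 0 asks the OS to pick any currently free port; read it back.
        return s.getsockname()[1]
```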
@TG1999 (Contributor, Author) commented Oct 20, 2025

root@tg1999-Precision-3561:/home/tg1999/Desktop/scancode.io# python3 etc/scripts/run_d2d_scio.py --input-file ./project_data/from-from-intbitset.tar.gz:from --input-file ./project_data/tointbitset.whl:to --option Python --output res1.json
Files copied to: /home/tg1999/Desktop/scancode.io/d2d
Using Postgres host port: 59533
d2460ad0155ddaadb7dca767180e651f334c9bdfe9d35d37745b46ba6226cceb
Waiting for Postgres to be ready...
Postgres is ready.
Running ScanCode pipeline:
Running: /usr/bin/docker run --rm -v /home/tg1999/Desktop/scancode.io/d2d:/code -e DATABASE_URL=postgresql://scancode:[email protected]:59533/scancode --network host ghcr.io/aboutcode-org/scancode.io:latest sh -c scanpipe create-project scanpipe_a04dde0b --input-file /code/from-from-intbitset.tar.gz:from --input-file /code/tointbitset.whl:to --pipeline map_deploy_to_develop:Python, && scanpipe execute --project scanpipe_a04dde0b
Project scanpipe_a04dde0b created with work directory /opt/scancodeio/var/projects/scanpipe_a04dde0b-34ffa9c8
Files copied to the project inputs directory:
- from-from-intbitset.tar.gz
- tointbitset.whl
INFO Run[a75e6434-8b22-4a61-a92e-774dd7521fe1] Enter `execute_pipeline_task` Run.pk=a75e6434-8b22-4a61-a92e-774dd7521fe1
Start the map_deploy_to_develop pipeline execution...
INFO Run[a75e6434-8b22-4a61-a92e-774dd7521fe1] Run pipeline: "map_deploy_to_develop" on project: "scanpipe_a04dde0b"
INFO 
Updating directory fingerprints for 2 directories.
INFO Updating directory DB objects...
INFO Scan 0 codebase resources with scan_file
INFO Starting ProcessPoolExecutor with 15 max_workers
INFO Scan 1 codebase resources with scan_file
INFO Starting ProcessPoolExecutor with 15 max_workers
INFO Project scanpipe_a04dde0b collect_license_detections:
INFO   Processing: from/from-intbitset/intbitset.pyx for licenses
INFO Run[a75e6434-8b22-4a61-a92e-774dd7521fe1] Update Run instance with exitcode, output, and end_date
map_deploy_to_develop successfully executed on project scanpipe_a04dde0b
Running: /usr/bin/docker run --rm -v /home/tg1999/Desktop/scancode.io/d2d:/code -e DATABASE_URL=postgresql://scancode:[email protected]:59533/scancode --network host ghcr.io/aboutcode-org/scancode.io:latest sh -c scanpipe output --project scanpipe_a04dde0b --format json --print
scancode_db_e6d63e

Output from running the script above.
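The "Waiting for Postgres to be ready..." step in the log above could be implemented with a plain TCP poll. This is a hedged sketch, with `wait_for_port` as a hypothetical helper name; the actual script may use pg_isready or a driver-level check instead:

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not up yet (refused or timed out); back off briefly and retry.
            time.sleep(interval)
    return False
```

Note that a successful TCP connect only proves the port is open; Postgres may still be initializing, which is why the compose healthcheck in the next comment uses pg_isready instead.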

@JonoYang (Member)

For simplicity and safety, I would consider using Docker compose to handle the database service. You can create a new docker-compose.yml that has scanpipe and the database, something along the lines of:

name: scancodeio-d2d
services:
  db:
    image: docker.io/library/postgres:13
    env_file:
      - docker.env
    volumes:
      - db_data:/var/lib/postgresql/data/
    shm_size: "1gb"
    restart: always
    healthcheck:
      test: [ "CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}" ]
      interval: 10s
      timeout: 5s
      retries: 5

  worker:
    image: ghcr.io/aboutcode-org/scancode.io:latest
    env_file:
      - docker.env
    volumes:
      - .env:/opt/scancodeio/.env
      - /etc/scancodeio/:/etc/scancodeio/
      - workspace:/var/scancodeio/workspace/
    depends_on:
      - db


volumes:
  db_data:
  workspace:

You can run scanpipe commands by doing docker compose -f docker-compose.d2d.yml run worker scanpipe --help

Signed-off-by: Tushar Goel <[email protected]>
def get_free_port():
    """Find a free host port for Postgres."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))

Check warning (Code scanning / CodeQL): Binding a socket to all network interfaces (Medium). '' binds a socket to all interfaces.

Copilot Autofix (AI, 4 days ago)

General fix:
Instead of binding the test socket to all interfaces (using ''), bind it specifically to the loopback interface (using '127.0.0.1'). This restricts the socket from being accessible on external interfaces, even for the short lifetime of this test binding.

Best way to fix:
In the get_free_port() function, replace s.bind(("", 0)) with s.bind(("127.0.0.1", 0)). This changes only the test socket binding, preserving all existing behavior and functionality.

File/region to change:
File: etc/scripts/d2d/run_d2d_scio.py, line 54 (“s.bind(("", 0))”).

What is needed:

  • Edit the socket bind call.
  • No new imports or additional code changes are required.

Suggested changeset 1: etc/scripts/d2d/run_d2d_scio.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/etc/scripts/d2d/run_d2d_scio.py b/etc/scripts/d2d/run_d2d_scio.py
--- a/etc/scripts/d2d/run_d2d_scio.py
+++ b/etc/scripts/d2d/run_d2d_scio.py
@@ -51,7 +51,7 @@
 def get_free_port():
     """Find a free host port for Postgres."""
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", 0))
+        s.bind(("127.0.0.1", 0))
         return s.getsockname()[1]
 
 
EOF


@TG1999 (Contributor, Author) commented Oct 22, 2025

> For simplicity and safety, I would consider using Docker compose to handle the database service. You can create a new docker-compose.yml that has scanpipe and the database [...]

@JonoYang we have a follow-up issue for this: #1913

@tdruez (Contributor) commented Oct 22, 2025

I agree with @JonoYang, we do not want to re-invent orchestration here.
Let's use what we already have and we know is working fine.
This script could simply be a wrapper around the suggested docker-compose configuration.
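Such a wrapper could be a thin subprocess shim around docker compose. A minimal sketch, assuming a hypothetical docker-compose.d2d.yml file and a `compose_command` helper (neither is from the actual PR):

```python
import subprocess

# Hypothetical compose file name; adjust to whatever the PR ends up using.
COMPOSE_FILE = "docker-compose.d2d.yml"

def compose_command(*scanpipe_args):
    """Build the docker compose invocation for a scanpipe command."""
    return [
        "docker", "compose", "-f", COMPOSE_FILE,
        "run", "--rm", "worker", "scanpipe", *scanpipe_args,
    ]

def compose_run(*scanpipe_args):
    """Execute the scanpipe command inside the worker container."""
    # check=True raises CalledProcessError if scanpipe exits non-zero.
    return subprocess.run(compose_command(*scanpipe_args), check=True)
```

With this shape, `compose_run("create-project", "my-project", ...)` and a subsequent `compose_run("execute", "--project", "my-project")` both run inside the worker container, avoiding the host-side `&& scanpipe execute` problem seen in the error below.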

Also, in case of a simple one-off pipeline run, what about using the dedicated run command?
See:

The run command needs a few improvements, which are being added in #1916

Running it would look something like this:

# 1. Start a postgres service
docker run -d --name scancodeio-run-db postgres:17

# 2. Run d2d pipeline
docker run --rm \
  -v "$(pwd)":/codedrop \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop:Python intbitset.tar.gz:from,intbitset.whl:to \
  > results.json

# Use download URLs
docker run --rm \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop:Python https://url/intbitset.tar.gz#from,https://url/intbitset.whl#to \
  > results.json

@TG1999 (Contributor, Author) commented Oct 22, 2025

When I use the above docker compose file with the run command, I get this error.

root@tg1999-Precision-3561:/home/tg1999/Desktop/scancode.io# docker compose -f docker-compose.d2d.yml run worker scanpipe create-project dd --input-file ./project_data/from-from-intbitset.tar.gz:from --input-file ./project_data/tointbitset.whl:to --pipeline map_deploy_to_develop:Python && scanpipe execute --project dd
WARN[0000] Found orphan containers ([scancodeio-d2d-worker-run-739aa81c3565]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up. 
[+] Creating 1/1
 ✔ Container scancodeio-d2d-db-1  Running                                                                                               0.0s 
Traceback (most recent call last):
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 278, in ensure_connection
    self.connect()
    ~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 255, in connect
    self.connection = self.get_new_connection(conn_params)
                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/postgresql/base.py", line 332, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/psycopg/connection.py", line 118, in connect
    raise last_ex.with_traceback(None)
psycopg.errors.ConnectionTimeout: connection timeout expired

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/scancodeio/.venv/bin/scanpipe", line 7, in <module>
    sys.exit(command_line())
             ~~~~~~~~~~~~^^
  File "/opt/scancodeio/scancodeio/__init__.py", line 98, in command_line
    execute_from_command_line(sys.argv)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
    ~~~~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
  File "/opt/scancodeio/scanpipe/management/commands/create-project.py", line 42, in handle
    self.create_project(
    ~~~~~~~~~~~~~~~~~~~^
        name=options["name"],
        ^^^^^^^^^^^^^^^^^^^^^
    ...<8 lines>...
        create_global_webhook=not options["no_global_webhook"],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/scancodeio/scanpipe/management/commands/__init__.py", line 554, in create_project
    return create_project(
        name=name,
    ...<9 lines>...
        command=self,
    )
  File "/opt/scancodeio/scanpipe/management/commands/__init__.py", line 457, in create_project
    project.full_clean(exclude=["slug"])
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1636, in full_clean
    self.validate_unique(exclude=exclude)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1378, in validate_unique
    errors = self._perform_unique_checks(unique_checks)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/base.py", line 1488, in _perform_unique_checks
    if qs.exists():
       ~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/query.py", line 1288, in exists
    return self.query.has_results(using=self.db)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/sql/query.py", line 660, in has_results
    return compiler.has_results()
           ~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/sql/compiler.py", line 1542, in has_results
    return bool(self.execute_sql(SINGLE))
                ~~~~~~~~~~~~~~~~^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/models/sql/compiler.py", line 1572, in execute_sql
    cursor = self.connection.cursor()
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 319, in cursor
    return self._cursor()
           ~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 295, in _cursor
    self.ensure_connection()
    ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 277, in ensure_connection
    with self.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 278, in ensure_connection
    self.connect()
    ~~~~~~~~~~~~^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/base/base.py", line 255, in connect
    self.connection = self.get_new_connection(conn_params)
                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/django/db/backends/postgresql/base.py", line 332, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/opt/scancodeio/.venv/lib/python3.13/site-packages/psycopg/connection.py", line 118, in connect
    raise last_ex.with_traceback(None)
django.db.utils.OperationalError: connection timeout expired

@TG1999 marked this pull request as draft October 23, 2025 10:50
@tdruez (Contributor) commented Oct 24, 2025

@TG1999 #1916 merged and released https://github.com/aboutcode-org/scancode.io/releases/tag/v35.4.1

Documented at https://scancodeio.readthedocs.io/en/latest/quickstart.html#use-postgresql-for-better-performance

Pull the latest ScanCode.io Docker image

docker pull ghcr.io/aboutcode-org/scancode.io:latest

Start a PostgreSQL Database Service

docker run -d \
  --name scancodeio-run-db \
  -e POSTGRES_DB=scancodeio \
  -e POSTGRES_USER=scancodeio \
  -e POSTGRES_PASSWORD=scancodeio \
  -e POSTGRES_INITDB_ARGS="--encoding=UTF-8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8" \
  -v scancodeio_pgdata:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:17

Stop the service with docker rm -f scancodeio-run-db once done.

Run the map_deploy_to_develop pipeline on remote inputs

FROM_URL=https://github.com/aboutcode-org/scancode.io/raw/refs/heads/main/scanpipe/tests/data/d2d-python/from-intbitset.tar.gz
TO_URL=https://github.com/aboutcode-org/scancode.io/raw/refs/heads/main/scanpipe/tests/data/d2d-python/to-intbitset.whl

docker run --rm \
  --network host \
  -e SCANCODEIO_NO_AUTO_DB=1 \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop ${FROM_URL}#from,${TO_URL}#to \
  > results.json

Run the map_deploy_to_develop pipeline on local inputs

docker run --rm \
  -v "$(pwd)":/codedrop \
  --network host \
  -e SCANCODEIO_NO_AUTO_DB=1 \
  ghcr.io/aboutcode-org/scancode.io:latest \
  run map_deploy_to_develop:Python intbitset.tar.gz:from,intbitset.whl:to \
  > results.json
