
Use python standard library to replace bzip2 dependency #1119

Open
j-rivero wants to merge 1 commit into master from jrivero/python_instead_of_bzip2

@j-rivero
Contributor

This pull request updates the way install space compression is performed in the CI job template. The main change is switching from a shell-based tar command to a Python-based approach using the tarfile module. This should let us drop the dependency on bzip2 for the Jenkins nodes. See #1118.

It mainly uses arcname to rename ws/install_isolated to ros${version}-linux. I'm not fully sure whether there is a use case that is not covered by the AI-generated replacement of the regexp.

Local generation:

    @@ -418,2 +418 @@
    -cd $WORKSPACE
    -tar -cjf ros2-global-linux-jammy-amd64-ci.tar.bz2 -C ws --transform "s/^install_isolated/ros2-linux/" install_isolated
    +cd $WORKSPACE && python3 -c "import tarfile; t = tarfile.open('ros2-global-linux-jammy-amd64-ci.tar.bz2', 'w:bz2'); t.add('ws/install_isolated', arcname='ros2-linux'); t.close()"
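One way to probe the open question about use cases the regexp might cover that arcname does not: a minimal sketch, assuming it is enough to compare archive member names, that archives a sample tree with `arcname` and compares the result with what `s/^install_isolated/ros2-linux/` yields when applied to each path relative to `ws`. It does not model how GNU tar applies `--transform` to symlink or hard link targets, and the stub tree and helper names are hypothetical.

```python
import io
import os
import re
import tarfile
import tempfile


def names_via_arcname(src, arcname):
    """Member names tarfile produces when the root is renamed via arcname."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:bz2') as t:
        t.add(src, arcname=arcname)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode='r:bz2') as t:
        # Strip any trailing slash on directory entries to compare plain paths.
        return {name.rstrip('/') for name in t.getnames()}


def names_via_transform(parent, top, pattern, replacement):
    """Names for `tar -C parent top` after applying a sed-style rename to
    each member name (link targets are not modeled here)."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(os.path.join(parent, top)):
        rel = os.path.relpath(dirpath, parent).replace(os.sep, '/')
        for entry in [rel] + [rel + '/' + n for n in dirnames + filenames]:
            names.add(re.sub(pattern, replacement, entry, count=1))
    return names


def make_stub_install_space(ws):
    """Create a tiny hypothetical stand-in for ws/install_isolated."""
    root = os.path.join(ws, 'install_isolated')
    os.makedirs(os.path.join(root, 'gz-math', 'lib'))
    for rel in ('setup.sh', 'COLCON_IGNORE',
                os.path.join('gz-math', 'lib', 'x.so')):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('stub\n')
    return root


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as ws:
        root = make_stub_install_space(ws)
        via_arcname = names_via_arcname(root, 'ros2-linux')
        via_transform = names_via_transform(
            ws, 'install_isolated', r'^install_isolated', 'ros2-linux')
        print(via_arcname == via_transform)  # True for this tree
```

Running the same comparison against a real install space would show whether any member name differs between the two approaches.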

Tested in https://build.osrfoundation.org/job/__test_ci__colcon_any-manual_ubuntu_noble_amd64/3/. The resulting tarball looks like this:

~/Downloads/ros2-global-linux-noble-amd64-ci ❯ tree -L 2
.
└── ros2-linux
    ├── COLCON_IGNORE
    ├── gz-cmake
    ├── gz-common
    ├── gz-fuel_tools
    ├── gz-gui
    ├── gz-math
    ├── gz-msgs
    ├── gz-physics
    ├── gz-plugin
    ├── gz-rendering
    ├── gz-sensors
    ├── gz-sim
    ├── gz-tools2
    ├── gz-transport
    ├── gz-utils
    ├── local_setup.sh
    ├── _local_setup_util_sh.py
    ├── sdformat
    └── setup.sh

17 directories, 4 files

Signed-off-by: Jose Luis Rivero <jrivero@honurobotics.com>
Generated-by: Claude Opus 4.5
@j-rivero
Contributor Author

We could potentially move the sha256 computation into Python as well, with something like:

--- a/ros_buildfarm/templates/ci/ci_job.xml.em
+++ b/ros_buildfarm/templates/ci/ci_job.xml.em
@@ -373,17 +373,18 @@ parameters = [
     script='\n'.join([
         'echo "# BEGIN SECTION: Compress install space"',
         'cd $WORKSPACE && python3 -c "'
+        'import hashlib; '
         'import tarfile; '
-        't = tarfile.open('
-        '\'ros%d-%s-linux-%s-%s-ci.tar.bz2\', '
-        '\'w:bz2\'); '
+        'archive = \'ros%d-%s-linux-%s-%s-ci.tar.bz2\'; '
+        't = tarfile.open(archive, \'w:bz2\'); '
         't.add(\'ws/install_isolated\', arcname=\'ros%d-linux\'); '
-        't.close()"' % (
+        't.close(); '
+        'h = hashlib.sha256(); '
+        'h.update(open(archive, \'rb\').read()); '
+        'open(archive.replace(\'.tar.bz2\', \'-CHECKSUM\'), \'w\').write(h.hexdigest() + \' *\' + archive + \'\\\\n\')"' % (
           ros_version, rosdistro_name or 'global', os_code_name, arch,
           ros_version
         ),
-        'sha256sum -b ros%d-%s-linux-%s-%s-ci.tar.bz2' % (ros_version, rosdistro_name or 'global', os_code_name, arch) +
-        ' > ros%d-%s-linux-%s-%s-ci-CHECKSUM' % (ros_version, rosdistro_name or 'global', os_code_name, arch),
         'cd -',
         'echo "# END SECTION"',
     ]),

But I'm not sure this would be a performance improvement: Python needs to close the archive and then open it again to hash it with hashlib.
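On the performance question: the inline `h.update(open(archive, 'rb').read())` reads the whole archive into memory at once. A minimal sketch of a streaming variant, assuming the `sha256sum -b` line format (`<hex digest> *<name>`) and the `-CHECKSUM` naming from the diff; `write_checksum` is a hypothetical helper, not existing ros_buildfarm code. `hashlib.file_digest` would do the chunked reads itself, but it only exists from Python 3.11, and jammy ships Python 3.10.

```python
import hashlib


def write_checksum(archive, chunk_size=1 << 20):
    """Hash archive in chunk_size pieces and write a `sha256sum -b` style
    line ('<hex digest> *<archive>') to the matching -CHECKSUM file."""
    suffix = '.tar.bz2'
    if not archive.endswith(suffix):
        raise ValueError('expected a %s archive: %s' % (suffix, archive))
    h = hashlib.sha256()
    with open(archive, 'rb') as f:
        # Fixed-size reads keep memory use flat regardless of archive size.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    # Strip only the trailing suffix, so directory names are never touched.
    checksum_path = archive[:-len(suffix)] + '-CHECKSUM'
    with open(checksum_path, 'w', newline='\n') as out:
        out.write('%s *%s\n' % (h.hexdigest(), archive))
    return checksum_path
```

This still reopens the archive after writing, so it addresses memory use rather than the extra pass over the file.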

@cottsay
Member

cottsay commented Jan 20, 2026

I took a quick look at the Python docs and didn't find what I was hoping to see.

The idea is to open the output file, wrap that file object in an object that computes the sha256 as data is written to it, and pass the wrapper to tarfile.open instead of the file path.

I'm confident that such code can be written, but I'm disappointed that Python doesn't make it easier to combine file streams and buffered readers/writers with hashlib.
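The wrapper idea can be sketched in a few lines, assuming the wrapped object only needs a `write()` method: with `fileobj=` and `mode='w:bz2'`, CPython's tarfile layers `bz2.BZ2File` on top of the object it is given, so the wrapper sees exactly the compressed bytes that reach the disk. `HashingWriter` and `create_archive` are hypothetical names, and this is a sketch rather than something tested on the buildfarm nodes.

```python
import hashlib
import tarfile


class HashingWriter:
    """Write-only wrapper: every chunk written is fed into a sha256 hash
    before being forwarded to the underlying file object."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self.sha256 = hashlib.sha256()

    def write(self, data):
        self.sha256.update(data)
        return self._fileobj.write(data)

    def flush(self):
        self._fileobj.flush()


def create_archive(archive, src, arcname):
    """Write src into a .tar.bz2 at archive (renamed to arcname) and return
    the sha256 hex digest of the compressed file."""
    with open(archive, 'wb') as raw:
        writer = HashingWriter(raw)
        # With fileobj= and 'w:bz2', tarfile wraps writer in bz2.BZ2File,
        # so writer.write() receives the compressed bytes.
        with tarfile.open(fileobj=writer, mode='w:bz2') as t:
            t.add(src, arcname=arcname)
    return writer.sha256.hexdigest()
```

The digest is computed in the same pass as compression, so the archive is never reopened; the cost is a small custom class instead of a one-liner.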

In any case, even if we don't optimize the computation as I had in mind, moving the sha256 computation into Python also drops our dependency on the sha256sum tool. Assuming that tool is present, as we did with bzip2, is why we're having this conversation in the first place.

@cottsay
Member

cottsay commented Jan 20, 2026

An additional thought is that we might want to move this Python code into a new script in ros_buildfarm rather than invoking inline Python. create_workspace_archive.py or something.
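A rough sketch of what such a script might look like, assuming it takes the same inputs the template currently interpolates (ROS version, rosdistro name, OS code name, architecture). The script name comes from the comment above; every option name is hypothetical. It keeps the template's archive and `-CHECKSUM` naming and writes the checksum in `sha256sum -b` format, so neither `bzip2` nor `sha256sum` is needed on the node. `main()` accepts an argument list so it can be exercised directly; as an installed script it would also need the usual `if __name__ == '__main__':` entry point.

```python
"""Hypothetical sketch of create_workspace_archive.py: compress an install
space to .tar.bz2 and write a sha256sum-compatible -CHECKSUM file using only
the Python standard library."""
import argparse
import hashlib
import os
import tarfile


def archive_basename(ros_version, rosdistro_name, os_code_name, arch):
    # Same naming scheme as the current ci_job.xml.em template.
    return 'ros%d-%s-linux-%s-%s-ci' % (
        ros_version, rosdistro_name or 'global', os_code_name, arch)


def main(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('--install-space', default='ws/install_isolated')
    parser.add_argument('--ros-version', type=int, required=True)
    parser.add_argument('--rosdistro-name', default=None)
    parser.add_argument('--os-code-name', required=True)
    parser.add_argument('--arch', required=True)
    parser.add_argument('--output-dir', default='.')
    args = parser.parse_args(argv)

    base = archive_basename(
        args.ros_version, args.rosdistro_name, args.os_code_name, args.arch)
    archive_name = base + '.tar.bz2'
    archive_path = os.path.join(args.output_dir, archive_name)

    # Replaces the shell `tar -cjf ... --transform ...` step.
    with tarfile.open(archive_path, 'w:bz2') as t:
        t.add(args.install_space, arcname='ros%d-linux' % args.ros_version)

    # Replaces `sha256sum -b <archive> > <base>-CHECKSUM`.
    h = hashlib.sha256()
    with open(archive_path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    checksum_path = os.path.join(args.output_dir, base + '-CHECKSUM')
    with open(checksum_path, 'w', newline='\n') as f:
        f.write('%s *%s\n' % (h.hexdigest(), archive_name))
    return 0
```

With something like this in place, the template's inline `python3 -c` block would reduce to a single script invocation with the interpolated values passed as arguments.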
