Avoid extracting joblib archives #729

Closed
egibs wants to merge 2 commits into chainguard-dev:main from egibs:ignore-joblib-archives

Conversation

@egibs egibs commented Dec 18, 2024

We're encountering failures when trying to extract files such as:

joblib/test/data/joblib_0.9.2_compressed_pickle_py35_np19.gz

While these appear to be valid gzip archives, they are actually detected as generic data (application/octet-stream):

joblib/test/data/joblib_0.9.2_compressed_pickle_py35_np19.gz: data

This PR avoids extracting files like this if they contain common joblib extensions and have a MIME type of application/octet-stream.

To avoid creating temporary directories (and to avoid scanning the equivalent of "" when tmpRoot is never created), I tweaked processArchive to return early when we detect these files. The nested extraction logic has also been updated to ignore joblib archives.
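As a rough illustration of the check described above, here is a minimal sketch. It is not the code from this PR: `isJoblibArchive`, `processArchiveSketch`, and the extension list are hypothetical names and values chosen for the example, and the MIME type is passed in as a string rather than detected.

```go
package main

import (
	"fmt"
	"strings"
)

// joblibExtensions is an ASSUMED list. The PR description does not enumerate
// which extensions it treats as "common joblib extensions".
var joblibExtensions = []string{".joblib", ".pkl", ".gz", ".gzip", ".bz2", ".xz", ".lzma", ".z"}

// isJoblibArchive (hypothetical name) reports whether a file has one of the
// assumed joblib extensions AND a MIME type of application/octet-stream,
// i.e. it is named like an archive but is not detected as one.
func isJoblibArchive(path, mimeType string) bool {
	if mimeType != "application/octet-stream" {
		return false
	}
	lower := strings.ToLower(path)
	for _, ext := range joblibExtensions {
		if strings.HasSuffix(lower, ext) {
			return true
		}
	}
	return false
}

// processArchiveSketch is a hypothetical stand-in for processArchive. It only
// models the early return; it reports whether extraction would be attempted.
func processArchiveSketch(path, mimeType string) (extracted bool) {
	if isJoblibArchive(path, mimeType) {
		// Early return: no temporary directory is created and nothing is scanned.
		return false
	}
	// Real extraction (temp dir creation, unpacking, scanning) would go here.
	return true
}

func main() {
	p := "joblib/test/data/joblib_0.9.2_compressed_pickle_py35_np19.gz"
	fmt.Println(processArchiveSketch(p, "application/octet-stream")) // skipped
	fmt.Println(processArchiveSketch(p, "application/gzip"))         // extracted
}
```

Requiring both conditions keeps genuine gzip files on the normal extraction path, since those would report a gzip MIME type rather than application/octet-stream.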

@egibs egibs requested a review from tstromberg December 18, 2024 20:21
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs egibs force-pushed the ignore-joblib-archives branch from d3c9c8c to 7dc9d41 Compare December 18, 2024 20:46
@tstromberg

I can't help but feel like hardcoding this exception introduces unnecessary complexity without a tangible benefit. Is this just to suppress output errors?

egibs commented Dec 19, 2024

> I can't help but feel like hardcoding this exception introduces unnecessary complexity without a tangible benefit. Is this just to suppress output errors?

Pretty much. Mainly since the files look like valid archives that should be extracting correctly.

Signed-off-by: Evan Gibler <20933572+egibs@users.noreply.github.com>

egibs commented Dec 31, 2024

Closing this out for now since it's just to quiet down errors for unsupported archive types.

@egibs egibs closed this Dec 31, 2024
@egibs egibs deleted the ignore-joblib-archives branch January 17, 2025 23:03