ENH: add support for reading .tar archives #44787
Merged
Changes from 4 commits
38 commits
c1823ef Add reproduction test for .tar.gz archives (Skn0tt)
9a85cba add support for .tar archives (Skn0tt)
e673061 update doc comments (Skn0tt)
a0d6386 fix: pep8 errors (Skn0tt)
6a8edef refactor: flip _compression_to_extension around to support multiple e… (Skn0tt)
d4e40c9 refactor: detect tar files using existing extension mapping (Skn0tt)
5f22df7 feat: add support for writing tar files (Skn0tt)
c6573ef feat: assure it respects .gz endings (Skn0tt)
f3b6ed5 Merge branch 'master' into read-tar-archives (Skn0tt)
a4ac382 feat: add "tar" entry to compressionoptions (Skn0tt)
e66826b chore: add whatsnew entry (Skn0tt)
941be37 fix: test_compression_size_fh (Skn0tt)
e3369aa Merge branch 'master' into read-tar-archives (Skn0tt)
0468e5f add tarfile to shared compression docs (Skn0tt)
2531ee0 fix formatting (Skn0tt)
57eba0a pass through "mode" via compression args (Skn0tt)
38f7d54 fix pickle test (Skn0tt)
887fd10 add class comment (Skn0tt)
fc2e7f0 Merge remote-tracking branch 'origin/main' into read-tar-archives (Skn0tt)
669d942 sort imports (Skn0tt)
7d7d3c6 add _compression_to_extension back for backwards compatibility (Skn0tt)
8b8b8ac fix some type warnings (Skn0tt)
dd356f6 fix: formatting (Skn0tt)
514014a fix: mypy complaints (Skn0tt)
38971c7 fix: more tests (Skn0tt)
e35d361 fix: some error with xml (Skn0tt)
c5088fc fix: interpreted text role (Skn0tt)
f6c5173 move to v1.5 whatsnw (Skn0tt)
9a4fa07 add versionadded note (Skn0tt)
0c31aa8 don't leave blank lines (Skn0tt)
086c598 add tests for zero files / multiple files (Skn0tt)
861faf0 move _compression_to_extension to tests (Skn0tt)
9458ecb revert added "mode" argument (Skn0tt)
d20f315 add test to ensure that `compression.mode` works (Skn0tt)
1066f1b Merge branch 'main' into read-tar-archives (Skn0tt)
6b0e1e6 Merge branch 'main' into read-tar-archives (Skn0tt)
0d9ed18 compare strings, not bytes (Skn0tt)
37370c2 replace carriage returns (Skn0tt)
@@ -18,6 +18,7 @@
 import mmap
 import os
 from pathlib import Path
+import tarfile
 import tempfile
 from typing import (
     IO,
@@ -262,7 +263,7 @@ def _get_filepath_or_buffer(
     ----------
     filepath_or_buffer : a url, filepath (str, py.path.local or pathlib.Path),
                          or buffer
-    compression : {{'gzip', 'bz2', 'zip', 'xz', None}}, optional
+    compression : {{'gzip', 'bz2', 'zip', 'xz', 'tar', None}}, optional
     encoding : the encoding to use to decode bytes, default is 'utf-8'
     mode : str, optional
 
@@ -496,9 +497,9 @@ def infer_compression(
     ----------
     filepath_or_buffer : str or file handle
         File path or object.
-    compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}
+    compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', 'tar', None}
         If 'infer' and `filepath_or_buffer` is path-like, then detect
-        compression from the following extensions: '.gz', '.bz2', '.zip',
+        compression from the following extensions: '.gz', '.bz2', '.zip', '.tar',
         or '.xz' (otherwise no compression).
 
     Returns
@@ -520,6 +521,9 @@ def infer_compression(
             # Cannot infer compression of a buffer, assume no compression
             return None
 
+        if ".tar" in filepath_or_buffer:
+            return "tar"
+
         # Infer compression from the filename/URL extension
         for compression, extension in _compression_to_extension.items():
             if filepath_or_buffer.lower().endswith(extension):
Review comment on `if ".tar" in filepath_or_buffer:`: twoertwein marked this conversation as resolved (outdated).
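The hunk above detects TAR archives with a substring test, which also matches paths where `.tar` appears in the middle of a name. A later commit in this PR (d4e40c9) is titled "refactor: detect tar files using existing extension mapping". The following is a minimal sketch of the difference between the two approaches, not the code from that commit: the mapping `EXTENSION_TO_COMPRESSION` and both function names are hypothetical and chosen only for illustration.

```python
# Hypothetical extension mapping for illustration only; it is an assumption,
# not the mapping used by the library under review.
EXTENSION_TO_COMPRESSION = {
    ".tar": "tar",
    ".tar.gz": "tar",
    ".tar.bz2": "tar",
    ".tar.xz": "tar",
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_by_substring(path):
    """Mirror the substring test from the hunk: any '.tar' in the path counts."""
    return "tar" if ".tar" in path else None


def infer_by_suffix(path):
    """Match on the file extension, trying longer extensions first."""
    lowered = path.lower()
    # Longest-first ordering lets '.tar.gz' win over the shorter '.gz'.
    for extension in sorted(EXTENSION_TO_COMPRESSION, key=len, reverse=True):
        if lowered.endswith(extension):
            return EXTENSION_TO_COMPRESSION[extension]
    return None


if __name__ == "__main__":
    for candidate in ["data.tar.gz", "data.csv.gz", "report.tardy.csv"]:
        print(candidate, infer_by_substring(candidate), infer_by_suffix(candidate))
```

With the substring test, a name such as `report.tardy.csv` is classified as a TAR archive, while the suffix-based variant leaves it uncompressed and still separates `data.tar.gz` from `data.csv.gz`.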
@@ -747,6 +751,21 @@ def get_handle(
                         f"Only one file per ZIP: {zip_names}"
                     )
 
+        # TAR Encoding
+        elif compression == "tar":
+            tar = tarfile.open(handle, "r:*")
+            handles.append(tar)
+            files = tar.getnames()
+            if len(files) == 1:
+                handle = tar.extractfile(files[0])
+            elif len(files) == 0:
+                raise ValueError(f"Zero files found in TAR archive {path_or_buf}")
+            else:
+                raise ValueError(
+                    "Multiple files found in TAR archive. "
+                    f"Only one file per TAR archive: {files}"
+                )
+
         # XZ Compression
         elif compression == "xz":
             handle = get_lzma_file()(handle, ioargs.mode)
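The TAR branch above follows a single-member rule: open the archive, list its members, and return a stream only when exactly one file is present. The standalone sketch below illustrates that rule with the standard-library `tarfile` module. The helper names `open_single_member` and `build_archive` are hypothetical, and the sketch assumes the input is an open, seekable binary stream, so it passes it through the `fileobj=` keyword rather than positionally as the hunk does.

```python
import io
import tarfile


def open_single_member(fileobj):
    """Return (archive, stream) for a TAR archive that holds exactly one file.

    The caller is responsible for closing the returned archive, in the same
    way the hunk keeps the TarFile object in its list of handles.
    """
    # "r:*" asks tarfile to detect gzip, bz2, or xz compression transparently.
    archive = tarfile.open(fileobj=fileobj, mode="r:*")
    names = archive.getnames()
    if len(names) == 1:
        stream = archive.extractfile(names[0])
        if stream is None:
            # extractfile returns None for members that are not regular files.
            archive.close()
            raise ValueError(f"TAR member is not a regular file: {names[0]}")
        return archive, stream
    archive.close()
    if not names:
        raise ValueError("Zero files found in TAR archive")
    raise ValueError(f"Multiple files found in TAR archive: {names}")


def build_archive(files):
    """Build an in-memory .tar.gz from a {name: bytes} dict (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    archive, stream = open_single_member(build_archive({"data.csv": b"a,b\n1,2\n"}))
    print(stream.read().decode("utf-8"))
    archive.close()
```

Keeping the archive object alive alongside the member stream matters because the stream reads from the archive's underlying file; closing the archive first would invalidate it.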