GH-128131: Completely support random read access of uncompressed unencrypted files in ZipFile #128143
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,3 +1,4 @@ | ||||||||||||||||||||||||
| import _pyio | ||||||||||||||||||||||||
| import array | ||||||||||||||||||||||||
| import contextlib | ||||||||||||||||||||||||
| import importlib.util | ||||||||||||||||||||||||
|
|
@@ -3448,5 +3449,75 @@ def test_too_short(self): | |||||||||||||||||||||||
| b"zzz", zipfile._Extra.strip(b"zzz", (self.ZIP64_EXTRA,))) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| class StatIO(_pyio.BytesIO): | ||||||||||||||||||||||||
| """Buffer which remembers the number of bytes that were read.""" | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| def __init__(self): | ||||||||||||||||||||||||
| super().__init__() | ||||||||||||||||||||||||
| self.bytes_read = 0 | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| def read(self, size=-1): | ||||||||||||||||||||||||
| bs = super().read(size) | ||||||||||||||||||||||||
| self.bytes_read += len(bs) | ||||||||||||||||||||||||
| return bs | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| class StoredZipExtFileRandomReadTest(unittest.TestCase): | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
| def test_random_read(self): | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| sio = StatIO() | ||||||||||||||||||||||||
| # 20000 bytes | ||||||||||||||||||||||||
| txt = b'0123456789' * 2000 | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| # The seek length must be greater than ZipExtFile.MIN_READ_SIZE | ||||||||||||||||||||||||
| # as `ZipExtFile._read2()` reads in blocks of this size and we | ||||||||||||||||||||||||
| # need to seek out of the buffered data | ||||||||||||||||||||||||
| min_size = zipfile.ZipExtFile.MIN_READ_SIZE | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
| self.assertGreaterEqual(10002, min_size) # for forward seek test | ||||||||||||||||||||||||
| self.assertGreaterEqual(5003, min_size) # for backward seek test | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
| self.assertGreaterEqual(10002, min_size) # for forward seek test | |
| self.assertGreaterEqual(5003, min_size) # for backward seek test | |
| forward_seek = 10002 | |
| backward_seek = 5003 | |
| self.assertGreaterEqual(forward_seek, min_size) | |
| self.assertGreaterEqual(backward_seek, min_size) |
Consider introducing a `read_length` variable to name the literal 100, and use the same wording in this assertion, e.g.:
| # The read length must be less than MIN_READ_SIZE, since we assume that | |
| # only 1 block is read in the test. | |
| self.assertGreaterEqual(min_size, 100) # for read() calls | |
| # Set read length less than MIN_READ_SIZE to ensure only 1 block is read. | |
| read_length = 100 | |
| self.assertLess(read_length, min_size) |
Re-use the variables defined above.
| arr = fp.read(100) | |
| self.assertEqual(fp.tell(), 10102) | |
| self.assertEqual(arr, txt[10002:10102]) | |
| arr = fp.read(read_length) | |
| self.assertEqual(fp.tell(), forward_seek + read_length) | |
| self.assertEqual(arr, txt[forward_seek:forward_seek + read_length]) |
Please use a better name for `d` that indicates its meaning, or just inline it if it's only needed once and its meaning is inconsequential.
| fp.seek(-5003, os.SEEK_CUR) | |
| fp.seek(-backward_seek, os.SEEK_CUR) |
| self.assertEqual(fp.tell(), 5099) # 5099 = 10102 - 5003 | |
| self.assertEqual(fp._left, fp._compress_left) | |
| arr = fp.read(100) | |
| self.assertEqual(fp.tell(), 5199) | |
| self.assertEqual(arr, txt[5099:5199]) | |
| self.assertEqual(fp.tell(), forward_seek - backward_seek + read_length) | |
| self.assertEqual(fp._left, fp._compress_left) | |
| arr = fp.read(read_length) | |
| backward_pos = forward_seek - backward_seek + read_length | |
| self.assertEqual(fp.tell(), backward_pos + read_length) | |
| self.assertEqual(arr, txt[backward_pos:backward_pos + read_length]) |
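Taken together, the suggestions in this thread amount to replacing the literals with named values. A minimal standalone sketch of how that could read is below. The class name `StoredSeekSketch`, the member name `foo.txt`, and the helper `forward_end` are illustrative assumptions, not names from the PR, and the sketch uses a plain `io.BytesIO` rather than the `StatIO` helper, so it only checks positions and data, not how many bytes were read.

```python
import io
import os
import unittest
import zipfile


class StoredSeekSketch(unittest.TestCase):
    """Illustrative only: forward and backward seeks in a stored member."""

    def test_forward_then_backward_seek(self):
        txt = b"0123456789" * 2000  # 20000 bytes
        min_size = zipfile.ZipExtFile.MIN_READ_SIZE

        forward_seek = 10002
        backward_seek = 5003
        read_length = 100
        # Seeks must jump out of the buffered block;
        # reads must fit inside a single block.
        self.assertGreaterEqual(forward_seek, min_size)
        self.assertGreaterEqual(backward_seek, min_size)
        self.assertLess(read_length, min_size)

        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
            zf.writestr("foo.txt", txt)  # hypothetical member name

        with zipfile.ZipFile(buf, "r") as zf, zf.open("foo.txt") as fp:
            # Forward seek, then a short read.
            fp.seek(forward_seek, os.SEEK_SET)
            arr = fp.read(read_length)
            forward_end = forward_seek + read_length
            self.assertEqual(fp.tell(), forward_end)
            self.assertEqual(arr, txt[forward_seek:forward_end])

            # Backward seek relative to the current position, then read again.
            fp.seek(-backward_seek, os.SEEK_CUR)
            backward_pos = forward_end - backward_seek
            self.assertEqual(fp.tell(), backward_pos)
            arr = fp.read(read_length)
            self.assertEqual(fp.tell(), backward_pos + read_length)
            self.assertEqual(arr, txt[backward_pos:backward_pos + read_length])
```

Saved to a file, this can be run with `python -m unittest <file>`. Deriving `backward_pos` from the named values keeps the arithmetic (10002 + 100 - 5003 = 5099) out of comments.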
This feels like it should be a separate test (or two). In fact, I'm not even sure I understand why this private flag is even relevant to the issue at hand. If it's not a separate test with a separate justification, can you explain why it's related to the issue at hand? If these are private attributes, what is the public effect that's being validated?
This test is to ensure that the eof flag is correctly updated after seeking to the end of the file and then seeking back.
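One possible way to phrase that check purely in terms of public behaviour, without touching private attributes, is sketched below. This is an assumption about what the observable effect should be (a read at the end returns nothing, and a read after seeking back returns data again), not the test from the PR. The member name `foo.txt` and the offsets are illustrative.

```python
import io
import os
import zipfile

txt = b"0123456789" * 2000  # 20000 bytes
backward_seek = 5003        # larger than ZipExtFile.MIN_READ_SIZE
read_length = 100

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("foo.txt", txt)  # hypothetical member name

with zipfile.ZipFile(buf, "r") as zf, zf.open("foo.txt") as fp:
    # Seek to the end: a read here should yield no data.
    fp.seek(0, os.SEEK_END)
    end_pos = fp.tell()
    at_end = fp.read(read_length)

    # Seek back: the file must become readable again.
    fp.seek(-backward_seek, os.SEEK_CUR)
    back_pos = fp.tell()
    after_back = fp.read(read_length)
```

If the end-of-file state were left stale after the backward seek, `after_back` would come back empty, which is the public symptom a test could assert on.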
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| Completely support random access of uncompressed unencrypted read-only | ||
| zip files obtained by :meth:`ZipFile.open <zipfile.ZipFile.open>`. |
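A rough way to observe what this entry describes is to count how many bytes the underlying file object hands out while seeking inside a stored member, in the spirit of the `StatIO` helper in the test diff. The sketch below is illustrative only: `CountingBytesIO` is a hypothetical name, it subclasses `io.BytesIO` rather than `_pyio.BytesIO`, and the count it reports depends on the Python version, so no particular number is claimed.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that counts the bytes returned by read()."""

    def __init__(self):
        super().__init__()
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


txt = b"0123456789" * 2000  # 20000 bytes
seek_to = 10002
read_length = 100

buf = CountingBytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("foo.txt", txt)  # hypothetical member name

with zipfile.ZipFile(buf, "r") as zf, zf.open("foo.txt") as fp:
    buf.bytes_read = 0  # ignore bytes read for the directory and header
    fp.seek(seek_to)
    chunk = fp.read(read_length)
    bytes_pulled = buf.bytes_read

print(f"got {len(chunk)} bytes from the member; "
      f"{bytes_pulled} bytes were read from the underlying file")
```

When seeking does not have to read and discard the skipped data, `bytes_pulled` should stay close to one read block instead of growing with the seek distance.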