Commit f8adb0c

Revise doc for repack
1 parent 577400a commit f8adb0c

1 file changed

+26
-24
lines changed

README.md

Lines changed: 26 additions & 24 deletions
@@ -31,38 +31,40 @@ This package extends `zipfile` with `remove`-related functionalities.
 
 * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
 
-Rewrites the archive to remove stale local file entries, shrinking its file
-size. The archive must be opened with mode ``'a'``.
+Rewrites the archive to remove unreferenced local file entries, shrinking
+its file size. The archive must be opened with mode ``'a'``.
 
 If *removed* is provided, it must be a sequence of `ZipInfo` objects
-representing removed entries; only their corresponding local file entries
-will be removed.
-
-If *removed* is not provided, the archive is scanned to identify and remove
-local file entries that are no longer referenced in the central directory.
-The algorithm assumes that local file entries (and the central directory,
-which is mostly treated as the "last entry") are stored consecutively:
-
-1. Data before the first referenced entry is removed only when it appears to
-be a sequence of consecutive entries with no extra following bytes; extra
-preceding bytes are preserved.
-2. Data between referenced entries is removed only when it appears to
-be a sequence of consecutive entries with no extra preceding bytes; extra
-following bytes are preserved.
-3. Entries must not overlap. If any entry's data overlaps with another, a
-`BadZipFile` error is raised and no changes are made.
-
-When scanning, setting `strict_descriptor=True` disables detection of any
-entry using an unsigned data descriptor (deprecated in the ZIP specification
-since version 6.3.0, released on 2006-09-29, and used only by some legacy
-tools). This improves performance, but may cause some stale entries to be
-preserved.
+representing the recently removed members, and only their corresponding
+local file entries will be removed. Otherwise, the archive is scanned to
+locate and remove local file entries that are no longer referenced in the
+central directory.
+
+When scanning, setting ``strict_descriptor=True`` disables detection of any
+entry using an unsigned data descriptor (a format deprecated by the ZIP
+specification since version 6.3.0, released on 2006-09-29, and used only by
+some legacy tools), which is significantly slower to scan (around 100 to
+1000 times slower in the worst case). This does not affect performance on
+entries without such a descriptor.
 
 *chunk_size* may be specified to control the buffer size when moving
 entry data (default is 1 MiB).
 
 Calling `repack` on a closed ZipFile will raise a `ValueError`.
 
+> **Note:**
+> The scanning algorithm is heuristic-based and assumes that the ZIP file
+> is normally structured (for example, with local file entries stored
+> consecutively, without overlap or interleaved binary data). Prepended
+> binary data, such as a self-extractor stub, is recognized and preserved
+> unless it happens to contain bytes that coincidentally resemble a valid
+> local file entry in multiple respects, which is extremely rare. Embedded
+> ZIP payloads are also handled correctly, as long as they follow normal
+> structure. However, the algorithm does not guarantee correctness or
+> safety on untrusted or intentionally crafted input. It is generally
+> recommended to provide the *removed* argument for better reliability and
+> performance.
+
 * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
 
 Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.
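
To make the phrase "local file entries that are no longer referenced in the central directory" from the `repack` description concrete, here is a minimal sketch using only the standard-library `zipfile` and `struct` modules. It is not the package's scanning algorithm. It only lists the byte ranges in front of the central directory that no central-directory record points at. The helper names (`build_zip`, `central_directory_start`, `unreferenced_gaps`) are hypothetical. The sketch assumes stored or compressed entries without data descriptors, no ZIP64 records, and no archive comment. The leftover entry is simulated by concatenation rather than by an actual removal.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header (signature b"PK\x03\x04").
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
EOCD_SIGNATURE = b"PK\x05\x06"


def build_zip(members):
    """Return the bytes of a new in-memory archive holding `members`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def central_directory_start(data):
    """Locate the central directory via the end-of-central-directory record.

    Assumes no archive comment and no ZIP64 records.
    """
    eocd = data.rfind(EOCD_SIGNATURE)
    (cd_size,) = struct.unpack_from("<I", data, eocd + 12)
    return eocd - cd_size


def unreferenced_gaps(data):
    """Return (start, end) byte ranges before the central directory that
    no central-directory record points at."""
    spans = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            fields = LOCAL_HEADER.unpack_from(data, info.header_offset)
            name_len, extra_len = fields[9], fields[10]
            # Assumes no data descriptor follows the entry data.
            end = (info.header_offset + LOCAL_HEADER.size
                   + name_len + extra_len + info.compress_size)
            spans.append((info.header_offset, end))

    gaps, cursor = [], 0
    for start, end in sorted(spans):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    cd_start = central_directory_start(data)
    if cursor < cd_start:
        gaps.append((cursor, cd_start))
    return gaps


# Simulate a leftover entry: take the local file entry of "a.txt" from one
# archive and place it in front of a second archive that only lists "b.txt".
old = build_zip({"a.txt": b"alpha"})
stale_entry = old[:central_directory_start(old)]
data = stale_entry + build_zip({"b.txt": b"beta"})

gaps = unreferenced_gaps(data)
print(gaps)                                      # unreferenced byte ranges
print(data[gaps[0][0]:gaps[0][0] + 4])           # first bytes of the gap
```

The reported gap starts with a local file header signature, which is the kind of region a scan would have to examine. Telling a genuine leftover entry apart from unrelated prepended bytes, such as a self-extractor stub, is where the heuristics described in the note come in, and is the reason passing *removed* explicitly is the more reliable path.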
