@@ -31,38 +31,40 @@ This package extends `zipfile` with `remove`-related functionalities.
 
 * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
 
-  Rewrites the archive to remove stale local file entries, shrinking its file
-  size. The archive must be opened with mode ``'a'``.
+  Rewrites the archive to remove unreferenced local file entries, shrinking
+  its file size. The archive must be opened with mode ``'a'``.
 
   If *removed* is provided, it must be a sequence of `ZipInfo` objects
-  representing removed entries; only their corresponding local file entries
-  will be removed.
-
-  If *removed* is not provided, the archive is scanned to identify and remove
-  local file entries that are no longer referenced in the central directory.
-  The algorithm assumes that local file entries (and the central directory,
-  which is mostly treated as the "last entry") are stored consecutively:
-
-  1. Data before the first referenced entry is removed only when it appears to
-     be a sequence of consecutive entries with no extra following bytes; extra
-     preceding bytes are preserved.
-  2. Data between referenced entries is removed only when it appears to
-     be a sequence of consecutive entries with no extra preceding bytes; extra
-     following bytes are preserved.
-  3. Entries must not overlap. If any entry's data overlaps with another, a
-     `BadZipFile` error is raised and no changes are made.
-
-  When scanning, setting `strict_descriptor=True` disables detection of any
-  entry using an unsigned data descriptor (deprecated in the ZIP specification
-  since version 6.3.0, released on 2006-09-29, and used only by some legacy
-  tools). This improves performance, but may cause some stale entries to be
-  preserved.
+  representing the recently removed members, and only their corresponding
+  local file entries will be removed. Otherwise, the archive is scanned to
+  locate and remove local file entries that are no longer referenced in the
+  central directory.
+
+  When scanning, setting ``strict_descriptor=True`` disables detection of any
+  entry using an unsigned data descriptor (a format deprecated by the ZIP
+  specification since version 6.3.0, released on 2006-09-29, and used only by
+  some legacy tools), which is significantly slower to scan, by around 100 to
+  1000 times in the worst case. This does not affect performance on entries
+  without such a feature.
 
   *chunk_size* may be specified to control the buffer size when moving
   entry data (default is 1 MiB).
 
   Calling `repack` on a closed ZipFile will raise a `ValueError`.
 
+  > **Note:**
+  > The scanning algorithm is heuristic-based and assumes that the ZIP file
+  > is normally structured, for example with local file entries stored
+  > consecutively, without overlap or interleaved binary data. Prepended
+  > binary data, such as a self-extractor stub, is recognized and preserved
+  > unless it happens to contain bytes that coincidentally resemble a valid
+  > local file entry in multiple respects, an extremely rare case. Embedded
+  > ZIP payloads are also handled correctly, as long as they follow normal
+  > structure. However, the algorithm does not guarantee correctness or
+  > safety on untrusted or intentionally crafted input. It is generally
+  > recommended to provide the *removed* argument for better reliability and
+  > performance.
+
 * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
 
   Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.
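
To make "local file entries that are no longer referenced in the central directory" concrete, here is a minimal sketch that uses only the standard-library `zipfile` module. It is not the package's scanning algorithm: it is a naive signature search with none of the heuristics described in the note, and the helper names `build_archive_with_stale_entry` and `find_unreferenced_offsets` are hypothetical, introduced only for this illustration.

```python
import io
import zipfile

LOCAL_SIG = b"PK\x03\x04"    # local file header signature
CENTRAL_SIG = b"PK\x01\x02"  # central directory header signature


def build_archive_with_stale_entry() -> bytes:
    """Return ZIP bytes whose first local entry is absent from the central directory."""
    # Step 1: write a one-member archive and keep only its local entry,
    # i.e. everything that precedes the central directory.
    first = io.BytesIO()
    with zipfile.ZipFile(first, "w") as zf:
        zf.writestr("stale.txt", "old data")
    raw = first.getvalue()
    orphan = raw[: raw.index(CENTRAL_SIG)]

    # Step 2: append a fresh archive after those bytes. Opening data that is
    # not a valid ZIP in mode 'a' appends a new archive instead of failing.
    buf = io.BytesIO(orphan)
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr("kept.txt", "new data")
    return buf.getvalue()


def find_unreferenced_offsets(data: bytes) -> list[int]:
    """Naively list local-header offsets that the central directory does not reference."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        referenced = {info.header_offset for info in zf.infolist()}

    found = []
    pos = data.find(LOCAL_SIG)
    while pos != -1:
        if pos not in referenced:
            found.append(pos)
        pos = data.find(LOCAL_SIG, pos + 1)
    return found


archive = build_archive_with_stale_entry()
print(find_unreferenced_offsets(archive))
```

For the archive built here, the search reports a single offset: the start of the orphaned `stale.txt` entry, which a reader of the archive never sees because only `kept.txt` is listed in the central directory. A bare signature search like this would also flag member data that merely contains the bytes `PK\x03\x04`, which is one reason a real scan needs stricter checks and why passing *removed* explicitly is the more reliable route.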