
Commit bcd84b9

zip and unzip API (#317)
Pulls in changes from #316 and #310 and cleans it up. Mostly documented in the readme.adoc.

Major APIs added:

- `os.zip`: create or append to an existing zip file on disk
- `os.zip.stream`: create a new zip file but write it to a `java.io.OutputStream` rather than a file on disk
- `os.unzip`: unzip a zip file on disk into a folder on disk
- `os.unzip.stream`: unzip a zip file from a `java.io.InputStream` into a folder on disk
- `os.unzip.list`: list the contents of a zip file
- `os.unzip.streamRaw`: low-level API used by `os.unzip.stream` and `os.unzip.list`, exposed in case users need it
- `os.zip.open`: opens a zip file as a `java.nio.file.FileSystem` and gives you an `os.Path` you can use to work with it

Hopefully these are APIs we can start using in Mill rather than `"zip"` subprocesses or ad-hoc helpers like `IO.unpackZip`.

Limitations:

* Use of `java.nio.file.FileSystem` is only supported on JVM and not on Scala-Native, and so using `os.zip` to append to existing jar files or `os.zip.open` does not work on Scala-Native.
* Also `os.zip` doesn't support creating/unpacking symlinks or preserving filesystem permissions in Zip files, because the underlying `java.util.zip.Zip*Stream` doesn't support them. Apache Commons Compress can work with them (https://commons.apache.org/proper/commons-compress/zip.html), but if we're sticking with std lib we don't have that.
* Bumps the version requirement to Java 11 and above, matching the direction of the rest of com-lihaoyi. Probably not strictly necessary, but we have to do it eventually and now is as good a time as ever with requests already bumped and Mill bumping soon in 0.12.0.

---------

Co-authored-by: Chaitanya Waikar <[email protected]>
1 parent 8847bd0 commit bcd84b9

File tree

7 files changed

+1017
-37
lines changed

.github/workflows/run-tests.yml

Lines changed: 2 additions & 7 deletions
@@ -13,13 +13,8 @@ jobs:
1313
strategy:
1414
fail-fast: false
1515
matrix:
16-
os: [ubuntu-latest, windows-latest]
17-
java-version: [8, 17]
18-
include:
19-
- os: macos-latest
20-
java-version: 17
21-
- os: macos-latest
22-
java-version: 11
16+
os: [ubuntu-latest, windows-latest, macos-latest]
17+
java-version: [11, 17]
2318

2419
runs-on: ${{ matrix.os }}
2520

Readme.adoc

Lines changed: 269 additions & 26 deletions
@@ -985,9 +985,9 @@ os.remove(target: Path, checkExists: Boolean = false): Boolean
985985
----
986986

987987
Remove the target file or folder. Folders need to be empty to be removed; if you
988-
want to remove a folder tree recursively, use <<os-remove-all>>.
988+
want to remove a folder tree recursively, use <<os-remove-all>>.
989989
Returns `true` if the file was present before.
990-
It will fail with an exception when the file is missing but `checkExists` is `true`,
990+
It will fail with an exception when the file is missing but `checkExists` is `true`,
991991
or when the directory to remove is not empty.
992992

993993
[source,scala]
@@ -1215,6 +1215,249 @@ os.write(tempDir / "file", "Hello")
12151215
os.list(tempDir) ==> Seq(tempDir / "file")
12161216
----
12171217

1218+
=== Zip & Unzip Files
1219+
1220+
==== `os.zip`
1221+
1222+
[source,scala]
1223+
----
1224+
def apply(dest: os.Path,
1225+
sources: Seq[ZipSource] = List(),
1226+
excludePatterns: Seq[Regex] = List(),
1227+
includePatterns: Seq[Regex] = List(),
1228+
preserveMtimes: Boolean = false,
1229+
deletePatterns: Seq[Regex] = List(),
1230+
compressionLevel: Int = -1 /* 0-9 */): os.Path
1231+
----
1232+
1233+
The zip object provides functionality to create or modify zip archives. It supports:
1234+
1235+
- Zipping Files and Directories: You can zip both individual files and entire directories.
1236+
- Appending to Existing Archives: Files can be appended to an existing zip archive.
1237+
- Exclude Patterns (-x): You can specify files or patterns to exclude while zipping.
1238+
- Include Patterns (-i): You can include specific files or patterns while zipping.
1239+
- Delete Patterns (-d): You can delete specific files from an existing zip archive.
1240+
- Preserving Mtimes: You can configure whether or not to preserve filesystem mtimes.
1241+
1242+
This will create a new zip archive at `dest` containing the files and folders listed
1243+
inside `sources`. If `dest` already exists as a zip, the files will be appended to the
1244+
existing zip, and any existing zip entries matching `deletePatterns` will be removed.
1245+
1246+
Note that `os.zip` doesn't support creating/unpacking symlinks or filesystem permissions
1247+
in Zip files, because the underlying `java.util.zip.Zip*Stream` doesn't support them.
1248+
1249+
===== Zipping Files and Folders
1250+
1251+
The example below demonstrates the core workflows: creating a zip, appending to it, and
1252+
unzipping it:
1253+
1254+
[source,scala]
1255+
----
1256+
// Zipping files and folders in a new zip file
1257+
val zipFileName = "zip-file-test.zip"
1258+
val zipFile1: os.Path = os.zip(
1259+
dest = wd / zipFileName,
1260+
sources = Seq(
1261+
wd / "File.txt",
1262+
wd / "folder1"
1263+
)
1264+
)
1265+
1266+
// Adding files and folders to an existing zip file
1267+
os.zip(
1268+
dest = zipFile1,
1269+
sources = Seq(
1270+
wd / "folder2",
1271+
wd / "Multi Line.txt"
1272+
)
1273+
)
1274+
1275+
// Unzip file to a destination folder
1276+
val unzippedFolder = os.unzip(
1277+
source = wd / zipFileName,
1278+
dest = wd / "unzipped folder"
1279+
)
1280+
1281+
val paths = os.walk(unzippedFolder)
1282+
val expected = Seq(
1283+
// Files get included in the zip root using their name
1284+
wd / "unzipped folder/File.txt",
1285+
wd / "unzipped folder/Multi Line.txt",
1286+
// Folder contents get included relative to the source root
1287+
wd / "unzipped folder/nestedA",
1288+
wd / "unzipped folder/nestedB",
1289+
wd / "unzipped folder/one.txt",
1290+
wd / "unzipped folder/nestedA/a.txt",
1291+
wd / "unzipped folder/nestedB/b.txt",
1292+
)
1293+
assert(paths.sorted == expected)
1294+
----
1295+
1296+
===== Renaming files in the zip
1297+
1298+
You can also pass in a mapping to `os.zip` to specify exactly where in the zip each
1299+
input source file or folder should go:
1300+
1301+
```scala
1302+
val zipFileName = "zip-file-test.zip"
1303+
val zipFile1: os.Path = os.zip(
1304+
dest = wd / zipFileName,
1305+
sources = List(
1306+
// renaming files and folders
1307+
wd / "File.txt" -> os.sub / "renamed-file.txt",
1308+
wd / "folder1" -> os.sub / "renamed-folder"
1309+
)
1310+
)
1311+
1312+
val unzippedFolder = os.unzip(
1313+
source = zipFile1,
1314+
dest = wd / "unzipped folder"
1315+
)
1316+
1317+
val paths = os.walk(unzippedFolder)
1318+
val expected = Seq(
1319+
wd / "unzipped folder/renamed-file.txt",
1320+
wd / "unzipped folder/renamed-folder",
1321+
wd / "unzipped folder/renamed-folder/one.txt",
1322+
)
1323+
assert(paths.sorted == expected)
1324+
```
1325+
1326+
===== Excluding/Including Files in Zip
1327+
1328+
You can specify files or folders to be excluded or included when creating the zip:
1329+
1330+
[source,scala]
1331+
----
1332+
os.zip(
1333+
os.Path("/path/to/destination.zip"),
1334+
List(os.Path("/path/to/folder")),
1335+
excludePatterns = List(".*\\.log".r, "temp/.*".r), // Exclude log files and "temp" folder
1336+
includePatterns = List(".*\\.txt".r) // Include only .txt files
1337+
)
1338+
1339+
----
1340+
1341+
This will include only `.txt` files, excluding any `.log` files and anything inside
1342+
the `temp` folder.
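
The filtering rule described above can be sketched as a small predicate. This is only an illustration: `ZipFilterSketch` and `shouldInclude` are hypothetical names rather than part of the OS-Lib API, and the sketch assumes that a pattern counts as matching when it is found anywhere in the entry name.

```scala
import scala.util.matching.Regex

object ZipFilterSketch {
  // Hypothetical helper (not an OS-Lib function): an entry is kept when it
  // matches no exclude pattern and, if any include patterns are given,
  // matches at least one of them.
  def shouldInclude(
      entryName: String,
      excludePatterns: Seq[Regex],
      includePatterns: Seq[Regex]
  ): Boolean = {
    val excluded = excludePatterns.exists(_.findFirstIn(entryName).isDefined)
    val included =
      includePatterns.isEmpty ||
        includePatterns.exists(_.findFirstIn(entryName).isDefined)
    !excluded && included
  }
}
```

Under these assumptions, an empty `includePatterns` list keeps everything that is not excluded, and an exclude match always wins over an include match.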
1343+
1344+
==== `os.zip.stream`
1345+
1346+
You can use `os.zip.stream` to write the final zip to an `OutputStream` rather than a
1347+
concrete `os.Path`. `os.zip.stream` returns a `geny.Writable`, which has a `writeBytesTo`
1348+
method:
1349+
1350+
```scala
1351+
val zipFileName = "zipStreamFunction.zip"
1352+
1353+
val stream = os.write.outputStream(wd / "zipStreamFunction.zip")
1354+
1355+
val writable = os.zip.stream(sources = Seq(wd / "File.txt"))
1356+
1357+
writable.writeBytesTo(stream)
1358+
stream.close()
1359+
1360+
val unzippedFolder = os.unzip(
1361+
source = wd / zipFileName,
1362+
dest = wd / "zipStreamFunction"
1363+
)
1364+
1365+
val paths = os.walk(unzippedFolder)
1366+
assert(paths == Seq(unzippedFolder / "File.txt"))
1367+
```
1368+
1369+
This can be useful for streaming the zipped data to places which are not files:
1370+
over the network, over a pipe, etc.
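
To make the "not a file" case concrete, here is a minimal sketch that writes a zip archive into an arbitrary `java.io.OutputStream` using only `java.util.zip` from the JDK. It does not show how `os.zip.stream` is implemented; `ZipStreamSketch` and `writeZip` are hypothetical names chosen for this illustration.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{ZipEntry, ZipOutputStream}

object ZipStreamSketch {
  // Writes the given (entry name -> text content) pairs as a zip archive
  // directly to `out`, without touching the filesystem.
  def writeZip(entries: Seq[(String, String)], out: OutputStream): Unit = {
    val zip = new ZipOutputStream(out)
    try {
      entries.foreach { case (name, content) =>
        zip.putNextEntry(new ZipEntry(name))
        zip.write(content.getBytes(UTF_8))
        zip.closeEntry()
      }
    } finally zip.close() // writes the central directory and also closes `out`
  }
}
```

Passing a `java.io.ByteArrayOutputStream` as `out` keeps the whole archive in memory; a socket or pipe stream would work the same way.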
1371+
1372+
==== `os.unzip`
1373+
1374+
===== Unzipping Files
1375+
[source,scala]
1376+
1377+
----
1378+
os.unzip(os.Path("/path/to/archive.zip"), os.Path("/path/to/destination"))
1379+
----
1380+
1381+
This extracts the contents of `archive.zip` to the specified destination.
1382+
1383+
1384+
===== Excluding Files While Unzipping
1385+
You can exclude certain files from being extracted using patterns:
1386+
1387+
[source,scala]
1388+
----
1389+
os.unzip(
1390+
os.Path("/path/to/archive.zip"),
1391+
os.Path("/path/to/destination"),
1392+
excludePatterns = List(".*\\.log".r, "temp/.*".r) // Exclude log files and the "temp" folder
1393+
)
1394+
----
1395+
1396+
===== `os.unzip.list`
1397+
You can list the contents of the zip file without extracting them:
1398+
1399+
[source,scala]
1400+
----
1401+
os.unzip.list(os.Path("/path/to/archive.zip"))
1402+
----
1403+
1404+
This returns the paths of all the files contained in the zip archive.
1405+
1406+
==== `os.unzip.stream`
1407+
1408+
You can unzip a zip file from any arbitrary `java.io.InputStream` containing its binary data
1409+
using the `os.unzip.stream` method:
1410+
1411+
```scala
1412+
val readableZipStream: java.io.InputStream = ???
1413+
1414+
// Unzipping the stream to the destination folder
1415+
os.unzip.stream(
1416+
source = readableZipStream,
1417+
dest = unzippedFolder
1418+
)
1419+
```
1420+
1421+
This can be useful if the zip file does not exist on disk, e.g. if it is received over the network
1422+
or produced in-memory by application logic.
1423+
1424+
OS-Lib also provides the `os.unzip.streamRaw` API, which is a lower level API used internally
1425+
within `os.unzip.stream` but can also be used directly if lower-level control is necessary.
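
As a rough picture of what entry-by-entry processing of a zipped stream looks like, the sketch below walks a zip-formatted `java.io.InputStream` with the JDK's `ZipInputStream` and collects each file entry in memory. It is an assumption-laden illustration, not the signature or implementation of `os.unzip.streamRaw`; `UnzipStreamSketch` and `readEntries` are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.util.zip.ZipInputStream

object UnzipStreamSketch {
  // Reads every file entry from a zip-formatted stream and returns
  // (entry name, bytes) pairs in archive order. Directory entries are skipped.
  def readEntries(in: InputStream): Vector[(String, Array[Byte])] = {
    val zip = new ZipInputStream(in)
    try {
      val result = Vector.newBuilder[(String, Array[Byte])]
      val chunk = new Array[Byte](8192)
      var entry = zip.getNextEntry
      while (entry != null) {
        if (!entry.isDirectory) {
          val buf = new ByteArrayOutputStream()
          var n = zip.read(chunk)
          // read() returns -1 at the end of the current entry
          while (n != -1) { buf.write(chunk, 0, n); n = zip.read(chunk) }
          result += (entry.getName -> buf.toByteArray)
        }
        entry = zip.getNextEntry
      }
      result.result()
    } finally zip.close()
  }
}
```

Because the entries are consumed in a single forward pass, this style works for sources that cannot be rewound, such as a network response body.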
1426+
1427+
==== `os.zip.open`
1428+
1429+
```scala
1430+
os.zip.open(path: Path): ZipRoot
1431+
```
1432+
1433+
`os.zip.open` allows you to treat zip files as filesystems, using normal `os.*` operations
1434+
on them. This provides a more flexible way to manipulate the contents of the zip in a fine-grained
1435+
manner when the normal `os.zip` or `os.unzip` operations do not suffice.
1436+
1437+
```scala
1438+
val zipFile = os.zip.open(wd / "zip-test.zip")
1439+
try {
1440+
os.copy(wd / "File.txt", zipFile / "File.txt")
1441+
os.copy(wd / "folder1", zipFile / "folder1")
1442+
os.copy(wd / "folder2", zipFile / "folder2")
1443+
} finally zipFile.close()
1444+
1445+
val zipFile2 = os.zip.open(wd / "zip-test.zip")
1446+
try {
1447+
os.list(zipFile2) ==> Vector(zipFile2 / "File.txt", zipFile2 / "folder1", zipFile2 / "folder2")
1448+
os.remove.all(zipFile2 / "folder2")
1449+
os.remove(zipFile2 / "File.txt")
1450+
} finally zipFile2.close()
1451+
1452+
val zipFile3 = os.zip.open(wd / "zip-test.zip")
1453+
try os.list(zipFile3) ==> Vector(zipFile3 / "folder1")
1454+
finally zipFile3.close()
1455+
```
1456+
1457+
`os.zip.open` returns a `ZipRoot`, which is identical to `os.Path` except it references the root
1458+
of the zip file rather than a bare path on the filesystem. Note that you need to call `ZipRoot#close()`
1459+
when you are done with it to avoid leaking filesystem resources.
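
The commit message notes that `os.zip.open` exposes the zip as a `java.nio.file.FileSystem`. The sketch below shows that underlying JDK mechanism on its own, including why closing matters: changes are only flushed to the zip file when the filesystem is closed. `ZipFileSystemSketch`, `writeIntoZip` and `listRoot` are hypothetical names, and this is not the OS-Lib implementation.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{FileSystems, Files, Path}
import java.util.{HashMap => JHashMap}

object ZipFileSystemSketch {
  // Opens the zip at `zipPath` as a FileSystem (creating it if absent),
  // writes one text file into its root, and closes it to flush the change.
  def writeIntoZip(zipPath: Path, entryName: String, content: String): Unit = {
    val env = new JHashMap[String, String]()
    env.put("create", "true") // zip filesystem property: create the file if missing
    val fs = FileSystems.newFileSystem(URI.create("jar:" + zipPath.toUri), env)
    try Files.write(fs.getPath("/" + entryName), content.getBytes(UTF_8))
    finally fs.close()
  }

  // Re-opens the zip and returns the sorted names of the entries at its root.
  def listRoot(zipPath: Path): List[String] = {
    val fs = FileSystems.newFileSystem(
      URI.create("jar:" + zipPath.toUri),
      new JHashMap[String, String]()
    )
    try {
      val stream = Files.list(fs.getPath("/"))
      try {
        val it = stream.iterator()
        var names = List.empty[String]
        while (it.hasNext) names ::= it.next().getFileName.toString
        names.sorted
      } finally stream.close()
    } finally fs.close()
  }
}
```

Each call opens and closes its own filesystem, mirroring the open, modify, close pattern used with `ZipRoot` in the example above.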
1460+
12181461
=== Filesystem Metadata
12191462

12201463
==== `os.stat`
@@ -1708,13 +1951,13 @@ val yes10 = os.proc("yes")
17081951
----
17091952

17101953
This feature is implemented inside the library and will terminate any process reading the
1711-
stdin of other process in pipeline on every IO error. This behavior can be disabled via the
1712-
`handleBrokenPipe` flag on `call` and `spawn` methods. Note that Windows does not support
1713-
broken pipe behaviour, so a command like`yes` would run forever. `handleBrokenPipe` is set
1954+
stdin of other process in pipeline on every IO error. This behavior can be disabled via the
1955+
`handleBrokenPipe` flag on `call` and `spawn` methods. Note that Windows does not support
1956+
broken pipe behaviour, so a command like `yes` would run forever. `handleBrokenPipe` is set
17141957
to false by default on Windows.
17151958

17161959
Both `call` and `spawn` correspond in their behavior to their counterparts in the `os.proc`,
1717-
but `spawn` returns the `os.ProcessPipeline` instance instead. It offers the same
1960+
but `spawn` returns the `os.ProcessPipeline` instance instead. It offers the same
17181961
`API` as `SubProcess`, but will operate on the set of processes instead of a single one.
17191962

17201963
`Pipefail` is enabled by default, so if any of the processes in the pipeline fails, the whole
@@ -2105,14 +2348,14 @@ explicitly choose to convert relative paths to absolute using some base.
21052348

21062349
==== Roots and filesystems
21072350

2108-
If you are using a system that supports different roots of paths, e.g. Windows,
2109-
you can use the argument of `os.root` to specify which root you want to use.
2351+
If you are using a system that supports different roots of paths, e.g. Windows,
2352+
you can use the argument of `os.root` to specify which root you want to use.
21102353
If not specified, the default root will be used (usually, C on Windows, / on Unix).
21112354

21122355
[source,scala]
21132356
----
2114-
val root = os.root('C:\') / "Users/me"
2115-
assert(root == os.Path("C:\Users\me"))
2357+
val root = os.root("C:\\") / "Users/me"
2358+
assert(root == os.Path("C:\\Users\\me"))
21162359
----
21172360

21182361
Additionally, custom filesystems can be specified by passing a `FileSystem` to
@@ -2128,11 +2371,11 @@ val fs = FileSystems.newFileSystem(uri, env);
21282371
val path = os.root("/", fs) / "dir"
21292372
----
21302373

2131-
Note that the jar file system operations suchs as writing to a file are supported
2132-
only on JVM 11+. Depending on the filesystem, some operations may not be supported -
2133-
for example, running an `os.proc` with pwd in a jar file won't work. You may also
2134-
meet limitations imposed by the implementations - in jar file system, the files are
2135-
created only after the file system is closed. Until that, the ones created in your
2374+
Note that the jar file system operations such as writing to a file are supported
2375+
only on JVM 11+. Depending on the filesystem, some operations may not be supported -
2376+
for example, running an `os.proc` with pwd in a jar file won't work. You may also
2377+
meet limitations imposed by the implementations - in jar file system, the files are
2378+
created only after the file system is closed. Until that, the ones created in your
21362379
program are kept in memory.
21372380

21382381
==== `os.ResourcePath`
@@ -2199,9 +2442,9 @@ By default, the following types of values can be used where-ever ``os.Source``s
21992442
are required:
22002443

22012444
* Any `geny.Writable` data type:
2202-
** `Array[Byte]`
2203-
** `java.lang.String` (these are treated as UTF-8)
2204-
** `java.io.InputStream`
2445+
** `Array[Byte]`
2446+
** `java.lang.String` (these are treated as UTF-8)
2447+
** `java.io.InputStream`
22052448
* `java.nio.channels.SeekableByteChannel`
22062449
* Any `TraversableOnce[T]` of the above: e.g. `Seq[String]`,
22072450
`List[Array[Byte]]`, etc.
@@ -2266,9 +2509,9 @@ string, int or set representations of the `os.PermSet` via:
22662509
=== 0.10.7
22672510

22682511
* Allow multi-segment paths segments for literals https://github.com/com-lihaoyi/os-lib/pull/297: You
2269-
can now write `os.pwd / "foo/bar/qux"` rather than `os.pwd / "foo" / "bar" / "qux"`. Note that this
2270-
is only allowed for string literals, and non-literal path segments still need to be wrapped e.g.
2271-
`def myString = "foo/bar/qux"; os.pwd / os.SubPath(myString)` for security and safety purposes
2512+
can now write `os.pwd / "foo/bar/qux"` rather than `os.pwd / "foo" / "bar" / "qux"`. Note that this
2513+
is only allowed for string literals, and non-literal path segments still need to be wrapped e.g.
2514+
`def myString = "foo/bar/qux"; os.pwd / os.SubPath(myString)` for security and safety purposes
22722515

22732516
[#0-10-6]
22742517
=== 0.10.6
@@ -2279,23 +2522,23 @@ string, int or set representations of the `os.PermSet` via:
22792522
=== 0.10.5
22802523

22812524
* Introduce `os.SubProcess.env` `DynamicVariable` to override default `env`
2282-
(https://github.com/com-lihaoyi/os-lib/pull/295)
2525+
(https://github.com/com-lihaoyi/os-lib/pull/295)
22832526

22842527

22852528
[#0-10-4]
22862529
=== 0.10.4
22872530

22882531
* Add a lightweight syntax for `os.call()` and `os.spawn` APIs
2289-
(https://github.com/com-lihaoyi/os-lib/pull/292)
2532+
(https://github.com/com-lihaoyi/os-lib/pull/292)
22902533
* Add a configurable grace period when subprocesses timeout and have to
2291-
be terminated to give a chance for shutdown logic to run
2292-
(https://github.com/com-lihaoyi/os-lib/pull/286)
2534+
be terminated to give a chance for shutdown logic to run
2535+
(https://github.com/com-lihaoyi/os-lib/pull/286)
22932536

22942537
[#0-10-3]
22952538
=== 0.10.3
22962539

22972540
* `os.Inherit` now can be redirected on a threadlocal basis via `os.Inherit.in`, `.out`, or `.err`.
2298-
`os.InheritRaw` is available if you do not want the redirects to take effect
2541+
`os.InheritRaw` is available if you do not want the redirects to take effect
22992542

23002543

23012544
[#0-10-2]

0 commit comments

Comments
 (0)