Skip to content

Commit 00341c1

Browse files
authored
Refactor tidy_unzip() (#2028)
* Refactor tidy_unzip() Fixes #1961, closes #1962. Analyzing the above gave me a new/different understanding of common ZIP file organization structures. I think I've got a better heuristic for setting `exdir`. * Add NEWS bullet
1 parent 2248463 commit 00341c1

15 files changed

+496
-165
lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
# usethis (development version)
22

3+
* `use_zip()` and `use_course()` are equipped to handle a ZIP where the parent folder is implicit (@burnsal, #1961).
4+
35
* `use_test_helper()` is a new function to create a test helper file (@olivroy, #1822).
46

57
* `use_cpp11()` makes it easier to update `NAMESPACE` (@pachadotdev, #1921).

R/course.R

Lines changed: 18 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -360,15 +360,15 @@ tidy_unzip <- function(zipfile, cleanup = FALSE) {
360360
## DropBox ZIP files often include lots of hidden R, RStudio, and Git files
361361
filenames <- filenames[keep_lgl(filenames)]
362362

363-
td <- top_directory(filenames)
364-
loose_parts <- is.na(td)
365-
366-
if (loose_parts) {
363+
parents <- path_before_slash(filenames)
364+
unique_parents <- unique(parents)
365+
if (length(unique_parents) == 1 && unique_parents != "") {
366+
target <- path(base_path, unique_parents)
367+
utils::unzip(zipfile, files = filenames, exdir = base_path)
368+
} else {
369+
# there is no parent; archive contains loose parts
367370
target <- path_ext_remove(zipfile)
368371
utils::unzip(zipfile, files = filenames, exdir = target)
369-
} else {
370-
target <- path(base_path, td)
371-
utils::unzip(zipfile, files = filenames, exdir = base_path)
372372
}
373373
ui_bullets(c(
374374
"v" = "Unpacking ZIP file into {.path {pth(target, base_path)}}
@@ -398,7 +398,7 @@ tidy_unzip <- function(zipfile, cleanup = FALSE) {
398398
}
399399
}
400400

401-
invisible(target)
401+
invisible(unclass(target))
402402
}
403403

404404
#' @rdname use_course_details
@@ -525,15 +525,17 @@ keep_lgl <- function(file,
525525
!grepl(ignores, file, perl = TRUE)
526526
}
527527

528-
top_directory <- function(filenames) {
529-
in_top <- path_dir(filenames) == "."
530-
unique_top <- unique(filenames[in_top])
531-
is_directory <- grepl("/$", unique_top)
532-
if (length(unique_top) > 1 || !is_directory) {
533-
NA_character_
534-
} else {
535-
unique_top
528+
path_before_slash <- function(filepath) {
529+
f <- function(x) {
530+
parts <- strsplit(x, "/", fixed = TRUE)[[1]]
531+
if (length(parts) > 1 || grepl("/", x)) {
532+
parts[1]
533+
} else {
534+
""
535+
}
536536
}
537+
purrr::map_chr(filepath, f)
538+
537539
}
538540

539541
content_type <- function(h) {

tests/testthat/ref/README.Rmd

Lines changed: 151 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -14,66 +14,169 @@ library(fs)
1414

1515
## Different styles of ZIP file
1616

17-
Examples based on foo folder found here.
17+
usethis has an unexported function `tidy_unzip()`, which is used under the hood in `use_course()` and `use_zip()`.
18+
It is a wrapper around `utils::unzip()` that uses some heuristics to choose a good value for `exdir`, which is the "the directory to extract files to."
19+
Why do we do this?
20+
Because it's really easy to _not_ get the desired result when unpacking a ZIP archive.
21+
22+
Common aggravations:
23+
24+
* Instead of the unpacked files being corraled within a folder, they explode as "loose parts" into the current working directory. Too little nesting.
25+
* The unpacked files are contained in a folder, but that folder itself is contained inside another folder. Too much nesting.
26+
27+
`tidy_unzip()` tries to get the nesting just right.
28+
29+
Why doesn't unzipping "just work"?
30+
Because the people who make `.zip` files make lots of different choices when they actually create the archive and these details aren't baked in, i.e. a successful roundtrip isn't automatic.
31+
It usually requires some peeking inside the archive and adjusting the unpack options.
32+
33+
This README documents specific `.zip` situations that we anticipate.
34+
35+
## Explicit parent folder
36+
37+
Consider the foo folder:
1838

1939
```{bash}
2040
tree foo
2141
```
2242

23-
### Not Loose Parts, a.k.a. GitHub style
43+
Zip it up like so:
44+
45+
```{bash, eval = FALSE}
46+
zip -r foo-explicit-parent.zip foo/
47+
```
48+
49+
This is the type of ZIP file that we get from GitHub via links of the forms <https://github.com/r-lib/usethis/archive/main.zip> and <http://github.com/r-lib/usethis/zipball/main/>.
50+
51+
Inspect it in the shell:
52+
53+
```{bash}
54+
unzip -Z1 foo-explicit-parent.zip
55+
```
56+
57+
Or from R:
58+
59+
```{r}
60+
foo_files <- unzip("foo-explicit-parent.zip", list = TRUE)
61+
with(
62+
foo_files,
63+
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
64+
)
65+
```
66+
67+
Note that the folder `foo/` is explicitly included and all of the files are contained in it (in this case, just one file).
68+
69+
## Implicit parent folder
70+
71+
Consider the foo folder:
72+
73+
```{bash}
74+
tree foo
75+
```
2476

25-
This is the structure of ZIP files yielded by GitHub via links of the forms <https://github.com/r-lib/usethis/archive/master.zip> and <http://github.com/r-lib/usethis/zipball/master/>.
77+
Zip it up like so:
2678

2779
```{bash, eval = FALSE}
28-
zip -r foo-not-loose.zip foo/
80+
zip -r foo-implicit-parent.zip foo/*
2981
```
3082

31-
Notice that everything is packaged below one top-level directory.
83+
Note the use of `foo/*`, as opposed to `foo` or `foo/`.
84+
This type of ZIP file was reported in <https://github.com/r-lib/usethis/issues/1961>.
85+
The example given there is <https://agdatacommons.nal.usda.gov/ndownloader/files/44576230>.
86+
87+
Inspect our small example in the shell:
88+
89+
```{bash}
90+
unzip -Z1 foo-implicit-parent.zip
91+
```
92+
93+
Or from R:
3294

3395
```{r}
34-
foo_not_loose_files <- unzip("foo-not-loose.zip", list = TRUE)
96+
foo_files <- unzip("foo-implicit-parent.zip", list = TRUE)
3597
with(
36-
foo_not_loose_files,
98+
foo_files,
3799
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
38100
)
39101
```
40102

41-
### Loose Parts, the Regular Way
103+
Note that `foo/` is not included and its (original) existence is just implicit in the relative path to, e.g.,
104+
`foo/file.txt`.
105+
106+
Here's a similar look at the example from issue #1961:
107+
108+
```bash
109+
~/rrr/usethis/tests/testthat/ref % unzip -l ~/Downloads/Species\ v2.3.zip
110+
Archive: /Users/jenny/Downloads/Species v2.3.zip
111+
Length Date Time Name
112+
--------- ---------- ----- ----
113+
1241 04-27-2023 09:16 species_v2/label_encoder.txt
114+
175187560 04-06-2023 15:13 species_v2/model_arch.pt
115+
174953575 04-14-2023 12:28 species_v2/model_weights.pth
116+
--------- -------
117+
350142376 3 files
118+
```
119+
120+
Note also that the implicit parent folder `species_v2` is not the base of the ZIP file name `Species v2.3.zip`.
42121

43-
This is the structure of many ZIP files I've seen, just in general.
122+
## No parent
123+
124+
Consider the foo folder:
125+
126+
```{bash}
127+
tree foo
128+
```
129+
130+
Zip it up like so:
44131

45132
```{bash, eval = FALSE}
46-
cd foo
47-
zip ../foo-loose-regular.zip *
48-
cd ..
133+
(cd foo && zip -r ../foo-no-parent.zip .)
49134
```
50135

51-
All the files are packaged in the ZIP archive as "loose parts", i.e. there is no explicit top-level directory.
136+
Note that we are zipping everything in a folder from *inside* the folder.
137+
138+
Inspect our small example in the shell:
139+
140+
```{bash}
141+
unzip -Z1 foo-no-parent.zip
142+
```
143+
144+
Or from R:
52145

53146
```{r}
54-
foo_loose_regular_files <- unzip("foo-loose-regular.zip", list = TRUE)
147+
foo_files <- unzip("foo-no-parent.zip", list = TRUE)
55148
with(
56-
foo_loose_regular_files,
149+
foo_files,
57150
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
58151
)
59152
```
60153

61-
### Loose Parts, the DropBox Way
154+
All the files are packaged in the ZIP archive as "loose parts", i.e. there is no explicit or implicit top-level directory.
155+
156+
## No parent, the DropBox Variation
62157

63158
This is the structure of ZIP files yielded by DropBox via links of this form <https://www.dropbox.com/sh/12345abcde/6789wxyz?dl=1>. I can't figure out how to even do this with zip locally, so I had to create an example on DropBox and download it. Jim Hester reports it is possible with `archive::archive_write_files()`.
64159

65160
<https://www.dropbox.com/sh/5qfvssimxf2ja58/AABz3zrpf-iPYgvQCgyjCVdKa?dl=1>
66161

67-
It's basically like the "loose parts" above, except it includes a spurious top-level directory `"/"`.
162+
It's basically like the "no parent" example above, except it includes a spurious top-level directory `"/"`.
163+
164+
Inspect our small example in the shell:
165+
166+
```{bash}
167+
unzip -Z1 foo-loose-dropbox.zip
168+
```
169+
170+
Or from R:
68171

69172
```{r}
70173
# curl::curl_download(
71174
# "https://www.dropbox.com/sh/5qfvssimxf2ja58/AABz3zrpf-iPYgvQCgyjCVdKa?dl=1",
72175
# destfile = "foo-loose-dropbox.zip"
73176
# )
74-
foo_loose_dropbox_files <- unzip("foo-loose-dropbox.zip", list = TRUE)
177+
foo_files <- unzip("foo-loose-dropbox.zip", list = TRUE)
75178
with(
76-
foo_loose_dropbox_files,
179+
foo_files,
77180
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
78181
)
79182
```
@@ -87,43 +190,50 @@ mapname: conversion of failed
87190
inflating: file.txt
88191
```
89192

90-
So this is a pretty odd ZIP packing strategy. But we need to plan for it.
193+
which indicates some tripping over the `/`.
194+
Only `file.txt` is left behind.
91195

92-
## Subdirs only at top-level
196+
This is a pretty odd ZIP packing strategy. But we need to plan for it.
93197

94-
Let's make sure we detect loose parts (or not) when the top-level has only directories, not files.
198+
## Only subdirectories
95199

96-
Example based on the yo directory here:
200+
For testing purposes, we also want an example where all the files are inside subdirectories.
201+
202+
Examples based on the yo directory here:
97203

98204
```{bash}
99205
tree yo
100206
```
101207

102-
```{bash, eval = FALSE}
103-
zip -r yo-not-loose.zip yo/
104-
```
105-
106-
```{r}
107-
(yo_not_loose_files <- unzip("yo-not-loose.zip", list = TRUE))
108-
top_directory(yo_not_loose_files$Name)
109-
```
208+
Zip it up, in all the usual ways:
110209

111210
```{bash, eval = FALSE}
112-
cd yo
113-
zip -r ../yo-loose-regular.zip *
114-
cd ..
211+
zip -r yo-explicit-parent.zip yo/
212+
zip -r yo-implicit-parent.zip yo/*
213+
(cd yo && zip -r ../yo-no-parent.zip .)
115214
```
116215

117-
```{r}
118-
(yo_loose_regular_files <- unzip("yo-loose-regular.zip", list = TRUE))
119-
top_directory(yo_loose_regular_files$Name)
120-
```
216+
Again, I couldn't create the DropBox variant locally, so I did it by downloading from DropBox.
121217

122-
```{r}
218+
```{r eval = FALSE}
123219
# curl::curl_download(
124220
# "https://www.dropbox.com/sh/afydxe6pkpz8v6m/AADHbMZAaW3IQ8zppH9mjNsga?dl=1",
125221
# destfile = "yo-loose-dropbox.zip"
126222
# )
127-
(yo_loose_dropbox_files <- unzip("yo-loose-dropbox.zip", list = TRUE))
128-
top_directory(yo_loose_dropbox_files$Name)
223+
```
224+
225+
Inspect each in the shell:
226+
227+
```{bash}
228+
unzip -Z1 yo-explicit-parent.zip
229+
```
230+
231+
```{bash}
232+
unzip -Z1 yo-implicit-parent.zip
233+
```
234+
```{bash}
235+
unzip -Z1 yo-no-parent.zip
236+
```
237+
```{bash}
238+
unzip -Z1 yo-loose-dropbox.zip
129239
```

0 commit comments

Comments
 (0)