Commit 416b02e

Merge pull request #11466 from IQSS/9620-better-file-bundle-name
Prettier names for the zipped multi-file bundles
2 parents 039b3b7 + add3415 commit 416b02e

4 files changed (+64, -17 lines)

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+The Data Access APIs that generate multi-file zipped bundles will offer file name suggestions based on the persistent identifiers (for example, `doi-10.70122-fk2-xxyyzz.zip`), instead of the fixed `dataverse_files.zip` as in prior versions.
+See the Data Access API guide for more info.

doc/sphinx-guides/source/api/dataaccess.rst

Lines changed: 4 additions & 2 deletions
@@ -21,7 +21,7 @@ There are a number of reasons why not all of the files can be downloaded:
 - Some of the files are restricted and your API token doesn't have access (you will still get the unrestricted files).
 - The Dataverse installation has limited how large the zip bundle can be.

-In the curl example below, the flags ``-O`` and ``J`` are used. When there are no errors, this has the effect of saving the file as "dataverse_files.zip" (just like the web interface). The flags force errors to be downloaded as a file.
+In the curl example below, the flags ``-O`` and ``-J`` are used. When there are no errors, this has the effect of saving the file under the name suggested by Dataverse (which as of v6.7 will be based on the persistent identifier of the dataset and the latest version number, for example ``doi-10.70122-fk2-n2xgbj_1.1.zip``; in prior versions the file name was ``dataverse_files.zip`` in all cases). This mirrors the way the files are saved when downloaded in a browser. The flags also force error messages to be downloaded as a file.

 Please note that in addition to the files from the dataset, an additional file called "MANIFEST.TXT" will be included in the zipped bundle. It has additional information about the files.

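With `-J`, curl names the saved file after the `filename` parameter of the `Content-disposition` response header, which this change sets to `attachment; filename="<bundle name>"`. A Java client can recover the same suggested name. The sketch below is a minimal, hypothetical helper (`BundleNameFromHeader` is not part of Dataverse); it assumes the simple quoted form that Access.java emits and does not decode RFC 6266 `filename*=` values.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BundleNameFromHeader {

    // Matches the quoted filename parameter, e.g. attachment; filename="doi-10.70122-fk2-n2xgbj_1.1.zip".
    // Only the simple quoted form is handled; "filename*=" (RFC 6266 encoding) is not matched.
    private static final Pattern FILENAME =
            Pattern.compile("filename=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the suggested file name, or empty if the header is missing or has no quoted filename. */
    public static Optional<String> suggestedFileName(String contentDisposition) {
        if (contentDisposition == null) {
            return Optional.empty();
        }
        Matcher m = FILENAME.matcher(contentDisposition);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String header = "attachment; filename=\"doi-10.70122-fk2-n2xgbj_1.1.zip\"";
        // Fall back to the pre-v6.7 fixed name if the header carries no usable name.
        System.out.println(suggestedFileName(header).orElse("dataverse_files.zip"));
    }
}
```

Falling back to `dataverse_files.zip` keeps the client's behavior identical to what it would have seen from servers older than v6.7.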
@@ -70,6 +70,8 @@ A curl example using a DOI (with version):

 curl -O -J -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/access/dataset/:persistentId/versions/$VERSION?persistentId=$PERSISTENT_ID

+As with the API above, this will save the downloaded bundle under a name based on the persistent identifier and the version number, for example ``doi-10.70122-fk2-n2xgbj_1.1.zip`` or ``doi-10.70122-fk2-n2xgbj_draft.zip``.
+
 The fully expanded example above (without environment variables) looks like this:

 .. code-block:: bash
@@ -173,7 +175,7 @@ Multiple File ("bundle") download

 Alternate Form: POST to ``/api/access/datafiles`` with a ``fileIds`` input field containing the same comma separated list of file ids. This is most useful when your list of files surpasses the allowed URL length (varies but can be ~2000 characters).

-Returns the files listed, zipped.
+Returns the files listed, zipped. As of v6.7 the name of the zipped bundle will be based on the persistent identifier of the parent dataset, for example ``doi-10.70122-fk2-xxyyzz.zip``; in prior versions the file name was ``dataverse_files.zip`` in all cases.

 .. note:: If the request can only be completed partially - if only *some* of the requested files can be served (because of the permissions and/or size restrictions), the file MANIFEST.TXT included in the zipped bundle will have entries specifying the reasons the missing files could not be downloaded. IN THE FUTURE the API will return a 207 status code to indicate that the result was a partial success. (As of writing this - v.4.11 - this hasn't been implemented yet)

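The POST form can be sketched with the JDK's `java.net.http` client (Java 11+). This only builds the request; sending it and saving the body under the name from `Content-disposition` is left to the caller. The server URL, token and file IDs are placeholders, and the `text/plain` content type with a `fileIds=...` body (what an HTML form with `enctype="text/plain"` submits) is an assumption: the media type accepted by `postDownloadDatafiles` is not shown in this diff, so check the endpoint's `@Consumes` annotation before relying on it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

public class BundlePostRequest {

    // ASSUMPTION: the endpoint accepts a plain-text "fileIds=1,2,3" body; verify against the server.
    static final String ASSUMED_CONTENT_TYPE = "text/plain";

    /**
     * Builds (but does not send) a POST to /api/access/datafiles whose body is the
     * fileIds field with a comma-separated list of database file IDs.
     */
    public static HttpRequest build(String serverUrl, String apiToken, List<Long> fileIds) {
        String ids = fileIds.stream().map(String::valueOf).collect(Collectors.joining(","));
        return HttpRequest.newBuilder(URI.create(serverUrl + "/api/access/datafiles"))
                .header("X-Dataverse-key", apiToken)   // same header as the curl examples
                .header("Content-Type", ASSUMED_CONTENT_TYPE)
                .POST(HttpRequest.BodyPublishers.ofString("fileIds=" + ids))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder server and token; nothing is sent over the network here.
        HttpRequest req = build("https://demo.example.org", "xxxxxxxx", List.of(42L, 43L, 44L));
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Keeping the request construction separate from sending makes it easy to inspect, and to swap the content type if the server expects form encoding instead.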
src/main/java/edu/harvard/iq/dataverse/api/Access.java

Lines changed: 30 additions & 10 deletions
@@ -169,6 +169,7 @@ public class Access extends AbstractApiBean {
 @Inject
 DataverseFeaturedItemServiceBean dataverseFeaturedItemServiceBean;

+private static final String DEFAULT_BUNDLE_NAME = "dataverse_files.zip";
 //@EJB

 // TODO:
@@ -643,7 +644,7 @@ public DownloadInstance downloadAuxiliaryFile(@Context ContainerRequestContext c
 public Response postDownloadDatafiles(@Context ContainerRequestContext crc, String fileIds, @QueryParam("gbrecs") boolean gbrecs, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) throws WebApplicationException {


-return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response);
+return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response, null);
 }

 @GET
@@ -664,7 +665,7 @@ public Response downloadAllFromLatest(@Context ContainerRequestContext crc, @Pat
 // We don't want downloads from Draft versions to be counted,
 // so we are setting the gbrecs (aka "do not write guestbook response")
 // variable accordingly:
-return downloadDatafiles(getRequestUser(crc), fileIds, true, uriInfo, headers, response);
+return downloadDatafiles(getRequestUser(crc), fileIds, true, uriInfo, headers, response, "draft");
 }
 }

@@ -685,7 +686,7 @@ public Response downloadAllFromLatest(@Context ContainerRequestContext crc, @Pat
 }

 String fileIds = getFileIdsAsCommaSeparated(latest.getFileMetadatas());
-return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response);
+return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response, latest.getFriendlyVersionNumber());
 } catch (WrappedResponse wr) {
 return wr.getResponse();
 }
@@ -735,7 +736,7 @@ public Command<DatasetVersion> handleLatestPublished() {
 if (dsv.isDraft()) {
 gbrecs = true;
 }
-return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response);
+return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response, dsv.getFriendlyVersionNumber().toLowerCase());
 } catch (WrappedResponse wr) {
 return wr.getResponse();
 }
@@ -749,6 +750,24 @@ private static String getFileIdsAsCommaSeparated(List<FileMetadata> fileMetadata
 }
 return String.join(",", ids);
 }
+
+private String generateMultiFileBundleName(Dataset dataset, String versionTag) {
+String bundleName = DEFAULT_BUNDLE_NAME;
+
+if (dataset != null && dataset.getGlobalId() != null) {
+String protocol = dataset.getProtocol();
+String authority = dataset.getAuthority().toLowerCase();
+String identifier = dataset.getIdentifier().replace('/', '-').toLowerCase();
+
+if (versionTag != null) {
+bundleName = protocol + "-" + authority + "-" + identifier + "_" + versionTag + ".zip";
+} else {
+bundleName = protocol + "-" + authority + "-" + identifier + ".zip";
+}
+}
+
+return bundleName;
+}

 /*
 * API method for downloading zipped bundles of multiple files:
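The naming rule in `generateMultiFileBundleName` can be checked outside Dataverse with a standalone sketch over plain strings, since the real method needs a `Dataset` entity. `BundleNameSketch` and its parameter split are hypothetical; a `null` protocol stands in for a dataset without a global ID. Per the callers in this diff, `versionTag` is `null` for `/api/access/datafiles`, `"draft"` for a draft latest version, and the friendly version number (for example `1.1`) otherwise.

```java
public class BundleNameSketch {

    static final String DEFAULT_BUNDLE_NAME = "dataverse_files.zip";

    /**
     * Mirrors the string logic of generateMultiFileBundleName using plain values
     * (hypothetical stand-in, not the Dataverse method itself).
     */
    public static String bundleName(String protocol, String authority, String identifier, String versionTag) {
        if (protocol == null) {
            // No global ID: keep the pre-v6.7 fixed name.
            return DEFAULT_BUNDLE_NAME;
        }
        // As in the original, authority and identifier are lowercased and "/" becomes "-";
        // the protocol is used as-is.
        String base = protocol + "-" + authority.toLowerCase() + "-"
                + identifier.replace('/', '-').toLowerCase();
        return versionTag != null ? base + "_" + versionTag + ".zip" : base + ".zip";
    }

    public static void main(String[] args) {
        System.out.println(bundleName("doi", "10.70122", "FK2/N2XGBJ", "1.1"));
        System.out.println(bundleName("doi", "10.70122", "FK2/N2XGBJ", "draft"));
        System.out.println(bundleName("doi", "10.70122", "FK2/N2XGBJ", null));
        System.out.println(bundleName(null, null, null, null));
    }
}
```

Because the protocol is not lowercased, the result depends on it already being lowercase, which holds for the usual `doi` and `hdl` prefixes.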
@@ -758,10 +777,10 @@ private static String getFileIdsAsCommaSeparated(List<FileMetadata> fileMetadata
 @Path("datafiles/{fileIds}")
 @Produces({"application/zip"})
 public Response datafiles(@Context ContainerRequestContext crc, @PathParam("fileIds") String fileIds, @QueryParam("gbrecs") boolean gbrecs, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) throws WebApplicationException {
-return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response);
+return downloadDatafiles(getRequestUser(crc), fileIds, gbrecs, uriInfo, headers, response, null);
 }

-private Response downloadDatafiles(User user, String rawFileIds, boolean donotwriteGBResponse, UriInfo uriInfo, HttpHeaders headers, HttpServletResponse response) throws WebApplicationException /* throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ {
+private Response downloadDatafiles(User user, String rawFileIds, boolean donotwriteGBResponse, UriInfo uriInfo, HttpHeaders headers, HttpServletResponse response, String versionTag) throws WebApplicationException /* throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ {
 final long zipDownloadSizeLimit = systemConfig.getZipDownloadLimit();

 logger.fine("setting zip download size limit to " + zipDownloadSizeLimit + " bytes.");
@@ -852,8 +871,9 @@ public void write(OutputStream os) throws IOException,
 // to produce some output.
 zipper = new DataFileZipper(os);
 zipper.setFileManifest(fileManifest);
-response.setHeader("Content-disposition", "attachment; filename=\"dataverse_files.zip\"");
-response.setHeader("Content-Type", "application/zip; name=\"dataverse_files.zip\"");
+String bundleName = generateMultiFileBundleName(file.getOwner(), versionTag);
+response.setHeader("Content-disposition", "attachment; filename=\"" + bundleName + "\"");
+response.setHeader("Content-Type", "application/zip; name=\"" + bundleName + "\"");
 }

 long size = 0L;
@@ -960,8 +980,8 @@ public InputStream tempPreview(@PathParam("fileSystemId") String fileSystemId, @

 }*/

-
-
+
+
 // TODO: Rather than only supporting looking up files by their database IDs, consider supporting persistent identifiers.
 @Path("fileCardImage/{fileId}")
 @GET

src/test/java/edu/harvard/iq/dataverse/api/AccessIT.java

Lines changed: 28 additions & 5 deletions
@@ -21,6 +21,8 @@
 import java.io.ByteArrayOutputStream;
 import java.io.InputStream;
 import java.util.HashMap;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;

 import org.hamcrest.collection.IsMapContaining;

@@ -43,7 +45,8 @@ public class AccessIT {
 public static String apiToken;
 public static String dataverseAlias;
 public static Integer datasetId;
-
+public static String persistentId;
+
 public static Integer basicFileId;
 public static Integer tabFile1Id;
 public static Integer tabFile2Id;
@@ -78,7 +81,6 @@ public class AccessIT {
 private static String testFileFromZipUploadWithFoldersChecksum1 = "8f326944be21361ad8219bc3269bc9eb";
 private static String testFileFromZipUploadWithFoldersChecksum2 = "0fe4efd85229bad6e587fd3f1a6c8e05";
 private static String testFileFromZipUploadWithFoldersChecksum3 = "00433ccb20111f9d40f0e5ab6fa8396f";
-

 @BeforeAll
 public static void setUp() throws InterruptedException {
@@ -101,6 +103,7 @@ public static void setUp() throws InterruptedException {
 Response createDatasetResponse = UtilIT.createDatasetViaNativeApi(dataverseAlias, pathToJsonFile, apiToken);
 createDatasetResponse.prettyPrint();
 datasetId = JsonPath.from(createDatasetResponse.body().asString()).getInt("data.id");
+persistentId = JsonPath.from(createDatasetResponse.body().asString()).getString("data.persistentId");

 Response allowAccessRequests = UtilIT.allowAccessRequests(datasetId.toString(), true, apiToken);
 allowAccessRequests.prettyPrint();
@@ -285,7 +288,27 @@ public void testDownloadMultipleFiles_NonLoggedInOpen() throws IOException {
 assertThat(files2, IsMapContaining.hasKey(tabFile2NameConvert));

 System.out.println("origSize: " + origSizeAnon + " | convertSize: " + convertSizeAnon);
-assertThat(origSizeAnon, is(not(convertSizeAnon)));
+assertThat(origSizeAnon, is(not(convertSizeAnon)));
+
+// Finally, verify that the multi-file bundle produced by the API
+// is properly named (as of v6.7 this should be a pretty name based on
+// the persistent Id of the dataset).
+
+String contentDispositionHeader = anonDownloadConverted.getHeader("Content-disposition");
+System.out.println("Response header: "+contentDispositionHeader);
+
+Pattern regexPattern = Pattern.compile("attachment; filename=\"([a-z0-9\\.-]*\\.zip)\"");
+Matcher regexMatcher = regexPattern.matcher(contentDispositionHeader);
+boolean regexMatch = regexMatcher.find();
+assertTrue(regexMatch);
+
+String expectedPrettyName = persistentId.replaceAll("[:/]", "-").toLowerCase() + ".zip";
+System.out.println("expected \"pretty\" file name of the zipped multi-file bundle: " + expectedPrettyName);
+
+String fileBundleName = regexMatcher.group(1);
+System.out.println("file name found in the header: "+fileBundleName);
+
+// JUnit convention: expected value first, actual value second.
+assertEquals(expectedPrettyName, fileBundleName);
 }

 @Test
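The character class `[a-z0-9\\.-]` in the test's pattern contains no underscore. That suits `/api/access/datafiles`, which passes no version tag, but the same pattern would not match the versioned names the dataset-level endpoints now produce (for example `doi-10.70122-fk2-n2xgbj_1.1.zip`). If similar assertions are added for those endpoints, a pattern with an optional `_<tag>` group is one option. The sketch below is hypothetical and assumes tags contain only lowercase letters, digits and dots (`draft`, `1.1`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrettyNameCheck {

    // Group 1: the PID-derived base name (no underscore allowed);
    // group 2 (optional): the version tag after "_".
    // ASSUMPTION: tags use only lowercase letters, digits and dots, e.g. "draft" or "1.1".
    private static final Pattern PRETTY_NAME = Pattern.compile(
            "attachment; filename=\"([a-z0-9.-]+)(?:_([a-z0-9.]+))?\\.zip\"");

    /**
     * Returns the version tag found in the header, "" when the name has no tag,
     * or null when the header does not contain a name of this shape.
     */
    public static String versionTagOf(String contentDisposition) {
        Matcher m = PRETTY_NAME.matcher(contentDisposition);
        if (!m.find()) {
            return null;
        }
        return m.group(2) == null ? "" : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(versionTagOf("attachment; filename=\"doi-10.70122-fk2-n2xgbj_1.1.zip\""));
        System.out.println(versionTagOf("attachment; filename=\"doi-10.70122-fk2-n2xgbj.zip\"").isEmpty());
    }
}
```

Returning `""` rather than `null` for an untagged name keeps "no tag" distinct from "header did not match", which is the case a test would want to fail on.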
@@ -476,7 +499,7 @@ public void testRequestAccess() throws InterruptedException {
 basicFileName = "004.txt";
 String basicPathToFile = "scripts/search/data/replace_test/" + basicFileName;
 Response basicAddResponse = UtilIT.uploadFileViaNative(datasetIdNew.toString(), basicPathToFile, apiToken);
-basicFileId = JsonPath.from(basicAddResponse.body().asString()).getInt("data.files[0].dataFile.id");
+Integer basicFileIdNew = JsonPath.from(basicAddResponse.body().asString()).getInt("data.files[0].dataFile.id");

 String tabFile3NameRestrictedNew = "stata13-auto-withstrls.dta";
 String tab3PathToFile = "scripts/search/data/tabular/" + tabFile3NameRestrictedNew;
@@ -534,7 +557,7 @@ public void testRequestAccess() throws InterruptedException {
 assertEquals(400, requestFileAccessResponse.getStatusCode());

 //if you make a request of a public file you should also get a command exception
-requestFileAccessResponse = UtilIT.requestFileAccess(basicFileId.toString(), apiTokenRando);
+requestFileAccessResponse = UtilIT.requestFileAccess(basicFileIdNew.toString(), apiTokenRando);
 assertEquals(400, requestFileAccessResponse.getStatusCode());
