
Commit f648084

Merge branch 'develop' into 11772-api-terms-of-access

2 parents ec9033a + e6af7be

34 files changed: +2492 -279 lines changed
Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@

- The optional Croissant exporter has been updated to 0.1.6 to prevent variable names, variable descriptions, and variable types from being exposed for restricted files. See https://github.com/gdcc/exporter-croissant/pull/20 and #11752.

## Upgrade Instructions

### Update Croissant exporter, if enabled, and reexport metadata

If you have enabled the Croissant dataset metadata exporter, you should upgrade to version 0.1.6; a shell sketch of these steps follows the list below.

- Stop Payara.
- Delete the old Croissant exporter jar file. It will be located in the directory defined by the `dataverse.spi.exporters.directory` setting.
- Download the updated Croissant jar from https://repo1.maven.org/maven2/io/gdcc/export/croissant/ and place it in the same directory.
- Restart Payara.
- Run reExportAll.
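
A rough shell sketch of these steps, for illustration only: the exporter directory, Payara location, and exact jar filename below are placeholders (check the Maven directory listing above for the actual 0.1.6 artifact), and the reExportAll call assumes the admin API is reachable on localhost.

```bash
# Placeholder values -- adjust for your installation.
EXPORTERS_DIR=/srv/dataverse/exporters   # value of dataverse.spi.exporters.directory
PAYARA_HOME=/usr/local/payara6

# Stop Payara
"$PAYARA_HOME"/bin/asadmin stop-domain

# Remove the old Croissant exporter jar and download 0.1.6 (filename is illustrative)
rm "$EXPORTERS_DIR"/croissant-*.jar
curl -L -o "$EXPORTERS_DIR/croissant-0.1.6.jar" \
  "https://repo1.maven.org/maven2/io/gdcc/export/croissant/0.1.6/croissant-0.1.6.jar"

# Restart Payara
"$PAYARA_HOME"/bin/asadmin start-domain

# Reexport all dataset metadata
curl "http://localhost:8080/api/admin/metadata/reExportAll"
```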
Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@

This version of Dataverse includes extensions of the Dataverse External Vocabulary mechanism (https://guides.dataverse.org/en/latest/admin/metadatacustomization.html#using-external-vocabulary-services) that improve Dataverse's ability to include metadata about vocabulary terms and external identifiers such as ORCID and ROR in its metadata exports. More information on how to configure external vocabulary scripts to use this functionality can be found at https://github.com/gdcc/dataverse-external-vocab-support/blob/main/docs/readme.md and in the examples in the https://github.com/gdcc/dataverse-external-vocab-support repository.
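
As a hypothetical sketch only (the linked readme describes the actual configuration format), an external vocabulary configuration is loaded into the `:CVocConf` setting via the admin settings API; the file name below is a placeholder.

```bash
# Hypothetical example: cvoc-conf.json stands in for a configuration file
# written according to the dataverse-external-vocab-support documentation.
curl -X PUT --upload-file cvoc-conf.json \
  "http://localhost:8080/api/admin/settings/:CVocConf"
```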
Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@

### Pagination for API Version Summaries

We've added pagination support to the following API endpoints:

- File Version Differences: api/files/{id}/versionDifferences
- Dataset Version Summaries: api/datasets/:persistentId/versions/compareSummary

You can now use two new query parameters to control the results (an example follows the list):

- **limit**: An integer specifying the maximum number of results to return per page.
- **offset**: An integer specifying the number of results to skip before starting to return items. This is used to navigate to different pages.
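
As a quick illustration (mirroring the documentation examples later in this diff), requesting the second page of dataset version summaries with two items per page looks like this:

```bash
# limit=2, offset=2: skip the first two summaries and return the next two
curl -H "X-Dataverse-key: $API_TOKEN" -X GET \
  "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"
```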
### Performance enhancements for API Version Summaries

In addition to adding pagination, we've significantly improved the performance of these endpoints by implementing more efficient database queries.

These changes address performance bottlenecks that were previously encountered, especially with datasets or files containing a large number of versions.

### Fixes for File Version Summaries API

The implementation for file version summaries was unreliable, leading to exceptions and functional inconsistencies, as documented in issue #11561. This functionality has been reviewed and fixed to ensure correctness and stability.

### Related issues and PRs

- https://github.com/IQSS/dataverse/issues/11855
- https://github.com/IQSS/dataverse/pull/11859
- https://github.com/IQSS/dataverse/issues/11561

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 29 additions & 4 deletions
@@ -2143,14 +2143,26 @@ be available to users who have permission to view unpublished drafts. The api to

   export SERVER_URL=https://demo.dataverse.org
   export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z

-  curl -H "X-Dataverse-key: $API_TOKEN" -X PUT "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"
+  curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"

 The fully expanded example above (without environment variables) looks like this:

 .. code-block:: bash

-  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
+
+You can control pagination of the results using the following optional query parameters:
+
+* ``limit``: The maximum number of version differences to return.
+* ``offset``: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.
+
+To aid in pagination, the JSON response also includes the total number of rows (totalCount) available.
+
+For example, to get the second page of results, with 2 items per page, you would use ``limit=2`` and ``offset=2`` (skipping the first two results):
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"

 Update Metadata For a Dataset
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -4362,8 +4374,21 @@ The fully expanded example above (without environment variables) looks like this

 .. code-block:: bash

-  curl -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
-  curl -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"
+
+You can control pagination of the results using the following optional query parameters:
+
+* ``limit``: The maximum number of version differences to return.
+* ``offset``: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.
+
+To aid in pagination, the JSON response also includes the total number of rows (totalCount) available.
+
+For example, to get the second page of results, with 2 items per page, you would use ``limit=2`` and ``offset=2`` (skipping the first two results):
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences?limit=2&offset=2"

 Adding Files
 ~~~~~~~~~~~~

src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java

Lines changed: 132 additions & 5 deletions
@@ -28,18 +28,18 @@
 import java.util.Map;
 import java.util.Set;
 import java.util.UUID;
+import java.util.function.Function;
 import java.util.logging.Level;
 import java.util.logging.Logger;
+import java.util.stream.Collectors;
+
 import jakarta.ejb.EJB;
 import jakarta.ejb.Stateless;
 import jakarta.ejb.TransactionAttribute;
 import jakarta.ejb.TransactionAttributeType;
 import jakarta.inject.Named;
-import jakarta.persistence.EntityManager;
-import jakarta.persistence.NoResultException;
-import jakarta.persistence.PersistenceContext;
-import jakarta.persistence.Query;
-import jakarta.persistence.TypedQuery;
+import jakarta.persistence.*;
+import jakarta.persistence.criteria.*;

 /**
  *
@@ -376,6 +376,133 @@ public FileMetadata findFileMetadataByDatasetVersionIdAndDataFileId(Long dataset
         }
     }

+    /**
+     * Finds the complete history of a file's presence across all dataset versions.
+     * <p>
+     * This method returns a {@link VersionedFileMetadata} entry for every version
+     * of the specified dataset. If a version does not contain the file, the
+     * {@code fileMetadata} field in the corresponding DTO will be {@code null}.
+     * It correctly handles file replacements by searching for all files sharing the
+     * same {@code rootDataFileId}.
+     *
+     * @param datasetId The ID of the parent dataset.
+     * @param dataFile The DataFile entity to find the history for.
+     * @param canViewUnpublishedVersions A boolean indicating if the user has permission to view non-released versions.
+     * @param limit (Optional) The maximum number of results to return.
+     * @param offset (Optional) The starting point of the result list.
+     * @return A chronologically sorted, paginated list of the file's version history, including versions where the file is absent.
+     */
+    public List<VersionedFileMetadata> findFileMetadataHistory(Long datasetId,
+                                                               DataFile dataFile,
+                                                               boolean canViewUnpublishedVersions,
+                                                               Integer limit,
+                                                               Integer offset) {
+        if (dataFile == null) {
+            return Collections.emptyList();
+        }
+
+        // Query 1: Get the paginated list of relevant DatasetVersions
+        CriteriaBuilder cb = em.getCriteriaBuilder();
+        CriteriaQuery<DatasetVersion> versionQuery = cb.createQuery(DatasetVersion.class);
+        Root<DatasetVersion> versionRoot = versionQuery.from(DatasetVersion.class);
+
+        List<Predicate> versionPredicates = new ArrayList<>();
+        versionPredicates.add(cb.equal(versionRoot.join("dataset").get("id"), datasetId));
+        if (!canViewUnpublishedVersions) {
+            versionPredicates.add(versionRoot.get("versionState").in(
+                    VersionState.RELEASED, VersionState.DEACCESSIONED));
+        }
+        versionQuery.where(versionPredicates.toArray(new Predicate[0]));
+        versionQuery.orderBy(
+                cb.desc(versionRoot.get("versionNumber")),
+                cb.desc(versionRoot.get("minorVersionNumber"))
+        );
+
+        TypedQuery<DatasetVersion> typedVersionQuery = em.createQuery(versionQuery);
+        if (limit != null) {
+            typedVersionQuery.setMaxResults(limit);
+        }
+        if (offset != null) {
+            typedVersionQuery.setFirstResult(offset);
+        }
+        List<DatasetVersion> datasetVersions = typedVersionQuery.getResultList();
+
+        if (datasetVersions.isEmpty()) {
+            return Collections.emptyList();
+        }
+
+        // Query 2: Get all FileMetadata for this file's history in this dataset
+        CriteriaQuery<FileMetadata> fmQuery = cb.createQuery(FileMetadata.class);
+        Root<FileMetadata> fmRoot = fmQuery.from(FileMetadata.class);
+
+        List<Predicate> fmPredicates = new ArrayList<>();
+        fmPredicates.add(cb.equal(fmRoot.get("datasetVersion").get("dataset").get("id"), datasetId));
+
+        // Find the file by its entire lineage
+        if (dataFile.getRootDataFileId() < 0) {
+            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("id"), dataFile.getId()));
+        } else {
+            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("rootDataFileId"), dataFile.getRootDataFileId()));
+        }
+        fmQuery.where(fmPredicates.toArray(new Predicate[0]));
+
+        List<FileMetadata> fileHistory = em.createQuery(fmQuery).getResultList();
+
+        // Combine results
+        Map<Long, FileMetadata> fmMap = fileHistory.stream()
+                .collect(Collectors.toMap(
+                        fm -> fm.getDatasetVersion().getId(),
+                        Function.identity()
+                ));
+
+        // Create the final list, looking up the FileMetadata for each version
+        return datasetVersions.stream()
+                .map(version -> new VersionedFileMetadata(
+                        version,
+                        fmMap.get(version.getId()) // This will be null if no entry exists for that version ID
+                ))
+                .collect(Collectors.toList());
+    }
+
+    /**
+     * Finds the FileMetadata for a given file in the version immediately preceding a specified version.
+     *
+     * @param fileMetadata The FileMetadata instance from the current version, used to identify the file's lineage.
+     * @return The FileMetadata from the immediately prior version, or {@code null} if this is the first version of the file.
+     */
+    public FileMetadata getPreviousFileMetadata(FileMetadata fileMetadata) {
+        if (fileMetadata == null || fileMetadata.getDataFile() == null) {
+            return null;
+        }
+
+        // 1. Get the ID of the file that was replaced.
+        Long previousId = fileMetadata.getDataFile().getPreviousDataFileId();
+
+        // If there's no previous ID, this is the first version of the file.
+        if (previousId == null) {
+            return null;
+        }
+
+        CriteriaBuilder cb = em.getCriteriaBuilder();
+        CriteriaQuery<FileMetadata> cq = cb.createQuery(FileMetadata.class);
+        Root<FileMetadata> fileMetadataRoot = cq.from(FileMetadata.class);
+
+        // 2. Join FileMetadata to DataFile to access the ID.
+        Join<FileMetadata, DataFile> dataFileJoin = fileMetadataRoot.join("dataFile");
+
+        // 3. Find the FileMetadata whose DataFile ID matches the previousId.
+        cq.where(cb.equal(dataFileJoin.get("id"), previousId));
+
+        // --- Execution ---
+        TypedQuery<FileMetadata> query = em.createQuery(cq);
+        try {
+            return query.getSingleResult();
+        } catch (NoResultException e) {
+            // If no result is found, return null.
+            return null;
+        }
+    }
+
 public FileMetadata findMostRecentVersionFileIsIn(DataFile file) {
     if (file == null) {
         return null;

src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java

Lines changed: 13 additions & 2 deletions
@@ -764,13 +764,24 @@ Object processPathSegment(int index, String[] pathParts, JsonValue curPath, Stri
            JsonValue val = jo.get(keyVal[0]);
            if (val != null) {
                if (val.getValueType().equals(ValueType.STRING)) {
+                    //Match a string value
                    if (((JsonString) val).getString().equals(expected)) {
                        logger.fine("Found: " + jo);
                        curPath = jo;
                        return processPathSegment(index + 1, pathParts, curPath, termUri);
                    }
-                } else {
-                    logger.warning("Expected a string value for " + keyVal[0] + " but found: " + val.getValueType());
+                } else if (val.getValueType() == JsonValue.ValueType.ARRAY) {
+                    // Match one string in an array
+                    JsonArray jsonArray = (JsonArray) val;
+                    for (JsonValue arrayVal : jsonArray) {
+                        if (arrayVal.getValueType() == JsonValue.ValueType.STRING) {
+                            if (((JsonString) arrayVal).getString().equals(expected)) {
+                                logger.fine("Found match in array: " + jo.toString());
+                                curPath = jo;
+                                return processPathSegment(index + 1, pathParts, curPath, termUri);
+                            }
+                        }
+                    }
                }
            }
        }

src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java

Lines changed: 5 additions & 1 deletion
@@ -73,7 +73,11 @@
     @NamedQuery(name = "DatasetVersion.findById",
                 query = "SELECT o FROM DatasetVersion o LEFT JOIN FETCH o.fileMetadatas WHERE o.id=:id"),
     @NamedQuery(name = "DatasetVersion.findByDataset",
-                query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
+                query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
+    @NamedQuery(name = "DatasetVersion.findByDesiredStatesAndDataset",
+                query = "SELECT o FROM DatasetVersion o " +
+                        "WHERE o.dataset.id = :datasetId AND o.versionState IN :states " +
+                        "ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
     @NamedQuery(name = "DatasetVersion.findReleasedByDataset",
                 query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId AND o.versionState=edu.harvard.iq.dataverse.DatasetVersion.VersionState.RELEASED ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC")/*,
     @NamedQuery(name = "DatasetVersion.findVersionElements",
