
Commit 40bf831

Merge pull request #11859 from IQSS/11855-version-summaries-pagination
Version summaries fixes, enhancements and pagination
2 parents 19e67ef + 6e696ee commit 40bf831

29 files changed, +2406 -275 lines changed
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+### Pagination for API Version Summaries
+
+We've added pagination support to the following API endpoints:
+
+- File Version Differences: api/files/{id}/versionDifferences
+
+- Dataset Version Summaries: api/datasets/:persistentId/versions/compareSummary
+
+You can now use two new query parameters to control the results:
+
+- **limit**: An integer specifying the maximum number of results to return per page.
+
+- **offset**: An integer specifying the number of results to skip before starting to return items. This is used to
+navigate to different pages.
+
+### Performance enhancements for API Version Summaries
+
+In addition to adding pagination, we've significantly improved the performance of these endpoints by implementing more
+efficient database queries.
+
+These changes address performance bottlenecks that were previously encountered, especially with datasets or files
+containing a large number of versions.
+
+### Fixes for File Version Summaries API
+
+The implementation for file version summaries was unreliable, leading to exceptions and functional inconsistencies, as
+documented in issue #11561. This functionality has been reviewed and fixed to ensure correctness and stability.
+
+### Related issues and PRs
+
+- https://github.com/IQSS/dataverse/issues/11855
+- https://github.com/IQSS/dataverse/pull/11859
+- https://github.com/IQSS/dataverse/issues/11561
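Because `offset` counts rows rather than pages, page navigation reduces to simple arithmetic. The following is a minimal Java sketch, assuming 1-based page numbers and a fixed page size; `VersionSummaryPaging` and its methods are hypothetical helpers, not part of Dataverse.

```java
// Hypothetical helper: converts page-based navigation into the
// limit/offset query parameters accepted by the version summary endpoints.
final class VersionSummaryPaging {

    private VersionSummaryPaging() {
    }

    /** Offset for a 1-based page number: skip every row on the earlier pages. */
    static int offsetForPage(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    /** Pages needed to show totalCount rows (ceiling division). */
    static int pageCount(int totalCount, int pageSize) {
        if (totalCount < 0 || pageSize < 1) {
            throw new IllegalArgumentException("totalCount must be >= 0 and pageSize >= 1");
        }
        return (totalCount + pageSize - 1) / pageSize;
    }

    /** Query-string fragment for one page, for example "limit=2&offset=2". */
    static String queryFor(int page, int pageSize) {
        return "limit=" + pageSize + "&offset=" + offsetForPage(page, pageSize);
    }
}
```

With a page size of 2, page 2 yields `limit=2&offset=2`, the same values used in the API guide examples, and 5 rows in total need 3 pages.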

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 29 additions & 4 deletions
@@ -2143,14 +2143,26 @@ be available to users who have permission to view unpublished drafts. The api to
   export SERVER_URL=https://demo.dataverse.org
   export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z
 
-  curl -H "X-Dataverse-key: $API_TOKEN" -X PUT "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"
+  curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"
 
 The fully expanded example above (without environment variables) looks like this:
 
 .. code-block:: bash
 
-  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
 
+You can control pagination of the results using the following optional query parameters:
+
+* ``limit``: The maximum number of version differences to return.
+* ``offset``: The number of version differences to skip from the beginning of the list. Use it to retrieve subsequent pages of results.
+
+To aid pagination, the JSON response also includes the total number of available rows (``totalCount``).
+
+For example, to get the second page of results with 2 items per page, use ``limit=2`` and ``offset=2`` (skipping the first two results).
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"
 
 Update Metadata For a Dataset
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
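Since each response also reports `totalCount`, a client can collect every version summary by advancing `offset` by `limit` until it reaches that total. A minimal Java sketch, with the HTTP call abstracted as a function of `(limit, offset)` so the loop stands on its own; `PageWalker`, `Page`, and `fetchAll` are hypothetical names, and where exactly `totalCount` sits in the JSON is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch: collect every row from a limit/offset endpoint by advancing the
// offset until it reaches the totalCount reported with each page.
final class PageWalker {

    /** One page of results plus the reported total (hypothetical shape). */
    record Page<T>(List<T> items, int totalCount) {
    }

    /**
     * @param fetchPage called as fetchPage.apply(limit, offset); stands in for the HTTP request
     * @param limit     page size to request
     */
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, Page<T>> fetchPage, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        List<T> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            Page<T> page = fetchPage.apply(limit, offset);
            all.addAll(page.items());
            offset += limit;
            // Stop once the next offset would skip past the reported total,
            // or if the server returned an empty page.
            if (offset >= page.totalCount() || page.items().isEmpty()) {
                return all;
            }
        }
    }
}
```

Stopping on an empty page as well as on `totalCount` is a defensive choice: it avoids extra requests if the reported total turns out to be larger than the rows actually returned, for example when versions change between requests.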
@@ -4322,8 +4334,21 @@ The fully expanded example above (without environment variables) looks like this
 
 .. code-block:: bash
 
-  curl -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
-  curl -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"
+
+You can control pagination of the results using the following optional query parameters:
+
+* ``limit``: The maximum number of version differences to return.
+* ``offset``: The number of version differences to skip from the beginning of the list. Use it to retrieve subsequent pages of results.
+
+To aid pagination, the JSON response also includes the total number of available rows (``totalCount``).
+
+For example, to get the second page of results with 2 items per page, use ``limit=2`` and ``offset=2`` (skipping the first two results).
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences?limit=2&offset=2"
 
 Adding Files
 ~~~~~~~~~~~~
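The curl examples can also be expressed with the JDK's `java.net.http` client. This minimal sketch only builds the requests (it does not send them), taking the URL shapes, the `X-Dataverse-key` header, and the optional `limit`/`offset` parameters from the examples. `VersionDifferencesRequest` and its methods are hypothetical; percent-encoding the persistent identifier is an assumption about safe client behavior, not something the guide requires.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: build (but do not send) an authenticated GET request for a
// file's version differences, optionally paginated.
final class VersionDifferencesRequest {

    static HttpRequest byFileId(String serverUrl, long fileId, String apiToken,
                                Integer limit, Integer offset) {
        String url = serverUrl + "/api/files/" + fileId + "/versionDifferences"
                + paging("?", limit, offset);
        return build(url, apiToken);
    }

    static HttpRequest byPersistentId(String serverUrl, String pid, String apiToken,
                                      Integer limit, Integer offset) {
        String url = serverUrl + "/api/files/:persistentId/versionDifferences?persistentId="
                + URLEncoder.encode(pid, StandardCharsets.UTF_8)
                + paging("&", limit, offset);
        return build(url, apiToken);
    }

    // Appends limit/offset only when supplied, since both are optional.
    private static String paging(String firstSeparator, Integer limit, Integer offset) {
        StringBuilder sb = new StringBuilder();
        String sep = firstSeparator;
        if (limit != null) {
            sb.append(sep).append("limit=").append(limit);
            sep = "&";
        }
        if (offset != null) {
            sb.append(sep).append("offset=").append(offset);
        }
        return sb.toString();
    }

    private static HttpRequest build(String url, String apiToken) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("X-Dataverse-key", apiToken)
                .GET()
                .build();
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.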

src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java

Lines changed: 132 additions & 5 deletions
@@ -28,18 +28,18 @@
 import java.util.Map;
 import java.util.Set;
 import java.util.UUID;
+import java.util.function.Function;
 import java.util.logging.Level;
 import java.util.logging.Logger;
+import java.util.stream.Collectors;
+
 import jakarta.ejb.EJB;
 import jakarta.ejb.Stateless;
 import jakarta.ejb.TransactionAttribute;
 import jakarta.ejb.TransactionAttributeType;
 import jakarta.inject.Named;
-import jakarta.persistence.EntityManager;
-import jakarta.persistence.NoResultException;
-import jakarta.persistence.PersistenceContext;
-import jakarta.persistence.Query;
-import jakarta.persistence.TypedQuery;
+import jakarta.persistence.*;
+import jakarta.persistence.criteria.*;
 
 /**
  *
@@ -376,6 +376,133 @@ public FileMetadata findFileMetadataByDatasetVersionIdAndDataFileId(Long dataset
         }
     }
 
+    /**
+     * Finds the complete history of a file's presence across all dataset versions.
+     * <p>
+     * This method returns a {@link VersionedFileMetadata} entry for every version
+     * of the specified dataset. If a version does not contain the file, the
+     * {@code fileMetadata} field in the corresponding DTO will be {@code null}.
+     * It correctly handles file replacements by searching for all files sharing the
+     * same {@code rootDataFileId}.
+     *
+     * @param datasetId The ID of the parent dataset.
+     * @param dataFile The DataFile entity to find the history for.
+     * @param canViewUnpublishedVersions A boolean indicating if the user has permission to view non-released versions.
+     * @param limit (Optional) The maximum number of results to return.
+     * @param offset (Optional) The starting point of the result list.
+     * @return A paginated list of the file's version history, ordered from newest to oldest version and including versions where the file is absent.
+     */
+    public List<VersionedFileMetadata> findFileMetadataHistory(Long datasetId,
+            DataFile dataFile,
+            boolean canViewUnpublishedVersions,
+            Integer limit,
+            Integer offset) {
+        if (dataFile == null) {
+            return Collections.emptyList();
+        }
+
+        // Query 1: Get the paginated list of relevant DatasetVersions
+        CriteriaBuilder cb = em.getCriteriaBuilder();
+        CriteriaQuery<DatasetVersion> versionQuery = cb.createQuery(DatasetVersion.class);
+        Root<DatasetVersion> versionRoot = versionQuery.from(DatasetVersion.class);
+
+        List<Predicate> versionPredicates = new ArrayList<>();
+        versionPredicates.add(cb.equal(versionRoot.join("dataset").get("id"), datasetId));
+        if (!canViewUnpublishedVersions) {
+            versionPredicates.add(versionRoot.get("versionState").in(
+                    VersionState.RELEASED, VersionState.DEACCESSIONED));
+        }
+        versionQuery.where(versionPredicates.toArray(new Predicate[0]));
+        versionQuery.orderBy(
+                cb.desc(versionRoot.get("versionNumber")),
+                cb.desc(versionRoot.get("minorVersionNumber"))
+        );
+
+        TypedQuery<DatasetVersion> typedVersionQuery = em.createQuery(versionQuery);
+        if (limit != null) {
+            typedVersionQuery.setMaxResults(limit);
+        }
+        if (offset != null) {
+            typedVersionQuery.setFirstResult(offset);
+        }
+        List<DatasetVersion> datasetVersions = typedVersionQuery.getResultList();
+
+        if (datasetVersions.isEmpty()) {
+            return Collections.emptyList();
+        }
+
+        // Query 2: Get all FileMetadata for this file's history in this dataset
+        CriteriaQuery<FileMetadata> fmQuery = cb.createQuery(FileMetadata.class);
+        Root<FileMetadata> fmRoot = fmQuery.from(FileMetadata.class);
+
+        List<Predicate> fmPredicates = new ArrayList<>();
+        fmPredicates.add(cb.equal(fmRoot.get("datasetVersion").get("dataset").get("id"), datasetId));
+
+        // Find the file by its entire lineage
+        if (dataFile.getRootDataFileId() < 0) {
+            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("id"), dataFile.getId()));
+        } else {
+            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("rootDataFileId"), dataFile.getRootDataFileId()));
+        }
+        fmQuery.where(fmPredicates.toArray(new Predicate[0]));
+
+        List<FileMetadata> fileHistory = em.createQuery(fmQuery).getResultList();
+
+        // Combine results
+        Map<Long, FileMetadata> fmMap = fileHistory.stream()
+                .collect(Collectors.toMap(
+                        fm -> fm.getDatasetVersion().getId(),
+                        Function.identity()
+                ));
+
+        // Create the final list, looking up the FileMetadata for each version
+        return datasetVersions.stream()
+                .map(version -> new VersionedFileMetadata(
+                        version,
+                        fmMap.get(version.getId()) // This will be null if no entry exists for that version ID
+                ))
+                .collect(Collectors.toList());
+    }
+
+    /**
+     * Finds the FileMetadata of the file that the given file replaced, i.e. the DataFile
+     * identified by {@code previousDataFileId}.
+     *
+     * @param fileMetadata The FileMetadata instance from the current version, used to identify the file's lineage.
+     * @return The FileMetadata of the replaced file, or {@code null} if the file did not replace another file or no matching FileMetadata exists.
+     */
+    public FileMetadata getPreviousFileMetadata(FileMetadata fileMetadata) {
+        if (fileMetadata == null || fileMetadata.getDataFile() == null) {
+            return null;
+        }
+
+        // 1. Get the ID of the file that was replaced.
+        Long previousId = fileMetadata.getDataFile().getPreviousDataFileId();
+
+        // If there's no previous ID, this is the first version of the file.
+        if (previousId == null) {
+            return null;
+        }
+
+        CriteriaBuilder cb = em.getCriteriaBuilder();
+        CriteriaQuery<FileMetadata> cq = cb.createQuery(FileMetadata.class);
+        Root<FileMetadata> fileMetadataRoot = cq.from(FileMetadata.class);
+
+        // 2. Join FileMetadata to DataFile to access the ID.
+        Join<FileMetadata, DataFile> dataFileJoin = fileMetadataRoot.join("dataFile");
+
+        // 3. Find the FileMetadata whose DataFile ID matches the previousId.
+        cq.where(cb.equal(dataFileJoin.get("id"), previousId));
+
+        // --- Execution ---
+        TypedQuery<FileMetadata> query = em.createQuery(cq);
+        try {
+            return query.getSingleResult();
+        } catch (NoResultException e) {
+            // If no result is found, return null.
+            return null;
+        }
+    }
+
     public FileMetadata findMostRecentVersionFileIsIn(DataFile file) {
         if (file == null) {
             return null;
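`findFileMetadataHistory` runs two queries: a paginated query over the dataset's versions and an unpaginated query over every FileMetadata in the file's lineage, which it then joins in memory so that versions lacking the file map to `null`. The plain-Java sketch below reproduces that combine step with hypothetical records in place of the JPA entities. It assumes, beyond what the diff shows, that once a file has been replaced every file in the chain (the original included) carries the original's id as `rootDataFileId`, while a never-replaced file keeps a negative root id; the `< 0` check from the diff is mirrored here, not verified.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java sketch of "paginate versions, then left-join the file's lineage".
// The records are hypothetical stand-ins for DatasetVersion, FileMetadata
// and VersionedFileMetadata.
final class FileHistorySketch {

    record Version(long id, int major, int minor) {
    }

    /** A FileMetadata row: the version it belongs to and the DataFile it points at. */
    record FileRow(long versionId, long dataFileId, long rootDataFileId) {
    }

    /** Pairs a version with the file's row in it, or null if the file is absent. */
    record VersionedRow(Version version, FileRow fileRow) {
    }

    static List<VersionedRow> history(List<Version> versions, List<FileRow> rows,
                                      long fileId, long rootId, int limit, int offset) {
        // Step 1: newest version first, then apply offset/limit to the versions only.
        Comparator<Version> oldestFirst = Comparator.comparingInt(Version::major)
                .thenComparingInt(Version::minor);
        List<Version> page = versions.stream()
                .sorted(oldestFirst.reversed())
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());

        // Step 2: every row in the file's lineage. A negative root id is treated
        // as "never replaced", so only the file's own id is matched (assumption).
        Map<Long, FileRow> rowByVersion = rows.stream()
                .filter(r -> rootId < 0 ? r.dataFileId() == fileId : r.rootDataFileId() == rootId)
                .collect(Collectors.toMap(FileRow::versionId, Function.identity()));

        // Step 3: left join; versions without the file get a null row.
        return page.stream()
                .map(v -> new VersionedRow(v, rowByVersion.get(v.id())))
                .collect(Collectors.toList());
    }
}
```

Paginating the versions rather than the FileMetadata rows is what lets versions without the file still appear in each page. One caveat worth checking in the real method as well: `Collectors.toMap` throws `IllegalStateException` if two rows share the same version id.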

src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java

Lines changed: 5 additions & 1 deletion
@@ -73,7 +73,11 @@
     @NamedQuery(name = "DatasetVersion.findById",
             query = "SELECT o FROM DatasetVersion o LEFT JOIN FETCH o.fileMetadatas WHERE o.id=:id"),
     @NamedQuery(name = "DatasetVersion.findByDataset",
-            query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
+            query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
+    @NamedQuery(name = "DatasetVersion.findByDesiredStatesAndDataset",
+            query = "SELECT o FROM DatasetVersion o " +
+                    "WHERE o.dataset.id = :datasetId AND o.versionState IN :states " +
+                    "ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
     @NamedQuery(name = "DatasetVersion.findReleasedByDataset",
             query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId AND o.versionState=edu.harvard.iq.dataverse.DatasetVersion.VersionState.RELEASED ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC")/*,
     @NamedQuery(name = "DatasetVersion.findVersionElements",
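The diff adds `DatasetVersion.findByDesiredStatesAndDataset` but does not show its caller. Assuming the caller applies the same visibility rule as `findFileMetadataHistory` (only `RELEASED` and `DEACCESSIONED` versions when the user cannot view unpublished ones), the collection bound to `:states` could be built as below. `VersionStateFilter` is hypothetical, and the local enum is a stand-in whose constants (`DRAFT`, `RELEASED`, `ARCHIVED`, `DEACCESSIONED`) are assumed rather than taken from the diff.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch: choose which version states to bind to the :states parameter,
// mirroring the permission rule used in findFileMetadataHistory.
final class VersionStateFilter {

    // Local stand-in for DatasetVersion.VersionState (constants assumed).
    enum VersionState { DRAFT, RELEASED, ARCHIVED, DEACCESSIONED }

    static Set<VersionState> visibleStates(boolean canViewUnpublishedVersions) {
        return canViewUnpublishedVersions
                ? EnumSet.allOf(VersionState.class)
                : EnumSet.of(VersionState.RELEASED, VersionState.DEACCESSIONED);
    }
}
```

With the real `VersionState` enum, such a set would be passed with `setParameter("states", states)` on the `TypedQuery` returned by `em.createNamedQuery("DatasetVersion.findByDesiredStatesAndDataset", DatasetVersion.class)`, and `setFirstResult`/`setMaxResults` would carry the offset and limit.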
