
Commit f648084

Merge branch 'develop' into 11772-api-terms-of-access

2 parents ec9033a + e6af7be

34 files changed: +2492 -279 lines changed
Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@

- The optional Croissant exporter has been updated to 0.1.6 to prevent variable names, variable descriptions, and variable types from being exposed for restricted files. See https://github.com/gdcc/exporter-croissant/pull/20 and #11752.

## Upgrade Instructions

### Update Croissant exporter, if enabled, and reexport metadata

If you have enabled the Croissant dataset metadata exporter, you should upgrade to version 0.1.6; a shell sketch of these steps follows the list below.

- Stop Payara.
- Delete the old Croissant exporter jar file. It will be located in the directory defined by the `dataverse.spi.exporters.directory` setting.
- Download the updated Croissant jar from https://repo1.maven.org/maven2/io/gdcc/export/croissant/ and place it in the same directory.
- Restart Payara.
- Run reExportAll.
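
A rough shell sketch of these steps, for illustration only: the exporter directory, Payara location, and exact jar filename below are placeholders (check the Maven directory listing above for the actual 0.1.6 artifact), and the reExportAll call assumes the admin API is reachable on localhost.

```bash
# Placeholder values -- adjust for your installation.
EXPORTERS_DIR=/srv/dataverse/exporters   # value of dataverse.spi.exporters.directory
PAYARA_HOME=/usr/local/payara6

# Stop Payara
"$PAYARA_HOME"/bin/asadmin stop-domain

# Remove the old Croissant exporter jar and download 0.1.6 (filename is illustrative)
rm "$EXPORTERS_DIR"/croissant-*.jar
curl -L -o "$EXPORTERS_DIR/croissant-0.1.6.jar" \
  "https://repo1.maven.org/maven2/io/gdcc/export/croissant/0.1.6/croissant-0.1.6.jar"

# Restart Payara
"$PAYARA_HOME"/bin/asadmin start-domain

# Reexport all dataset metadata
curl "http://localhost:8080/api/admin/metadata/reExportAll"
```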
Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@

This version of Dataverse includes extensions of the Dataverse External Vocabulary mechanism (https://guides.dataverse.org/en/latest/admin/metadatacustomization.html#using-external-vocabulary-services) that improve Dataverse's ability to include metadata about vocabulary terms and external identifiers such as ORCID and ROR in its metadata exports. More information on how to configure external vocabulary scripts to use this functionality can be found at https://github.com/gdcc/dataverse-external-vocab-support/blob/main/docs/readme.md and in the examples in the https://github.com/gdcc/dataverse-external-vocab-support repository.
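
As a hypothetical sketch only (the linked readme describes the actual configuration format), an external vocabulary configuration is loaded into the `:CVocConf` setting via the admin settings API; the file name below is a placeholder.

```bash
# Hypothetical example: cvoc-conf.json stands in for a configuration file
# written according to the dataverse-external-vocab-support documentation.
curl -X PUT --upload-file cvoc-conf.json \
  "http://localhost:8080/api/admin/settings/:CVocConf"
```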
Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@

### Pagination for API Version Summaries

We've added pagination support to the following API endpoints:

- File Version Differences: api/files/{id}/versionDifferences
- Dataset Version Summaries: api/datasets/:persistentId/versions/compareSummary

You can now use two new query parameters to control the results (an example follows the list):

- **limit**: An integer specifying the maximum number of results to return per page.
- **offset**: An integer specifying the number of results to skip before starting to return items. This is used to navigate to different pages.
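
As a quick illustration (mirroring the documentation examples later in this diff), requesting the second page of dataset version summaries with two items per page looks like this:

```bash
# limit=2, offset=2: skip the first two summaries and return the next two
curl -H "X-Dataverse-key: $API_TOKEN" -X GET \
  "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"
```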
### Performance enhancements for API Version Summaries

In addition to adding pagination, we've significantly improved the performance of these endpoints by implementing more efficient database queries.

These changes address performance bottlenecks that were previously encountered, especially with datasets or files containing a large number of versions.

### Fixes for File Version Summaries API

The implementation for file version summaries was unreliable, leading to exceptions and functional inconsistencies, as documented in issue #11561. This functionality has been reviewed and fixed to ensure correctness and stability.

### Related issues and PRs

- https://github.com/IQSS/dataverse/issues/11855
- https://github.com/IQSS/dataverse/pull/11859
- https://github.com/IQSS/dataverse/issues/11561

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 29 additions & 4 deletions
@@ -2143,14 +2143,26 @@ be available to users who have permission to view unpublished drafts. The api to

   export SERVER_URL=https://demo.dataverse.org
   export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z

-  curl -H "X-Dataverse-key: $API_TOKEN" -X PUT "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"
+  curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"

 The fully expanded example above (without environment variables) looks like this:

 .. code-block:: bash

-  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
+
+You can control pagination of the results using the following optional query parameters:
+
+* ``limit``: The maximum number of version differences to return.
+* ``offset``: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.
+
+To aid in pagination, the JSON response also includes the total number of rows (totalCount) available.
+
+For example, to get the second page of results, with 2 items per page, you would use ``limit=2`` and ``offset=2`` (skipping the first two results):
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"

 Update Metadata For a Dataset
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -4362,8 +4374,21 @@ The fully expanded example above (without environment variables) looks like this

 .. code-block:: bash

-  curl -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
-  curl -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"
+
+You can control pagination of the results using the following optional query parameters:
+
+* ``limit``: The maximum number of version differences to return.
+* ``offset``: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.
+
+To aid in pagination, the JSON response also includes the total number of rows (totalCount) available.
+
+For example, to get the second page of results, with 2 items per page, you would use ``limit=2`` and ``offset=2`` (skipping the first two results):
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences?limit=2&offset=2"

 Adding Files
 ~~~~~~~~~~~~

src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java

Lines changed: 132 additions & 5 deletions
@@ -28,18 +28,18 @@
 import java.util.Map;
 import java.util.Set;
 import java.util.UUID;
+import java.util.function.Function;
 import java.util.logging.Level;
 import java.util.logging.Logger;
+import java.util.stream.Collectors;
+
 import jakarta.ejb.EJB;
 import jakarta.ejb.Stateless;
 import jakarta.ejb.TransactionAttribute;
 import jakarta.ejb.TransactionAttributeType;
 import jakarta.inject.Named;
-import jakarta.persistence.EntityManager;
-import jakarta.persistence.NoResultException;
-import jakarta.persistence.PersistenceContext;
-import jakarta.persistence.Query;
-import jakarta.persistence.TypedQuery;
+import jakarta.persistence.*;
+import jakarta.persistence.criteria.*;

 /**
  *
@@ -376,6 +376,133 @@ public FileMetadata findFileMetadataByDatasetVersionIdAndDataFileId(Long dataset
         }
     }

+    /**
+     * Finds the complete history of a file's presence across all dataset versions.
+     * <p>
+     * This method returns a {@link VersionedFileMetadata} entry for every version
+     * of the specified dataset. If a version does not contain the file, the
+     * {@code fileMetadata} field in the corresponding DTO will be {@code null}.
+     * It correctly handles file replacements by searching for all files sharing the
+     * same {@code rootDataFileId}.
+     *
+     * @param datasetId The ID of the parent dataset.
+     * @param dataFile The DataFile entity to find the history for.
+     * @param canViewUnpublishedVersions A boolean indicating if the user has permission to view non-released versions.
+     * @param limit (Optional) The maximum number of results to return.
+     * @param offset (Optional) The starting point of the result list.
+     * @return A chronologically sorted, paginated list of the file's version history, including versions where the file is absent.
+     */
+    public List<VersionedFileMetadata> findFileMetadataHistory(Long datasetId,
+                                                               DataFile dataFile,
+                                                               boolean canViewUnpublishedVersions,
+                                                               Integer limit,
+                                                               Integer offset) {
+        if (dataFile == null) {
+            return Collections.emptyList();
+        }
+
+        // Query 1: Get the paginated list of relevant DatasetVersions
+        CriteriaBuilder cb = em.getCriteriaBuilder();
+        CriteriaQuery<DatasetVersion> versionQuery = cb.createQuery(DatasetVersion.class);
+        Root<DatasetVersion> versionRoot = versionQuery.from(DatasetVersion.class);
+
+        List<Predicate> versionPredicates = new ArrayList<>();
+        versionPredicates.add(cb.equal(versionRoot.join("dataset").get("id"), datasetId));
+        if (!canViewUnpublishedVersions) {
+            versionPredicates.add(versionRoot.get("versionState").in(
+                    VersionState.RELEASED, VersionState.DEACCESSIONED));
+        }
+        versionQuery.where(versionPredicates.toArray(new Predicate[0]));
+        versionQuery.orderBy(
+                cb.desc(versionRoot.get("versionNumber")),
+                cb.desc(versionRoot.get("minorVersionNumber"))
+        );
+
+        TypedQuery<DatasetVersion> typedVersionQuery = em.createQuery(versionQuery);
+        if (limit != null) {
+            typedVersionQuery.setMaxResults(limit);
+        }
+        if (offset != null) {
+            typedVersionQuery.setFirstResult(offset);
+        }
+        List<DatasetVersion> datasetVersions = typedVersionQuery.getResultList();
+
+        if (datasetVersions.isEmpty()) {
+            return Collections.emptyList();
+        }
+
+        // Query 2: Get all FileMetadata for this file's history in this dataset
+        CriteriaQuery<FileMetadata> fmQuery = cb.createQuery(FileMetadata.class);
+        Root<FileMetadata> fmRoot = fmQuery.from(FileMetadata.class);
+
+        List<Predicate> fmPredicates = new ArrayList<>();
+        fmPredicates.add(cb.equal(fmRoot.get("datasetVersion").get("dataset").get("id"), datasetId));
+
+        // Find the file by its entire lineage
+        if (dataFile.getRootDataFileId() < 0) {
+            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("id"), dataFile.getId()));
+        } else {
+            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("rootDataFileId"), dataFile.getRootDataFileId()));
+        }
+        fmQuery.where(fmPredicates.toArray(new Predicate[0]));
+
+        List<FileMetadata> fileHistory = em.createQuery(fmQuery).getResultList();
+
+        // Combine results
+        Map<Long, FileMetadata> fmMap = fileHistory.stream()
+                .collect(Collectors.toMap(
+                        fm -> fm.getDatasetVersion().getId(),
+                        Function.identity()
+                ));
+
+        // Create the final list, looking up the FileMetadata for each version
+        return datasetVersions.stream()
+                .map(version -> new VersionedFileMetadata(
+                        version,
+                        fmMap.get(version.getId()) // This will be null if no entry exists for that version ID
+                ))
+                .collect(Collectors.toList());
+    }
+
+    /**
+     * Finds the FileMetadata for a given file in the version immediately preceding a specified version.
+     *
+     * @param fileMetadata The FileMetadata instance from the current version, used to identify the file's lineage.
+     * @return The FileMetadata from the immediately prior version, or {@code null} if this is the first version of the file.
+     */
+    public FileMetadata getPreviousFileMetadata(FileMetadata fileMetadata) {
+        if (fileMetadata == null || fileMetadata.getDataFile() == null) {
+            return null;
+        }
+
+        // 1. Get the ID of the file that was replaced.
+        Long previousId = fileMetadata.getDataFile().getPreviousDataFileId();
+
+        // If there's no previous ID, this is the first version of the file.
+        if (previousId == null) {
+            return null;
+        }
+
+        CriteriaBuilder cb = em.getCriteriaBuilder();
+        CriteriaQuery<FileMetadata> cq = cb.createQuery(FileMetadata.class);
+        Root<FileMetadata> fileMetadataRoot = cq.from(FileMetadata.class);
+
+        // 2. Join FileMetadata to DataFile to access the ID.
+        Join<FileMetadata, DataFile> dataFileJoin = fileMetadataRoot.join("dataFile");
+
+        // 3. Find the FileMetadata whose DataFile ID matches the previousId.
+        cq.where(cb.equal(dataFileJoin.get("id"), previousId));
+
+        // --- Execution ---
+        TypedQuery<FileMetadata> query = em.createQuery(cq);
+        try {
+            return query.getSingleResult();
+        } catch (NoResultException e) {
+            // If no result is found, return null.
+            return null;
+        }
+    }
+
 public FileMetadata findMostRecentVersionFileIsIn(DataFile file) {
     if (file == null) {
         return null;

src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java

Lines changed: 13 additions & 2 deletions
@@ -764,13 +764,24 @@ Object processPathSegment(int index, String[] pathParts, JsonValue curPath, Stri
            JsonValue val = jo.get(keyVal[0]);
            if (val != null) {
                if (val.getValueType().equals(ValueType.STRING)) {
+                    //Match a string value
                    if (((JsonString) val).getString().equals(expected)) {
                        logger.fine("Found: " + jo);
                        curPath = jo;
                        return processPathSegment(index + 1, pathParts, curPath, termUri);
                    }
-                } else {
-                    logger.warning("Expected a string value for " + keyVal[0] + " but found: " + val.getValueType());
+                } else if (val.getValueType() == JsonValue.ValueType.ARRAY) {
+                    // Match one string in an array
+                    JsonArray jsonArray = (JsonArray) val;
+                    for (JsonValue arrayVal : jsonArray) {
+                        if (arrayVal.getValueType() == JsonValue.ValueType.STRING) {
+                            if (((JsonString) arrayVal).getString().equals(expected)) {
+                                logger.fine("Found match in array: " + jo.toString());
+                                curPath = jo;
+                                return processPathSegment(index + 1, pathParts, curPath, termUri);
+                            }
+                        }
+                    }
                }
            }
        }

src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java

Lines changed: 5 additions & 1 deletion
@@ -73,7 +73,11 @@
     @NamedQuery(name = "DatasetVersion.findById",
                 query = "SELECT o FROM DatasetVersion o LEFT JOIN FETCH o.fileMetadatas WHERE o.id=:id"),
     @NamedQuery(name = "DatasetVersion.findByDataset",
-                query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
+                query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
+    @NamedQuery(name = "DatasetVersion.findByDesiredStatesAndDataset",
+                query = "SELECT o FROM DatasetVersion o " +
+                        "WHERE o.dataset.id = :datasetId AND o.versionState IN :states " +
+                        "ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
     @NamedQuery(name = "DatasetVersion.findReleasedByDataset",
                 query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId AND o.versionState=edu.harvard.iq.dataverse.DatasetVersion.VersionState.RELEASED ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC")/*,
     @NamedQuery(name = "DatasetVersion.findVersionElements",
