
Commit e6473e3

committed
Merge branch 'develop' into 11413-rate-limiting-statistics-api
2 parents 8758e9a + 8599993 commit e6473e3


19 files changed

+1075
-44
lines changed

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
Added a new API for persistent identifier (PID) reconciliation. An unpublished dataset can be updated to use a new
2+
PID provider. If a PID had already been assigned when the dataset was created, it is replaced with a PID minted by the
3+
new provider (if the provider has changed in the meantime), and the old PID is kept as an alternative identifier. Note that this change does not affect file storage, where the old identifier is still
4+
used. See [the guides](https://dataverse-guide--10567.org.readthedocs.build/en/10567/api/native-api.html#reconcile-the-pid-of-a-dataset-if-multiple-pid-providers-are-enabled), #10501, and #10567.

doc/sphinx-guides/source/admin/user-administration.rst

Lines changed: 1 addition & 0 deletions
@@ -112,6 +112,7 @@ This enables additional settings for each user in the notifications tab of their
112112
* ``SUBMITTEDDS`` Submitted for review
113113
* ``WORKFLOW_FAILURE`` External workflow run has failed
114114
* ``WORKFLOW_SUCCESS`` External workflow run has succeeded
115+
* ``PIDRECONCILED`` Dataset persistent identifier changed
115116

116117
After enabling this feature, all notifications are enabled by default, until this is changed by the user.
117118

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 43 additions & 17 deletions
@@ -819,7 +819,7 @@ In particular, the user permissions that this API call checks, returned as boole
819819
820820
curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/dataverses/$ID/userPermissions"
821821
822-
.. _create-dataset-command:
822+
.. _create-dataset-command:
823823

824824
Create a Dataset in a Dataverse Collection
825825
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1551,7 +1551,7 @@ The optional ``excludeFiles`` parameter specifies whether the files should be li
15511551

15521552
The optional ``excludeMetadataBlocks`` parameter specifies whether the metadata blocks should be listed in the output. It defaults to ``false``, preserving backward compatibility. (Note that for a dataset with a large number of versions and/or metadata blocks having the metadata blocks included can dramatically increase the volume of the output).
15531553

1554-
The optional ``offset`` and ``limit`` parameters can be used to specify the range of the versions list to be shown. This can be used to paginate through the list in a dataset with a large number of versions.
1554+
The optional ``offset`` and ``limit`` parameters can be used to specify the range of the versions list to be shown. This can be used to paginate through the list in a dataset with a large number of versions.
15551555

15561556

15571557
Get Version of a Dataset
@@ -2039,13 +2039,13 @@ be available to users who have permission to view unpublished drafts. The api to
20392039
export SERVER_URL=https://demo.dataverse.org
20402040
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z
20412041
2042-
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"
2042+
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"
20432043
20442044
The fully expanded example above (without environment variables) looks like this:
20452045

20462046
.. code-block:: bash
20472047
2048-
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
2048+
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
20492049
20502050
20512051
Update Metadata For a Dataset
@@ -3092,7 +3092,7 @@ The fully expanded example above (without environment variables) looks like this
30923092
.. code-block:: bash
30933093
30943094
curl "https://demo.dataverse.org/api/datasets/:persistentId/makeDataCount/citations?persistentId=10.5072/FK2/J8SJZB"
3095-
3095+
30963096
Delete Unpublished Dataset
30973097
~~~~~~~~~~~~~~~~~~~~~~~~~~
30983098

@@ -3378,7 +3378,7 @@ Usage example:
33783378
.. code-block:: bash
33793379
33803380
curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/citation?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
3381-
3381+
33823382
Get Citation In Other Formats
33833383
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
33843384

@@ -3406,7 +3406,7 @@ Usage example:
34063406
.. code-block:: bash
34073407
34083408
curl "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/citation/$FORMAT?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
3409-
3409+
34103410
34113411
Get Citation by Preview URL Token
34123412
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -3564,6 +3564,32 @@ The API can also be used to reset the dataset to use the default/inherited value
35643564
35653565
The default will always be the same provider as for the dataset PID if that provider can generate new PIDs, and will be the PID Provider set for the collection or the global default otherwise.
35663566
3567+
Reconcile the PID of a Dataset (If Multiple PID Providers Are Enabled)
3568+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3569+
3570+
Dataverse supports configuration with multiple Persistent Identifier (PID) providers (see the :ref:`pids-configuration` section for details).
3571+
This API endpoint assigns new PIDs to a draft dataset and, if applicable, to its datafiles (cf. :ref:`:AllowEnablingFilePIDsPerCollection <:AllowEnablingFilePIDsPerCollection>`),
3572+
using the currently configured PID provider. This is useful when the active PID provider differs from the one that originally minted the dataset's PID.
3573+
Specifically, for a draft dataset,
3574+
a new PID is minted through the active provider, and the previously assigned PID is preserved as an ``alternativePersistentIdentifier``.
3575+
The same procedure applies to the dataset's datafiles, provided that datafile PIDs are enabled. (Note: if the currently configured PID provider is the same as the one originally used, this API call has no effect.)
3576+
3577+
This API is restricted to superusers and to datasets that have not yet been published. (It does not make any changes at any PID provider.)
3578+
Warning: This change does not affect file storage, where the old PID is still
3579+
used in the location where the dataset's files are stored. To remove the old PID from the storage location, you could manually
3580+
move the files offline and update the database accordingly (by setting ``storagelocationdesignator`` to false for the old identifier
3581+
in the ``alternativepersistentidentifier`` table). This step is not required for Dataverse to function correctly.
3582+
3583+
To reconcile the PID of a dataset:
3584+
3585+
.. code-block:: bash
3586+
3587+
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
3588+
export SERVER_URL=https://demo.dataverse.org
3589+
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
3590+
3591+
curl -X PUT -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/pidReconcile?persistentId=$PERSISTENT_IDENTIFIER"
3592+
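The same call can be made without curl. Below is a minimal sketch using the JDK's built-in `java.net.http` client (Java 11+), assuming the endpoint path and `X-Dataverse-key` header shown in the curl example; the class and method names are hypothetical. The PID is URL-encoded as a query parameter value, and the request is a `PUT` with an empty body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of Dataverse.
class PidReconcileClient {

    // Builds the reconcile URL; the PID (e.g. doi:10.5072/FK2/YD5QDG) is URL-encoded.
    static URI reconcileUri(String serverUrl, String persistentId) {
        String pid = URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
        return URI.create(serverUrl + "/api/datasets/:persistentId/pidReconcile?persistentId=" + pid);
    }

    // Mirrors the curl example: PUT, empty body, API token in the X-Dataverse-key header.
    static HttpRequest reconcileRequest(String serverUrl, String apiToken, String persistentId) {
        return HttpRequest.newBuilder(reconcileUri(serverUrl, persistentId))
                .header("X-Dataverse-key", apiToken)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Sends the request, prints the response body and returns the HTTP status code.
    static int reconcile(String serverUrl, String apiToken, String persistentId)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                reconcileRequest(serverUrl, apiToken, persistentId),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
        return response.statusCode();
    }
}
```

Encoding the PID keeps the `:` and `/` characters from being read as URL structure; the server decodes the query parameter before looking up the dataset.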
35673593
.. _api-dataset-types:
35683594
35693595
Dataset Types
@@ -5959,21 +5985,21 @@ To create a harvesting client you must supply a JSON file that describes the con
59595985
- ``dataverseAlias``: The alias of an existing collection where harvested datasets will be deposited
59605986
- ``harvestUrl``: The URL of the remote OAI archive
59615987
- ``archiveUrl``: The URL of the remote archive that will be used in the redirect links pointing back to the archival locations of the harvested records. It may or may not be on the same server as the harvestUrl above. If this OAI archive is another Dataverse installation, it will be the same URL as harvestUrl minus the "/oai". For example: https://demo.dataverse.org/ vs. https://demo.dataverse.org/oai
5962-
- ``metadataFormat``: A supported metadata format. As of writing this the supported formats are "oai_dc", "oai_ddi" and "dataverse_json".
5988+
- ``metadataFormat``: A supported metadata format. As of writing this the supported formats are "oai_dc", "oai_ddi" and "dataverse_json".
59635989
59645990
The following optional fields are supported:
59655991
59665992
- ``sourceName``: When ``index-harvested-metadata-source`` is enabled (see :ref:`feature-flags`), sourceName will override the nickname in the Metadata Source facet. It can be used to group the content from many harvesting clients under the same name.
59675993
- ``archiveDescription``: What the name suggests. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data."
5968-
- ``set``: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything". (Note: see the note below on using sets when harvesting from DataCite; this is new as of v6.6).
5994+
- ``set``: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything". (Note: see the note below on using sets when harvesting from DataCite; this is new as of v6.6).
59695995
- ``style``: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation).
5970-
- ``schedule``: Defaults to "none" (not scheduled). Two formats are supported, for weekly- and daily-scheduled harvests; examples: ``Weekly, Sat 5 AM``; ``Daily, 11 PM``. Note that if a schedule definition is not formatted exactly as described here, it will be ignored silently and the client will be left unscheduled.
5971-
- ``customHeaders``: This can be used to configure this client with a specific HTTP header that will be added to every OAI request. This is to accommodate a use case where the remote server requires this header to supply some form of a token in order to offer some content not available to other clients. See the example below. Multiple headers can be supplied separated by `\\n` - actual "backslash" and "n" characters, not a single "new line" character.
5996+
- ``schedule``: Defaults to "none" (not scheduled). Two formats are supported, for weekly- and daily-scheduled harvests; examples: ``Weekly, Sat 5 AM``; ``Daily, 11 PM``. Note that if a schedule definition is not formatted exactly as described here, it will be ignored silently and the client will be left unscheduled.
5997+
- ``customHeaders``: This can be used to configure this client with a specific HTTP header that will be added to every OAI request. This is to accommodate a use case where the remote server requires this header to supply some form of a token in order to offer some content not available to other clients. See the example below. Multiple headers can be supplied separated by `\\n` - actual "backslash" and "n" characters, not a single "new line" character.
59725998
- ``allowHarvestingMissingCVV``: Flag to allow datasets to be harvested with Controlled Vocabulary Values that existed in the originating Dataverse Project but are not in the harvesting Dataverse Project. (Default is false). Currently only settable using API.
59735999
- ``useOaiIdentifiersAsPids``: Defaults to false; if set to true, the harvester will attempt to use the identifier from the OAI-PMH record header as the **first choice** for the persistent id of the harvested dataset. When set to false, Dataverse will still attempt to use this identifier, but only if none of the ``<dc:identifier>`` entries in the OAI_DC record contain a valid persistent id (this is new as of v6.5).
59746000
- ``useListRecords``: Defaults to false; if set to true, the harvester will attempt to retrieve multiple records in a single pass using the OAI-PMH verb ListRecords. By default, our harvester relies on the combination of ListIdentifiers followed by multiple GetRecord calls for each individual record. Note that this option is required when configuring harvesting from DataCite. (this is new as of v6.6).
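Two of the fields above have easy-to-miss formats: ``schedule`` is silently ignored if malformed, and ``customHeaders`` uses the two characters backslash and "n" as a separator (in the JSON file this is written as `\\n`). The sketch below checks both on the client side before submitting the JSON. It is hypothetical and not Dataverse's own parser: the schedule grammar (three-letter day name, hour 1 to 12, `AM`/`PM`) is an assumption inferred from the two documented examples.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-submission checks; not part of Dataverse.
class HarvestingClientFields {

    // Assumed grammar: "Weekly, <Day> <H> AM|PM" or "Daily, <H> AM|PM", e.g. "Weekly, Sat 5 AM".
    private static final Pattern SCHEDULE = Pattern.compile(
            "(Weekly, (Sun|Mon|Tue|Wed|Thu|Fri|Sat) |Daily, )(1[0-2]|[1-9]) (AM|PM)");

    // True if the whole string matches the assumed schedule grammar.
    static boolean looksLikeValidSchedule(String schedule) {
        return SCHEDULE.matcher(schedule).matches();
    }

    // Splits on a literal backslash followed by 'n' (regex \\n), not on a newline character.
    static List<String> splitCustomHeaders(String customHeaders) {
        return Arrays.asList(customHeaders.split("\\\\n"));
    }
}
```

A value such as `daily, 11pm` fails the check here, which is the kind of near-miss the server would otherwise ignore without reporting an error.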
59756001
5976-
Generally, the API will accept the output of the GET version of the API for an existing client as valid input, but some fields will be ignored.
6002+
Generally, the API will accept the output of the GET version of the API for an existing client as valid input, but some fields will be ignored.
59776003
59786004
You can download this :download:`harvesting-client.json <../_static/api/harvesting-client.json>` file to use as a starting point.
59796005
@@ -6113,7 +6139,7 @@ must be encoded as follows:
61136139
61146140
.. code-block:: bash
61156141
6116-
echo "prefix:10.17603%20AND%20(types.resourceType:Report*%20OR%20types.resourceType:Mission*)" | base64
6142+
echo "prefix:10.17603%20AND%20(types.resourceType:Report*%20OR%20types.resourceType:Mission*)" | base64
61176143
cHJlZml4OjEwLjE3NjAzJTIwQU5EJTIwKHR5cGVzLnJlc291cmNlVHlwZTpSZXBvcnQqJTIwT1IlMjB0eXBlcy5yZXNvdXJjZVR5cGU6TWlzc2lvbiopCg==
61186144
61196145
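Because `echo` appends a newline, the documented set value encodes the query plus a trailing newline (the final `Cg==` group). The Java sketch below reproduces that value with `java.util.Base64` and shows the shorter value obtained without the newline; the class name and the `withTrailingNewline` switch are hypothetical, and which form the harvester expects is not stated here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; not part of Dataverse.
class DataCiteSetEncoding {

    // Base64-encodes a DataCite query for the harvesting client's "set" field.
    // withTrailingNewline = true mimics `echo "..." | base64`.
    static String encodeSet(String query, boolean withTrailingNewline) {
        String input = withTrailingNewline ? query + "\n" : query;
        return Base64.getEncoder().encodeToString(input.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a configured set value back to the query text for inspection.
    static String decodeSet(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

Decoding an existing set value is a quick way to confirm exactly which query a harvesting client was configured with.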
@@ -6308,7 +6334,7 @@ This API endpoint provides a list of feature flags and "enabled" or "disabled" f
63086334
.. code-block:: bash
63096335
63106336
export SERVER_URL=http://localhost:8080
6311-
6337+
63126338
curl "$SERVER_URL/api/admin/featureFlags"
63136339
63146340
The fully expanded example above (without environment variables) looks like this:
@@ -6330,15 +6356,15 @@ This endpoint reports "enabled" as true or false for a single feature flag. (Fo
63306356
63316357
export SERVER_URL=http://localhost:8080
63326358
export FLAG=DATASET_TYPES
6333-
6359+
63346360
curl "$SERVER_URL/api/admin/featureFlags/$FLAG"
63356361
63366362
The fully expanded example above (without environment variables) looks like this:
63376363
63386364
.. code-block:: bash
63396365
63406366
curl "http://localhost:8080/api/admin/featureFlags/DATASET_TYPES"
6341-
6367+
63426368
Manage Banner Messages
63436369
~~~~~~~~~~~~~~~~~~~~~~
63446370
@@ -7261,7 +7287,7 @@ Superusers can add a new license by posting a JSON file adapted from this exampl
72617287
Licenses must have a "name" and "uri" and may have the following optional fields: "shortDescription", "iconUri", "rightsIdentifier", "rightsIdentifierScheme", "schemeUri", "languageCode", "active", "sortOrder".
72627288
The "name" and "uri" are used to display the license in the user interface, with "shortDescription" and "iconUri" being used to enhance the display if available.
72637289
The "rightsIdentifier", "rightsIdentifierScheme", and "schemeUri" should be added if the license is available from https://spdx.org . "languageCode" should be sent if the language is not in English ("en"). "active" is a boolean indicating whether the license should be shown to users as an option. "sortOrder" is a numeric value - licenses are shown in the relative numeric order of this value.
7264-
7290+
72657291
.. code-block:: bash
72667292
72677293
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
2929
<reload4j.version>1.2.18.4</reload4j.version>
3030
<flyway.version>10.19.0</flyway.version>
3131
<jhove.version>1.20.1</jhove.version>
32-
<poi.version>5.2.5</poi.version>
32+
<poi.version>5.4.0</poi.version>
3333
<tika.version>2.9.2</tika.version>
3434
<netcdf.version>5.5.3</netcdf.version>
3535

src/main/java/edu/harvard/iq/dataverse/AlternativePersistentIdentifier.java

Lines changed: 7 additions & 8 deletions
@@ -3,19 +3,18 @@
33

44
import java.io.Serializable;
55
import java.util.Date;
6-
import jakarta.persistence.Entity;
7-
import jakarta.persistence.GeneratedValue;
8-
import jakarta.persistence.GenerationType;
9-
import jakarta.persistence.Id;
10-
import jakarta.persistence.JoinColumn;
11-
import jakarta.persistence.ManyToOne;
12-
import jakarta.persistence.Temporal;
13-
import jakarta.persistence.TemporalType;
6+
7+
import jakarta.persistence.*;
148

159
/**
1610
*
1711
* @author skraffmi
1812
*/
13+
@NamedQueries({
14+
@NamedQuery(name = "AlternativePersistentIdentifier.findByProtocolIdentifierAuthority",
15+
query = "SELECT o.id FROM AlternativePersistentIdentifier o WHERE o.identifier=:identifier and o.authority=:authority and o.protocol=:protocol")
16+
}
17+
)
1918
@Entity
2019
public class AlternativePersistentIdentifier implements Serializable {
2120

src/main/java/edu/harvard/iq/dataverse/DvObject.java

Lines changed: 0 additions & 1 deletion
@@ -505,7 +505,6 @@ public void setStorageQuota(StorageQuota storageQuota) {
505505
* @return {@code true} iff {@code other} is {@code this} or below {@code this} in the containment hierarchy.
506506
*/
507507
public abstract boolean isAncestorOf( DvObject other );
508-
509508

510509
@OneToMany(mappedBy = "definitionPoint",cascade={ CascadeType.REMOVE, CascadeType.MERGE,CascadeType.PERSIST}, orphanRemoval=true)
511510
List<RoleAssignment> roleAssignments;

src/main/java/edu/harvard/iq/dataverse/DvObjectServiceBean.java

Lines changed: 9 additions & 0 deletions
@@ -195,6 +195,15 @@ public boolean isGlobalIdLocallyUnique(GlobalId globalId) {
195195
.getResultList().isEmpty();
196196
}
197197

198+
public boolean isGlobalIdLocallyUniqueAlternativeIds(GlobalId globalId) {
199+
return em.createNamedQuery("AlternativePersistentIdentifier.findByProtocolIdentifierAuthority")
200+
.setParameter("identifier", globalId.getIdentifier())
201+
.setParameter("authority", globalId.getAuthority())
202+
.setParameter("protocol", globalId.getProtocol())
203+
.getResultList().isEmpty();
204+
}
205+
206+
198207
public DvObject updateContentIndexTime(DvObject dvObject) {
199208
/**
200209
* @todo to avoid a possible OptimisticLockException, should we merge
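The new `isGlobalIdLocallyUniqueAlternativeIds` method returns true when the `AlternativePersistentIdentifier.findByProtocolIdentifierAuthority` query finds no row matching the identifier, authority and protocol; presumably this lets PID generation avoid reusing an identifier that now survives only as an alternative PID. The in-memory sketch below expresses the same predicate; the `Pid` record is a hypothetical stand-in for `GlobalId` and for rows of the `alternativepersistentidentifier` table (Java 16+ for records).

```java
import java.util.List;

// Hypothetical in-memory model of the named-query check; not Dataverse code.
class AlternativePidCheck {

    // Only the three fields compared by the JPQL query are modeled.
    record Pid(String protocol, String authority, String identifier) {}

    // True when no stored alternative ID matches on identifier, authority and protocol.
    static boolean isUniqueAmongAlternativeIds(Pid candidate, List<Pid> alternativeIds) {
        return alternativeIds.stream().noneMatch(alt ->
                alt.identifier().equals(candidate.identifier())
                        && alt.authority().equals(candidate.authority())
                        && alt.protocol().equals(candidate.protocol()));
    }
}
```

As in the JPA version, only emptiness of the result matters, which is why the named query selects just `o.id` rather than whole entities.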

src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java

Lines changed: 6 additions & 0 deletions
@@ -572,6 +572,11 @@ public String getMessageTextBasedOnNotification(UserNotification userNotificatio
572572
String[] paramArrayStatus = {version.getDataset().getDisplayName(), (version.getExternalStatusLabel()==null) ? "<none>" : DatasetUtil.getLocaleExternalStatus(version.getExternalStatusLabel())};
573573
messageText += MessageFormat.format(pattern, paramArrayStatus);
574574
return messageText;
575+
case PIDRECONCILED:
576+
version = (DatasetVersion) targetObject;
577+
pattern = BundleUtil.getStringFromBundle("notification.email.pid.reconciled");
578+
messageText += MessageFormat.format(pattern, new String[] {version.getDataset().getDisplayName(), version.getDataset().getGlobalId().asString()});
579+
return messageText;
575580
case CREATEACC:
576581
String accountCreatedMessage = BundleUtil.getStringFromBundle("notification.email.welcome", Arrays.asList(
577582
BrandingUtil.getInstallationBrandName(),
@@ -777,6 +782,7 @@ public Object getObjectOfNotification (UserNotification userNotification){
777782
case RETURNEDDS:
778783
case WORKFLOW_SUCCESS:
779784
case WORKFLOW_FAILURE:
785+
case PIDRECONCILED:
780786
case STATUSUPDATED:
781787
return versionService.find(userNotification.getObjectId());
782788
case CREATEACC:
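The `PIDRECONCILED` email fills a bundle pattern with two arguments: the dataset's display name (`{0}`) and its new global ID (`{1}`). The actual text of `notification.email.pid.reconciled` is not part of this diff, so the pattern below is a hypothetical placeholder; the sketch only illustrates how `MessageFormat` substitutes the arguments.

```java
import java.text.MessageFormat;

// Hypothetical illustration; the real pattern comes from the resource bundle.
class PidReconciledMessage {

    // Placeholder pattern (assumption), using the same argument order as MailServiceBean.
    static final String PATTERN = "The persistent identifier of dataset {0} has been changed to {1}.";

    // {0} = dataset display name, {1} = new global ID string (e.g. doi:10.5072/FK2/ABCDEF).
    static String format(String datasetName, String newPid) {
        return MessageFormat.format(PATTERN, datasetName, newPid);
    }
}
```

One caveat when writing the bundle text: in `MessageFormat` patterns a single quote starts a quoted section, so a literal apostrophe must be written as two single quotes.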

src/main/java/edu/harvard/iq/dataverse/UserNotification.java

Lines changed: 2 additions & 2 deletions
@@ -39,8 +39,8 @@ public enum Type {
3939
CHECKSUMIMPORT, CHECKSUMFAIL, CONFIRMEMAIL, APIGENERATED, INGESTCOMPLETED, INGESTCOMPLETEDWITHERRORS,
4040
PUBLISHFAILED_PIDREG, WORKFLOW_SUCCESS, WORKFLOW_FAILURE, STATUSUPDATED, DATASETCREATED, DATASETMENTIONED,
4141
GLOBUSUPLOADCOMPLETED, GLOBUSUPLOADCOMPLETEDWITHERRORS,
42-
GLOBUSDOWNLOADCOMPLETED, GLOBUSDOWNLOADCOMPLETEDWITHERRORS, REQUESTEDFILEACCESS,
43-
GLOBUSUPLOADREMOTEFAILURE, GLOBUSUPLOADLOCALFAILURE;
42+
GLOBUSDOWNLOADCOMPLETED, GLOBUSDOWNLOADCOMPLETEDWITHERRORS, REQUESTEDFILEACCESS,
43+
GLOBUSUPLOADREMOTEFAILURE, GLOBUSUPLOADLOCALFAILURE, PIDRECONCILED;
4444

4545
public String getDescription() {
4646
return BundleUtil.getStringFromBundle("notification.typeDescription." + this.name());

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

Lines changed: 30 additions & 1 deletion
@@ -5153,7 +5153,36 @@ public Response getCanDownloadAtLeastOneFile(@Context ContainerRequestContext cr
51535153
return ok(permissionService.canDownloadAtLeastOneFile(req, datasetVersion));
51545154
}, getRequestUser(crc));
51555155
}
5156-
5156+
5157+
@PUT
5158+
@AuthRequired
5159+
@Path("{identifier}/pidReconcile")
5160+
public Response reconcilePid(@Context ContainerRequestContext crc, @PathParam("identifier") String datasetId) throws WrappedResponse {
5161+
5162+
// Superuser-only:
5163+
AuthenticatedUser user;
5164+
try {
5165+
user = getRequestAuthenticatedUserOrDie(crc);
5166+
} catch (WrappedResponse ex) {
5167+
return error(Response.Status.UNAUTHORIZED, "Authentication is required.");
5168+
}
5169+
if (!user.isSuperuser()) {
5170+
return error(Response.Status.FORBIDDEN, "Superusers only.");
5171+
}
5172+
5173+
Dataset dataset;
5174+
PidProvider pidProvider;
5175+
try {
5176+
dataset = findDatasetOrDie(datasetId);
5177+
} catch (WrappedResponse ex) {
5178+
return error(Response.Status.NOT_FOUND, "No such dataset");
5179+
}
5180+
return response(req -> {
5181+
execCommand(new ReconcileDatasetPidCommand(req, dataset, dataset.getEffectivePidGenerator()));
5182+
return ok(dataset.getGlobalId().toString());
5183+
}, getRequestUser(crc));
5184+
5185+
}
51575186
/**
51585187
* Get the PidProvider that will be used for generating new DOIs in this dataset
51595188
*
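The `reconcilePid` endpoint checks its preconditions in a fixed order before running `ReconcileDatasetPidCommand`: authentication, then superuser status, then dataset lookup. A consequence is that a non-superuser receives 403 even for a dataset that does not exist. The pure function below summarizes that order; it is a hypothetical model (boolean inputs instead of the real request context), and failures raised inside the command itself are handled by the `response(...)` wrapper and are not modeled.

```java
// Hypothetical summary of the endpoint's precondition order; not Dataverse code.
class ReconcileAccessCheck {

    static int statusFor(boolean authenticated, boolean superuser, boolean datasetFound) {
        if (!authenticated) {
            return 401; // "Authentication is required."
        }
        if (!superuser) {
            return 403; // "Superusers only."
        }
        if (!datasetFound) {
            return 404; // "No such dataset"
        }
        return 200; // command ran; the body carries the dataset's global ID
    }
}
```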
