Commit 21f69e3

Merge branch 'develop' into 11546-pid-separator
2 parents 17b0e2c + bbc24e9

Some content is hidden

49 files changed (+1866, -1230 lines)

conf/solr/schema.xml

Lines changed: 2 additions & 1 deletion
@@ -157,7 +157,8 @@
 <field name="dvSubject" type="string" stored="true" indexed="true" multiValued="true"/>
 
 <field name="publicationStatus" type="string" stored="true" indexed="true" multiValued="true"/>
-<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>
+<field name="curationStatus" type="string" stored="true" indexed="true" multiValued="false"/>
+<field name="curationStatusCreateTime" type="pdate" indexed="true" stored="true"/>
 <field name="embargoEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>
 <field name="retentionEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>
 
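The new `curationStatusCreateTime` field is a `pdate`, so it supports Solr range queries such as "current status set at or before a given date". A minimal sketch, assuming direct access to the Solr core and the standard Lucene query syntax; `curation_filter` and `solr_quote` are hypothetical helpers, not part of Dataverse:

```python
from datetime import datetime, timezone


def solr_quote(value: str) -> str:
    """Quote a value for a Solr field query, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def curation_filter(label: str, set_at_or_before: datetime) -> str:
    """Hypothetical helper: build a Solr filter query (fq) for datasets whose current
    curationStatus is `label` and whose curationStatusCreateTime is at or before
    the given moment."""
    if set_at_or_before.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # Solr pdate values are written in ISO-8601 UTC form with a trailing 'Z'.
    ts = set_at_or_before.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # '[* TO ts]' is an inclusive range with no lower bound.
    return f"curationStatus:{solr_quote(label)} AND curationStatusCreateTime:[* TO {ts}]"
```

A string like this would be passed as an `fq` parameter to the core's `/select` handler. Querying Solr directly bypasses the permission filtering that Dataverse's Search API applies, so this suits administrative use rather than end-user search.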

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+## Upgrade to AWS SDK v2 (for S3), v1 EOL in December 2025
+
+To support S3 storage, Dataverse uses the AWS SDK. We have upgraded to v2 of this SDK because v1 reaches End Of Life (EOL) in [December 2025](https://aws.amazon.com/fr/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/).
+
+As part of the upgrade, the payload-signing setting for S3 stores (`dataverse.files.<id>.payload-signing`) has been removed because it is no longer necessary. With the updated SDK, a payload signature will automatically be sent when required (and not sent when not required).
+
+Dataverse developers should note that LocalStack is used to test S3, and older versions appear to be incompatible. The development environment has been upgraded from LocalStack v2.3.2 to v4.2.0, which seems to work fine.
+
+See also #11073 and #11360.
+
+### Settings Removed
+
+- `dataverse.files.<id>.payload-signing`
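Since `dataverse.files.<id>.payload-signing` is removed, an upgrading installation may want to locate any leftover definitions. A minimal sketch that scans JVM option strings, such as the lines printed by Payara's `asadmin list-jvm-options`; the function name is hypothetical, and the pattern assumes store ids contain no dots:

```python
import re
from typing import Iterable, List

# Matches "-Ddataverse.files.<id>.payload-signing=<value>".
# Assumption: the store id contains no dots, "=" or whitespace.
REMOVED_OPTION = re.compile(r"^-Ddataverse\.files\.([^.=\s]+)\.payload-signing=")


def stores_with_removed_payload_signing(options: Iterable[str]) -> List[str]:
    """Return the ids of stores that still set the removed payload-signing option."""
    store_ids = []
    for line in options:
        match = REMOVED_OPTION.match(line.strip())
        if match:
            store_ids.append(match.group(1))
    return store_ids
```

Any id it reports can have that option deleted; per the note above, the SDK now decides on its own when a payload signature is required.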
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+The External/Curation Status Label mechanism has been enhanced:
+
+- adding tracking of who creates the status label and when
+- keeping a history of past statuses
+- updating the CSV report to include the creation time and assigner of a status
+- updating the getCurationStatus API call to return a JSON object for the status with label, assigner, and create time
+- adding an includeHistory query param for these API calls to allow seeing prior statuses
+- adding a facet to allow filtering by curation status (for users able to set them)
+- adding the creation time to Solr as a pdate to support search by time period, e.g. current status set prior to a given date
+- standardizing the language around 'curation status' vs 'external status'
+- adding a 'curation-status' class to displayed labels to allow styling
+- adding a dataverse.ui.show-curation-status-to-all setting that allows users who can see a draft but not publish it to also view the curation status
+
+Due to changes in the Solr schema, updating the Solr schema and reindexing are required. Background reindexing should be OK.

doc/sphinx-guides/source/api/changelog.rst

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ v6.7
 ----
 
 - An undocumented :doc:`search` parameter called "show_my_data" has been removed. It was never exercised by tests and is believed to be unused. API users should use the :ref:`api-mydata` API instead.
+- The /api/datasets/{id}/curationStatus API now returns a JSON object with the curation label, createTime, and assigner rather than a string 'label'. It also supports a new boolean includeHistory parameter (default false) that, when true, returns a JSON array of statuses.
+- The /api/datasets/listCurationStates API includes new "Status Set Time" and "Status Set By" columns listing the time the current status was applied and by whom. It also supports the boolean includeHistory parameter.
+- Due to updates in libraries used by Dataverse, XML serialization may have changed slightly with respect to whether self-closing tags are used for empty elements. This primarily affects XML-based metadata exports. The XML structure of the export itself has not changed, so this is only an incompatibility if you are not using an XML parser.
 
 v6.6
 ----
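The XML serialization entry can be illustrated with any conforming XML parser: an empty element written as `<a/>` or as `<a></a>` denotes the same content. A small sketch using Python's standard library (C14N 2.0 via `xml.etree.ElementTree.canonicalize`, Python 3.8+); the function name and element names are invented for illustration:

```python
import xml.etree.ElementTree as ET


def same_xml(a: str, b: str) -> bool:
    """Compare two XML documents after C14N 2.0 canonicalization.

    Canonical form always writes empty elements as start/end tag pairs, so
    '<x/>' and '<x></x>' compare equal, while a plain string comparison would not.
    """
    return ET.canonicalize(a) == ET.canonicalize(b)


self_closing = "<metadata><title>Example</title><note/></metadata>"
explicit = "<metadata><title>Example</title><note></note></metadata>"
print(self_closing == explicit)          # False: the raw strings differ
print(same_xml(self_closing, explicit))  # True: the parsed content is identical
```

Tools that diff exports textually may therefore report changes after upgrading even though a parser sees none.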

doc/sphinx-guides/source/api/curation-labels.rst

Lines changed: 27 additions & 6 deletions
@@ -1,16 +1,19 @@
-Dataset Curation Label API
-==========================
+Dataset Curation Status API
+===========================
 
 When the :ref:`:AllowedCurationLabels <:AllowedCurationLabels>` setting has been used to define Curation Labels, this API can be used to set these labels on draft datasets.
 Superusers can define which set of labels are allowed for a given datasets in a collection/an individual dataset using the api described in the :doc:`/admin/dataverses-datasets` section.
 The API here can be used by curators/those who have permission to publish the dataset to get/set/change/delete the label currently assigned to a draft dataset.
+If the :ref:`dataverse.ui.show-curation-status-to-all` flag is enabled, users who can see the draft dataset version can use the get API call.
 
 This functionality is intended as a mechanism to integrate the Dataverse software with an external curation process/application: it is a way to make the state of a draft dataset,
 as defined in the external process, visible within Dataverse. These labels have no other effect in Dataverse and are only visible to curators/those with permission to publish the dataset.
 Any curation label assigned to a draft dataset will be removed upon publication.
+
+Dataverse tracks the Curation Label as well as when it was assigned and by whom. It also keeps track of the history of prior assignments.
 
-Get a Draft Dataset's Curation Label
-------------------------------------
+Get a Draft Dataset's Curation Status
+-------------------------------------
 
 .. code-block:: bash
 
@@ -27,8 +30,13 @@ Get a Draft Dataset's Curation Label
 
 curl -H X-Dataverse-key:$API_TOKEN "$SERVER_URL/api/datasets/:persistentId/curationStatus?persistentId=$DATASET_PID"
 
-You should expect a 200 ("OK") response and the draft dataset's curation status label contained in a JSON 'data' object.
+You should expect a 200 ("OK") response and the draft dataset's curation status as a JSON object within the JSON 'data' object. The status includes 'label', 'createTime', and 'assigner' fields.
+
+If the optional includeHistory query parameter is set to true, the response's 'data' entry will be a JSON array of curation status objects:
+
+curl -H X-Dataverse-key:$API_TOKEN "$SERVER_URL/api/datasets/:persistentId/curationStatus?persistentId=$DATASET_PID&includeHistory=true"
 
+For draft datasets that were created prior to v6.7, curation status objects may lack a createTime or assigner.
 
 Set a Draft Dataset's Curation Label
 ------------------------------------
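A minimal client sketch for the get call described above, using only the Python standard library. It assumes Dataverse's usual JSON envelope with a top-level 'data' entry, the field names 'label', 'createTime', and 'assigner' given in the text, and that an older server may return 'data' as a bare label string (the pre-v6.7 form described in the API changelog). The function names are hypothetical:

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def build_status_request(server_url: str, api_token: str, pid: str,
                         include_history: bool = False) -> urllib.request.Request:
    """Build (but do not send) a GET request for a draft dataset's curation status."""
    query = {"persistentId": pid}
    if include_history:
        query["includeHistory"] = "true"
    url = (server_url.rstrip("/") + "/api/datasets/:persistentId/curationStatus?"
           + urllib.parse.urlencode(query))
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_token})


def parse_statuses(body: str) -> List[Dict[str, Any]]:
    """Normalize the response's 'data' entry to a list of status dicts.

    'data' may be a single object, an array (includeHistory=true) or, as assumed
    here for older servers, a bare label string. createTime and assigner may be
    absent for drafts created before v6.7, so they default to None.
    """
    data = json.loads(body).get("data")
    if data is None:
        return []
    items = data if isinstance(data, list) else [data]
    statuses = []
    for item in items:
        if isinstance(item, str):
            item = {"label": item}  # assumed legacy shape: just the label
        statuses.append({
            "label": item.get("label"),
            "createTime": item.get("createTime"),
            "assigner": item.get("assigner"),
        })
    return statuses
```

Sending the request would be `urllib.request.urlopen(build_status_request(...))`; `parse_statuses()` then yields one dict per status whether the server returned a single object or an array.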
@@ -53,6 +61,8 @@ To add a curation label for a draft Dataset, specify the Dataset ID (DATASET_ID)
 
 You should expect a 200 ("OK") response indicating that the label has been set. 403/Forbidden and 400/Bad Request responses are also possible, i.e. if you don't have permission to make this change or are trying to add a label that isn't in the allowed set or to add a label to a dataset with no draft version.
 
+Note that Dataverse will record the current time as the createTime and the user who sets the label as its 'assigner'.
+
 
 Delete a Draft Dataset's Curation Label
 ---------------------------------------
@@ -98,7 +108,7 @@ You should expect a 200 ("OK") response with a comma-separated list of allowed l
 Get a Report on the Curation Status of All Datasets
 ---------------------------------------------------
 
-To get a CSV file listing the curation label assigned to each Dataset with a draft version, along with the creation and last modification dates, and list of those with permissions to publish the version.
+This API returns a CSV file listing the curation statuses assigned to each dataset with a draft version, along with the creation and last modification dates and a list of those with permission to publish the version.
 
 This API call is restricted to superusers.
 
@@ -112,3 +122,14 @@ This API call is restricted to superusers.
 curl -H X-Dataverse-key:$API_TOKEN "$SERVER_URL/api/datasets/listCurationStates"
 
 You should expect a 200 ("OK") response with a CSV formatted response.
+
+The CSV response includes the following columns in order:
+#. Dataset Title (as a hyperlink to the dataset page)
+#. Creation Date of the draft dataset version
+#. Latest Modification Date of the draft dataset version
+#. Assigned curation status, '<none>' if a curation status was previously assigned but has since been removed, or null if no curation status has ever been set
+#. Time when the curation status was applied to the draft dataset version
+#. The user who assigned this curation status
+#. (and all subsequent columns): Users (comma-separated list) with the roles named in the column headings that can publish datasets and therefore see/set curation status
+When includeHistory is true, multiple rows may be present for each dataset, showing the full history of curation statuses.
+
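Because the report's columns are documented by position, a consumer can read the first six positionally and treat the remaining ones as role columns named by their headings. A sketch with Python's `csv` module, assuming the report starts with a header row and that each role cell is a quoted, comma-separated user list; rows are grouped by the raw title cell, which (containing the dataset hyperlink) is assumed to identify the dataset. The field names such as `status_set_time` are illustrative:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Illustrative names for the six documented fixed columns, in order.
FIXED = ["title", "created", "modified", "status", "status_set_time", "status_set_by"]


def parse_report(text: str) -> Dict[str, List[dict]]:
    """Group report rows by the raw title cell; with includeHistory a title may repeat."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    role_names = header[len(FIXED):]
    by_dataset: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIXED, row))
        # Columns 7 and beyond hold comma-separated user lists, one column per role.
        record["roles"] = {
            role: [user.strip() for user in cell.split(",") if user.strip()]
            for role, cell in zip(role_names, row[len(FIXED):])
        }
        by_dataset[record["title"]].append(record)
    return dict(by_dataset)
```

With includeHistory enabled, each list holds one entry per recorded status; the text does not specify row order, so sorting on the status time column may be needed.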

doc/sphinx-guides/source/installation/config.rst

Lines changed: 18 additions & 4 deletions
@@ -1321,7 +1321,6 @@ List of S3 Storage Options
 dataverse.files.<id>.profile <?> Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
 dataverse.files.<id>.proxy-url <?> URL of a proxy protecting the S3 store. Optional. (none)
 dataverse.files.<id>.path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
-dataverse.files.<id>.payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
 dataverse.files.<id>.chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
 dataverse.files.<id>.connection-pool-size <?> The maximum number of open connections to the S3 server ``256``
 dataverse.files.<id>.disable-tagging ``true``/``false`` Do not place the ``temp`` tag when redirecting the upload to the S3 server. ``false``
@@ -1370,12 +1369,12 @@ Reported Working S3-Compatible Storage
 possibly slow) https://play.minio.io:9000 service.
 
 `StorJ Object Store <https://www.storj.io>`_
-StorJ is a distributed object store that can be configured with an S3 gateway. Per the S3 Storage instructions above, you'll first set up the StorJ S3 store by defining the id, type, and label. After following the general installation, set the following configurations to use a StorJ object store: ``dataverse.files.<id>.payload-signing=true`` and ``dataverse.files.<id>.chunked-encoding=false``. For step-by-step instructions see https://docs.storj.io/dcs/how-tos/dataverse-integration-guide/
+StorJ is a distributed object store that can be configured with an S3 gateway. Per the S3 Storage instructions above, you'll first set up the StorJ S3 store by defining the id, type, and label. After following the general installation, set the following configuration to use a StorJ object store: ``dataverse.files.<id>.chunked-encoding=false``. For step-by-step instructions see https://docs.storj.io/dcs/how-tos/dataverse-integration-guide/
 
 Note that for direct uploads and downloads, Dataverse redirects to the proxy-url but presigns the urls based on the ``dataverse.files.<id>.custom-endpoint-url``. Also, note that if you choose to enable ``dataverse.files.<id>.download-redirect`` the S3 URLs expire after 60 minutes by default. You can change that minute value to reflect a timeout value that’s more appropriate by using ``dataverse.files.<id>.url-expiration-minutes``.
 
 `Surf Object Store v2019-10-30 <https://www.surf.nl/en>`_
-Set ``dataverse.files.<id>.payload-signing=true``, ``dataverse.files.<id>.chunked-encoding=false`` and ``dataverse.files.<id>.path-style-request=true`` to use Surf Object
+Set ``dataverse.files.<id>.chunked-encoding=false`` and ``dataverse.files.<id>.path-style-access=true`` to use Surf Object
 Store. You will need the Swift client (documented at <http://doc.swift.surfsara.nl/en/latest/Pages/Clients/s3cred.html>) to create the access key and secret key for the S3 interface.
 
 Note that the ``dataverse.files.<id>.proxy-url`` setting can be used in installations where the object store is proxied, but it should be considered an advanced option that will require significant expertise to properly configure.
@@ -2265,7 +2264,7 @@ The S3 Archiver defines one custom setting, a required :S3ArchiverConfig. It can
 
 The credentials for your S3 account, can be stored in a profile in a standard credentials file (e.g. ~/.aws/credentials) referenced via "profile" key in the :S3ArchiverConfig setting (will default to the default entry), or can via MicroProfile settings as described for S3 stores (dataverse.s3archiver.access-key and dataverse.s3archiver.secret-key)
 
-The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_name" and may include additional S3-related parameters as described for S3 Stores, including "profile", "connection-pool-size","custom-endpoint-url", "custom-endpoint-region", "path-style-access", "payload-signing", and "chunked-encoding".
+The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_name" and may include additional S3-related parameters as described for S3 Stores, including "profile", "connection-pool-size", "custom-endpoint-url", "custom-endpoint-region", "path-style-access", and "chunked-encoding".
 
 \:S3ArchiverConfig - minimally includes the name of the bucket to use. For example:
 
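The `:S3ArchiverConfig` value is a JSON object with one required key. A sketch of a pre-flight check an administrator might run before saving the setting; the function name is hypothetical, it treats `payload-signing` as unsupported because that key was dropped from the documented list, and it only reports (rather than rejects) keys outside that list, since the documentation says "including", not "only":

```python
import json
from typing import List

# Optional keys named in the documentation text (the list there is not exhaustive).
DOCUMENTED_OPTIONAL = {
    "profile", "connection-pool-size", "custom-endpoint-url",
    "custom-endpoint-region", "path-style-access", "chunked-encoding",
}
# Dropped from the documented list together with the S3 store payload-signing option.
REMOVED = {"payload-signing"}


def check_s3_archiver_config(value: str) -> List[str]:
    """Return a list of problems found in an :S3ArchiverConfig JSON string (empty if none)."""
    try:
        cfg = json.loads(value)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(cfg, dict):
        return ["must be a JSON object"]
    problems = []
    if not cfg.get("s3_bucket_name"):
        problems.append("missing required key 's3_bucket_name'")
    for key in sorted(cfg):
        if key in REMOVED:
            problems.append(f"'{key}' is no longer supported")
        elif key != "s3_bucket_name" and key not in DOCUMENTED_OPTIONAL:
            problems.append(f"'{key}' is not in the documented list (it may still be valid)")
    return problems
```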
@@ -3270,6 +3269,21 @@ Defaults to ``true``.
 Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
 ``DATAVERSE_API_SHOW_LABEL_FOR_INCOMPLETE_WHEN_PUBLISHED``. Will accept ``[tT][rR][uU][eE]|1|[oO][nN]`` as "true" expressions.
 
+.. _dataverse.ui.show-curation-status-to-all:
+
+dataverse.ui.show-curation-status-to-all
+++++++++++++++++++++++++++++++++++++++++
+
+By default the curation status assigned to a draft dataset version can only be seen by those who can publish it. When this flag is true, anyone who can see the draft dataset can see the assigned status.
+These users will also get notifications/emails about changes to the status.
+See :ref:`:AllowedCurationLabels <:AllowedCurationLabels>` and the :doc:`/admin/dataverses-datasets` section for more information about curation status.
+
+Defaults to ``false``.
+
+Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
+``DATAVERSE_UI_SHOW_CURATION_STATUS_TO_ALL``. Will accept ``[tT][rR][uU][eE]|1|[oO][nN]`` as "true" expressions.
+
+
 .. _dataverse.signposting.level1-author-limit:
 
 dataverse.signposting.level1-author-limit
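The `dataverse.ui.show-curation-status-to-all` documentation gives both the accepted "true" spellings and the rule for who may see a status. A sketch encoding both, assuming the regular expression must match the entire value and that "can publish" and "can view the draft" are permission checks computed elsewhere; the function names are hypothetical and this is not Dataverse's own code:

```python
import re
from typing import Optional

# "true" expressions as documented: [tT][rR][uU][eE]|1|[oO][nN]
TRUE_PATTERN = re.compile(r"[tT][rR][uU][eE]|1|[oO][nN]")


def parse_flag(raw: Optional[str], default: bool = False) -> bool:
    """Interpret a configured boolean value; an unset value falls back to the default."""
    if raw is None:
        return default
    return TRUE_PATTERN.fullmatch(raw) is not None


def can_see_curation_status(can_publish: bool, can_view_draft: bool,
                            show_to_all: bool) -> bool:
    """Publishers always see the status; with the flag on, so does anyone who can view the draft."""
    return can_publish or (show_to_all and can_view_draft)
```

Keeping the visibility rule as a pure function of three booleans makes the default (publishers only) and the relaxed mode easy to check side by side.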

docker-compose-dev.yml

Lines changed: 1 addition & 1 deletion
@@ -209,7 +209,7 @@ services:
 dev_localstack:
 container_name: "dev_localstack"
 hostname: "localstack"
-image: localstack/localstack:2.3.2
+image: localstack/localstack:4.2.0
 restart: on-failure
 ports:
 - "127.0.0.1:4566:4566"

modules/dataverse-parent/pom.xml

Lines changed: 4 additions & 3 deletions
@@ -32,12 +32,13 @@
 <scope>import</scope>
 </dependency>
 <dependency>
-<groupId>com.amazonaws</groupId>
-<artifactId>aws-java-sdk-bom</artifactId>
+<groupId>software.amazon.awssdk</groupId>
+<artifactId>bom</artifactId>
 <version>${aws.version}</version>
 <type>pom</type>
 <scope>import</scope>
 </dependency>
+
 <dependency>
 <groupId>com.google.cloud</groupId>
 <artifactId>libraries-bom</artifactId>
@@ -151,7 +152,7 @@
 <payara.version>6.2025.3</payara.version>
 <postgresql.version>42.7.7</postgresql.version>
 <solr.version>9.8.0</solr.version>
-<aws.version>1.12.748</aws.version>
+<aws.version>2.31.3</aws.version>
 <google.library.version>26.30.0</google.library.version>
 
 <!-- Basic libs, logging -->

pom.xml

Lines changed: 29 additions & 4 deletions
@@ -77,6 +77,18 @@
 <groupId>org.apache.geronimo.specs</groupId>
 <artifactId>geronimo-javamail_1.4_spec</artifactId>
 </exclusion>
+<exclusion>
+<groupId>org.apache.geronimo.specs</groupId>
+<artifactId>geronimo-stax-api_1.0_spec</artifactId>
+</exclusion>
+<exclusion>
+<groupId>org.codehaus.woodstox</groupId>
+<artifactId>wstx-asl</artifactId>
+</exclusion>
+<exclusion>
+<groupId>org.codehaus.woodstox</groupId>
+<artifactId>woodstox-core-asl</artifactId>
+</exclusion>
 </exclusions>
 </dependency>
 <!-- Dependency for Apache Abdera and Apache Tika. Tika needs newer version. -->
@@ -167,10 +179,24 @@
 </exclusion>
 </exclusions>
 </dependency>
-
 <dependency>
-<groupId>com.amazonaws</groupId>
-<artifactId>aws-java-sdk-s3</artifactId>
+<groupId>software.amazon.awssdk</groupId>
+<artifactId>s3</artifactId>
+<!-- no version here as managed by BOM above! -->
+</dependency>
+<dependency>
+<groupId>software.amazon.awssdk</groupId>
+<artifactId>s3-transfer-manager</artifactId>
+<!-- no version here as managed by BOM above! -->
+</dependency>
+<!--dependency>
+<groupId>software.amazon.awssdk</groupId>
+<artifactId>apache-client</artifactId-->
+<!-- no version here as managed by BOM above! -->
+<!--/dependency-->
+<dependency>
+<groupId>software.amazon.awssdk</groupId>
+<artifactId>netty-nio-client</artifactId>
 <!-- no version here as managed by BOM above! -->
 </dependency>
 <dependency>
@@ -181,7 +207,6 @@
 <dependency>
 <groupId>com.google.code.gson</groupId>
 <artifactId>gson</artifactId>
-<version>2.9.1</version>
 <scope>compile</scope>
 </dependency>
 <!-- Should be refactored and moved to transitive section above once on Java EE 8 (makes WAR smaller) -->

scripts/zipdownload/pom.xml

Lines changed: 3 additions & 2 deletions
@@ -23,8 +23,9 @@
 <artifactId>postgresql</artifactId>
 </dependency>
 <dependency>
-<groupId>com.amazonaws</groupId>
-<artifactId>aws-java-sdk-s3</artifactId>
+<groupId>software.amazon.awssdk</groupId>
+<artifactId>s3</artifactId>
+<!-- no version here as managed by BOM above! -->
 </dependency>
 </dependencies>
 <build>
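With the POMs moving from `com.amazonaws` to `software.amazon.awssdk`, one way to confirm that no SDK v1 artifacts remain declared in a module is to scan its dependencies. A sketch using `xml.etree.ElementTree` that ignores XML namespaces and, like any XML parser, skips commented-out blocks such as the disabled `apache-client` dependency in the root `pom.xml`. It checks declared dependencies only, not transitive ones (`mvn dependency:tree` is the usual tool for those); the function names are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def _local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def declared_dependencies(pom_xml: str) -> List[Tuple[str, str]]:
    """Return (groupId, artifactId) for every <dependency> element in the POM."""
    root = ET.fromstring(pom_xml)  # the default parser drops XML comments
    deps = []
    for el in root.iter():
        if _local(el.tag) != "dependency":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in el}
        deps.append((fields.get("groupId", ""), fields.get("artifactId", "")))
    return deps


def aws_sdk_v1_leftovers(pom_xml: str) -> List[str]:
    """List artifactIds that still come from the AWS SDK v1 group."""
    return [artifact for group, artifact in declared_dependencies(pom_xml)
            if group == "com.amazonaws"]
```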
