
Commit 976d10c

Merge pull request #11360 from GlobalDataverseCommunityConsortium/AWSv2

AWS v2 SDK for S3 (v1 EOL soon)

2 parents: a9872c9 + 151029c

File tree

19 files changed: +1231 −1090 lines changed

Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
+## Upgrade to AWS SDK v2 (for S3), v1 EOL in December 2025
+
+To support S3 storage, Dataverse uses the AWS SDK. We have upgraded to v2 of this SDK because v1 reaches End Of Life (EOL) in [December 2025](https://aws.amazon.com/fr/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/).
+
+As part of the upgrade, the payload-signing setting for S3 stores (`dataverse.files.<id>.payload-signing`) has been removed because it is no longer necessary. With the updated SDK, a payload signature will automatically be sent when required (and not sent when not required).
+
+Dataverse developers should note that LocalStack is used to test S3 and older versions appear to be incompatible. The development environment has been upgraded from LocalStack v2.3.2 to v4.2.0, which seems to work fine.
+
+See also #11073 and #11360.
+
+### Settings Removed
+
+- `dataverse.files.<id>.payload-signing`
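For operators setting up a new S3 store after this change, a minimal definition might look like the sketch below. The store id `mys3` and bucket name `mybucket` are placeholders, not values from this commit, and the option set for your deployment may differ:

```properties
# Hypothetical S3 store "mys3" -- the id and bucket name are placeholders.
dataverse.files.mys3.type=s3
dataverse.files.mys3.label=MyS3
dataverse.files.mys3.bucket-name=mybucket
# Note: no payload-signing entry. With the v2 SDK, Dataverse sends a
# payload signature automatically whenever the store requires one.
```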

doc/sphinx-guides/source/api/changelog.rst

Lines changed: 2 additions & 0 deletions

@@ -9,9 +9,11 @@ This API changelog is experimental and we would love feedback on its usefulness.
 
 v6.7
 ----
+
 - An undocumented :doc:`search` parameter called "show_my_data" has been removed. It was never exercised by tests and is believed to be unused. API users should use the :ref:`api-mydata` API instead.
 - The /api/datasets/{id}/curationStatus API now includes a JSON object with curation label, createtime, and assigner rather than a string 'label', and it supports a new boolean includeHistory parameter (default false) that returns a JSON array of statuses.
 - /api/datasets/{id}/listCurationStates includes new columns "Status Set Time" and "Status Set By" listing the time the current status was applied and by whom. It also supports the boolean includeHistory parameter.
+- Due to updates in libraries used by Dataverse, XML serialization may have changed slightly with respect to whether self-closing tags are used for empty elements. This primarily affects XML-based metadata exports. The XML structure of the export itself has not changed, so this is only an incompatibility if you are not using an XML parser.
 
 v6.6
 ----

doc/sphinx-guides/source/installation/config.rst

Lines changed: 3 additions & 4 deletions

@@ -1321,7 +1321,6 @@ List of S3 Storage Options
 dataverse.files.<id>.profile <?> Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
 dataverse.files.<id>.proxy-url <?> URL of a proxy protecting the S3 store. Optional. (none)
 dataverse.files.<id>.path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
-dataverse.files.<id>.payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
 dataverse.files.<id>.chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
 dataverse.files.<id>.connection-pool-size <?> The maximum number of open connections to the S3 server ``256``
 dataverse.files.<id>.disable-tagging ``true``/``false`` Do not place the ``temp`` tag when redirecting the upload to the S3 server. ``false``
@@ -1370,12 +1369,12 @@ Reported Working S3-Compatible Storage
 possibly slow) https://play.minio.io:9000 service.
 
 `StorJ Object Store <https://www.storj.io>`_
-StorJ is a distributed object store that can be configured with an S3 gateway. Per the S3 Storage instructions above, you'll first set up the StorJ S3 store by defining the id, type, and label. After following the general installation, set the following configurations to use a StorJ object store: ``dataverse.files.<id>.payload-signing=true`` and ``dataverse.files.<id>.chunked-encoding=false``. For step-by-step instructions see https://docs.storj.io/dcs/how-tos/dataverse-integration-guide/
+StorJ is a distributed object store that can be configured with an S3 gateway. Per the S3 Storage instructions above, you'll first set up the StorJ S3 store by defining the id, type, and label. After following the general installation, set the following configuration to use a StorJ object store: ``dataverse.files.<id>.chunked-encoding=false``. For step-by-step instructions see https://docs.storj.io/dcs/how-tos/dataverse-integration-guide/
 
 Note that for direct uploads and downloads, Dataverse redirects to the proxy-url but presigns the urls based on the ``dataverse.files.<id>.custom-endpoint-url``. Also, note that if you choose to enable ``dataverse.files.<id>.download-redirect`` the S3 URLs expire after 60 minutes by default. You can change that minute value to reflect a timeout value that's more appropriate by using ``dataverse.files.<id>.url-expiration-minutes``.
 
 `Surf Object Store v2019-10-30 <https://www.surf.nl/en>`_
-Set ``dataverse.files.<id>.payload-signing=true``, ``dataverse.files.<id>.chunked-encoding=false`` and ``dataverse.files.<id>.path-style-request=true`` to use Surf Object
+Set ``dataverse.files.<id>.chunked-encoding=false`` and ``dataverse.files.<id>.path-style-request=true`` to use Surf Object
 Store. You will need the Swift client (documented at <http://doc.swift.surfsara.nl/en/latest/Pages/Clients/s3cred.html>) to create the access key and secret key for the S3 interface.
 
 Note that the ``dataverse.files.<id>.proxy-url`` setting can be used in installations where the object store is proxied, but it should be considered an advanced option that will require significant expertise to properly configure.
@@ -2265,7 +2264,7 @@ The S3 Archiver defines one custom setting, a required :S3ArchiverConfig. It can
 
 The credentials for your S3 account can be stored in a profile in a standard credentials file (e.g. ~/.aws/credentials) referenced via the "profile" key in the :S3ArchiverConfig setting (it will default to the default entry), or set via MicroProfile settings as described for S3 stores (dataverse.s3archiver.access-key and dataverse.s3archiver.secret-key).
 
-The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_name" and may include additional S3-related parameters as described for S3 Stores, including "profile", "connection-pool-size", "custom-endpoint-url", "custom-endpoint-region", "path-style-access", "payload-signing", and "chunked-encoding".
+The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_name" and may include additional S3-related parameters as described for S3 Stores, including "profile", "connection-pool-size", "custom-endpoint-url", "custom-endpoint-region", "path-style-access", and "chunked-encoding".
 
 \:S3ArchiverConfig - minimally includes the name of the bucket to use. For example:
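Putting the surviving options from this guide together, a redirect-enabled store behind a proxy might be configured roughly as follows; the store id `mys3` and the URLs are illustrative placeholders, not values from this commit:

```properties
# Hypothetical redirect-enabled S3 store behind a proxy.
dataverse.files.mys3.download-redirect=true
# Presigned URLs expire after 60 minutes unless overridden:
dataverse.files.mys3.url-expiration-minutes=120
# URLs are presigned against the custom endpoint, redirects go to the proxy:
dataverse.files.mys3.custom-endpoint-url=https://s3.example.org
dataverse.files.mys3.proxy-url=https://downloads.example.org
```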
docker-compose-dev.yml

Lines changed: 1 addition & 1 deletion

@@ -209,7 +209,7 @@ services:
   dev_localstack:
     container_name: "dev_localstack"
     hostname: "localstack"
-    image: localstack/localstack:2.3.2
+    image: localstack/localstack:4.2.0
     restart: on-failure
     ports:
       - "127.0.0.1:4566:4566"
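For local development, the bumped service can be run stand-alone; a trimmed sketch of the relevant compose fragment is below. The `SERVICES` variable is an assumption added for illustration (it limits which LocalStack services start), not part of this commit:

```yaml
# Sketch of a minimal LocalStack service for S3 testing (SERVICES is an assumption).
dev_localstack:
  container_name: "dev_localstack"
  hostname: "localstack"
  image: localstack/localstack:4.2.0
  environment:
    - SERVICES=s3
  ports:
    - "127.0.0.1:4566:4566"
```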

modules/dataverse-parent/pom.xml

Lines changed: 4 additions & 3 deletions

@@ -32,12 +32,13 @@
             <scope>import</scope>
         </dependency>
         <dependency>
-            <groupId>com.amazonaws</groupId>
-            <artifactId>aws-java-sdk-bom</artifactId>
+            <groupId>software.amazon.awssdk</groupId>
+            <artifactId>bom</artifactId>
             <version>${aws.version}</version>
             <type>pom</type>
             <scope>import</scope>
         </dependency>
+
         <dependency>
             <groupId>com.google.cloud</groupId>
             <artifactId>libraries-bom</artifactId>
@@ -151,7 +152,7 @@
         <payara.version>6.2025.3</payara.version>
         <postgresql.version>42.7.7</postgresql.version>
         <solr.version>9.8.0</solr.version>
-        <aws.version>1.12.748</aws.version>
+        <aws.version>2.31.3</aws.version>
         <google.library.version>26.30.0</google.library.version>
 
         <!-- Basic libs, logging -->
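The mechanism at work here is Maven's BOM import: the parent's `<dependencyManagement>` imports the AWS SDK v2 bill of materials, pinning every `software.amazon.awssdk` artifact to `${aws.version}` so child modules can declare artifacts with no `<version>` of their own. A minimal sketch of the pattern (trimmed to the two relevant pieces):

```xml
<!-- In the parent POM: import the AWS SDK v2 BOM. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>software.amazon.awssdk</groupId>
      <artifactId>bom</artifactId>
      <version>2.31.3</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- In a child module: the version is supplied by the BOM. -->
<dependency>
  <groupId>software.amazon.awssdk</groupId>
  <artifactId>s3</artifactId>
</dependency>
```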

pom.xml

Lines changed: 29 additions & 4 deletions

@@ -77,6 +77,18 @@
             <groupId>org.apache.geronimo.specs</groupId>
             <artifactId>geronimo-javamail_1.4_spec</artifactId>
         </exclusion>
+        <exclusion>
+            <groupId>org.apache.geronimo.specs</groupId>
+            <artifactId>geronimo-stax-api_1.0_spec</artifactId>
+        </exclusion>
+        <exclusion>
+            <groupId>org.codehaus.woodstox</groupId>
+            <artifactId>wstx-asl</artifactId>
+        </exclusion>
+        <exclusion>
+            <groupId>org.codehaus.woodstox</groupId>
+            <artifactId>woodstox-core-asl</artifactId>
+        </exclusion>
     </exclusions>
 </dependency>
 <!-- Dependency for Apache Abdera and Apache Tika. Tika needs newer version. -->
@@ -167,10 +179,24 @@
         </exclusion>
     </exclusions>
 </dependency>
-
 <dependency>
-    <groupId>com.amazonaws</groupId>
-    <artifactId>aws-java-sdk-s3</artifactId>
+    <groupId>software.amazon.awssdk</groupId>
+    <artifactId>s3</artifactId>
+    <!-- no version here as managed by BOM above! -->
+</dependency>
+<dependency>
+    <groupId>software.amazon.awssdk</groupId>
+    <artifactId>s3-transfer-manager</artifactId>
+    <!-- no version here as managed by BOM above! -->
+</dependency>
+<!--dependency>
+    <groupId>software.amazon.awssdk</groupId>
+    <artifactId>apache-client</artifactId-->
+    <!-- no version here as managed by BOM above! -->
+<!--/dependency-->
+<dependency>
+    <groupId>software.amazon.awssdk</groupId>
+    <artifactId>netty-nio-client</artifactId>
     <!-- no version here as managed by BOM above! -->
 </dependency>
 <dependency>
@@ -181,7 +207,6 @@
 <dependency>
     <groupId>com.google.code.gson</groupId>
     <artifactId>gson</artifactId>
-    <version>2.9.1</version>
     <scope>compile</scope>
 </dependency>
 <!-- Should be refactored and moved to transitive section above once on Java EE 8 (makes WAR smaller) -->

scripts/zipdownload/pom.xml

Lines changed: 3 additions & 2 deletions

@@ -23,8 +23,9 @@
         <artifactId>postgresql</artifactId>
     </dependency>
     <dependency>
-        <groupId>com.amazonaws</groupId>
-        <artifactId>aws-java-sdk-s3</artifactId>
+        <groupId>software.amazon.awssdk</groupId>
+        <artifactId>s3</artifactId>
+        <!-- no version here as managed by BOM above! -->
     </dependency>
 </dependencies>
 <build>

scripts/zipdownload/src/main/java/edu/harvard/iq/dataverse/custom/service/util/DirectAccessUtil.java

Lines changed: 21 additions & 34 deletions

@@ -20,12 +20,14 @@
 
 package edu.harvard.iq.dataverse.custom.service.util;
 
-import com.amazonaws.SdkClientException;
-import com.amazonaws.auth.profile.ProfileCredentialsProvider;
-import com.amazonaws.services.s3.AmazonS3;
-import com.amazonaws.services.s3.AmazonS3ClientBuilder;
-import com.amazonaws.services.s3.model.GetObjectRequest;
-import com.amazonaws.services.s3.model.ObjectMetadata;
+import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
+import software.amazon.awssdk.core.ResponseInputStream;
+import software.amazon.awssdk.regions.Region;
+import software.amazon.awssdk.services.s3.S3Client;
+import software.amazon.awssdk.services.s3.model.GetObjectRequest;
+import software.amazon.awssdk.services.s3.model.GetObjectResponse;
+import software.amazon.awssdk.services.s3.model.S3Exception;
+
 import java.io.File;
 import java.io.FileInputStream;
 import java.io.IOException;
@@ -38,9 +40,9 @@
  *
  * @author Leonid Andreev
  */
-public class DirectAccessUtil implements java.io.Serializable {
+public class DirectAccessUtil implements java.io.Serializable {
 
-    private AmazonS3 s3 = null;
+    private S3Client s3 = null;
 
     public InputStream openDirectAccess(String storageLocation) {
         InputStream inputStream = null;
@@ -57,31 +59,17 @@ public InputStream openDirectAccess(String storageLocation) {
             String bucket = storageLocation.substring(0, storageLocation.indexOf('/'));
             String key = storageLocation.substring(storageLocation.indexOf('/') + 1);
 
-            //System.out.println("bucket: "+bucket);
-            //System.out.println("key: "+key);
-
-            /* commented-out code below is for looking up S3 metatadata
-               properties, such as size, etc. prior to making an access call:
-            ObjectMetadata objectMetadata = null;
-            long fileSize = 0L;
-            try {
-                objectMetadata = s3.getObjectMetadata(bucket, key);
-                fileSize = objectMetadata.getContentLength();
-                //System.out.println("byte size: "+objectMetadata.getContentLength());
-            } catch (SdkClientException sce) {
-                System.err.println("Cannot get S3 object metadata " + key + " from bucket " + bucket);
-            }*/
-
             try {
-                inputStream = s3.getObject(new GetObjectRequest(bucket, key)).getObjectContent();
-            } catch (SdkClientException sce) {
+                ResponseInputStream<GetObjectResponse> s3Object = s3.getObject(GetObjectRequest.builder()
+                        .bucket(bucket)
+                        .key(key)
+                        .build());
+                inputStream = s3Object;
+            } catch (S3Exception se) {
                 System.err.println("Cannot get S3 object " + key + " from bucket " + bucket);
             }
 
         } else if (storageLocation.startsWith("file://")) {
-            // This could be a static method; since no reusable client/maintainable
-            // state is required
-
             storageLocation = storageLocation.substring(7);
 
             try {
@@ -98,14 +86,13 @@ public InputStream openDirectAccess(String storageLocation) {
     private void createOrReuseAwsClient() {
         if (this.s3 == null) {
            try {
-                AmazonS3ClientBuilder s3CB = AmazonS3ClientBuilder.standard();
-                s3CB.setCredentials(new ProfileCredentialsProvider("default"));
-                this.s3 = s3CB.build();
-
+                this.s3 = S3Client.builder()
+                        .region(Region.US_EAST_1) // You may want to make this configurable
+                        .credentialsProvider(ProfileCredentialsProvider.create("default"))
+                        .build();
             } catch (Exception e) {
-                System.err.println("cannot instantiate an S3 client");
+                System.err.println("Cannot instantiate an S3 client: " + e.getMessage());
            }
        }
    }
-
 }
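Note that the bucket/key parsing in `openDirectAccess` is untouched by the SDK swap; only the client call around it changed. As a standalone sketch of that parsing (the class name and sample storage location below are hypothetical, not from this commit):

```java
public class StorageLocationDemo {

    // Mirrors the split in DirectAccessUtil.openDirectAccess(): once the
    // "s3://" prefix is stripped, everything before the first '/' is the
    // bucket and the remainder (which may itself contain slashes) is the key.
    static String[] splitS3Location(String storageLocation) {
        String s = storageLocation.substring("s3://".length());
        String bucket = s.substring(0, s.indexOf('/'));
        String key = s.substring(s.indexOf('/') + 1);
        return new String[] { bucket, key };
    }

    public static void main(String[] args) {
        String[] parts = splitS3Location("s3://my-bucket/10.5072/FK2/ABC123/data.csv");
        System.out.println("bucket: " + parts[0]);
        System.out.println("key: " + parts[1]);
    }
}
```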

src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java

Lines changed: 0 additions & 20 deletions

@@ -1687,26 +1687,6 @@ public String getRsyncScriptFilename() {
         return rsyncScriptFilename;
     }
 
-    @Deprecated
-    public void requestDirectUploadUrl() {
-
-        S3AccessIO<?> s3io = FileUtil.getS3AccessForDirectUpload(dataset);
-        if (s3io == null) {
-            FacesContext.getCurrentInstance().addMessage(uploadComponentId, new FacesMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.file.uploadWarning"), "Direct upload not supported for this dataset"));
-        }
-        String url = null;
-        String storageIdentifier = null;
-        try {
-            url = s3io.generateTemporaryS3UploadUrl();
-            storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation());
-        } catch (IOException io) {
-            logger.warning(io.getMessage());
-            FacesContext.getCurrentInstance().addMessage(uploadComponentId, new FacesMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.file.uploadWarning"), "Issue in connecting to S3 store for direct upload"));
-        }
-
-        PrimeFaces.current().executeScript("uploadFileDirectly('" + url + "','" + storageIdentifier + "')");
-    }
-
     public void requestDirectUploadUrls() {
 
         Map<String, String> paramMap = FacesContext.getCurrentInstance().getExternalContext().getRequestParameterMap();
