Commit b8f5c1e

11744 CORS: echo request Origin + add Vary (#11745)
* 11744: CORS: echo request Origin and add Vary: Origin; sanitize CSV lists; prefer comma-separated origins; rely on JVM options/MicroProfile only; add tests and release notes
* Centralize CSV parsing (CsvUtil) + CORS origin echo & Vary header improvements
* Make CORS origin list optional in CorsFilter initialization
* Refactor GlobusOverlayAccessIO and CsvUtil for improved endpoint handling and CSV parsing
* updated release note and comments
* test fixes
* Clarify CORS requirements for browser-based external tools in documentation
* Update CORS documentation to clarify configuration requirements and deprecate legacy settings
* Remove unused CSV lookup methods
* Update JvmSettings documentation to clarify CSV list return types
* Refactor doc structure for improved readability and maintainability
* wording
* Removed deprecated (and removed from code) AllowCors setting from doc
* Fix formatting inconsistencies in dataset management documentation
* rename: CsvUtil -> ListSplitUtil
* Refactor CSV list lookup methods to join array elements before splitting
* Rename CSV list lookup methods to use 'lookupSplittedList' for consistency
* revert whitespace changes done by automated formatting tool
* revert whitespace-only changes done by automatic tool
* code cleanup
* code cleanup
* revert whitespace changes done by automated formatting tool
* revert whitespace changes done by automated formatting tool
* revert whitespace changes done by automated formatting tool
* revert whitespace changes done by automated formatting tool
* remove legacy dependency on SettingsServiceBean in CorsFilterTest
* refactor: replace Arrays.stream with ListSplitUtil.split in CorsFilter
* refactor: replace ListSplitUtil.split with Arrays.stream for list processing in JvmSettings
* Enhance JvmSettings: Add trimming options for lookupSplittedList methods to handle whitespace in tokenized values

---------

Co-authored-by: Steven Winship <[email protected]>
1 parent 5c8c9d7 commit b8f5c1e

29 files changed: +614 -133 lines changed

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# 11744: CORS handling improvements
+
+Modernizes CORS so browser integrations (previewers, external tools, JS clients) work correctly with multiple origins and proper caching.
+
+## Highlights
+
+- Echoes the request origin (`Access-Control-Allow-Origin`) when it matches `dataverse.cors.origin`.
+- Adds `Vary: Origin` for per-origin responses (not for wildcard).
+- Supports a comma-separated origin list; any `*` in the list enables wildcard mode.
+- CORS is now enabled only when `dataverse.cors.origin` is set (the removed `:AllowCors` setting no longer enables it).
+- All comma-separated configuration settings (database properties and MicroProfile config) now ignore spaces around commas; tokens otherwise remain unchanged (no quote parsing). Examples: `dataverse.cors.methods`, `dataverse.cors.headers.allow`, `dataverse.cors.headers.expose`. See "Comma-separated configuration values" in the Installation Guide.
+- Docs updated (Installation, Big Data Support, External Tools, File Previews); new tests cover edge cases.
+
+## Admin Action
+
+Set `dataverse.cors.origin` explicitly (required). Use explicit origins (not `*`) for credentialed requests. Ensure proxies keep `Vary: Origin`.
+
+Examples:
+
+```
+dataverse.cors.origin=https://example.org
+dataverse.cors.origin=https://libis.github.io,https://gdcc.github.io
+dataverse.cors.origin=*
+```
+
+Optional (unquoted):
+
+```
+dataverse.cors.methods=GET, POST, OPTIONS, PUT, DELETE
+```
+
+## Compatibility
+
+- Must configure `dataverse.cors.origin`; `:AllowCors` was deprecated and has now been removed.
+- Any `*` triggers wildcard (no per-origin echo / no Vary header).
+
+## Docs
+
+See updated `dataverse.cors.origin` section and related notes in Big Data Support (S3), External Tools, and File Previews.
+
+<!-- Maintainer note: The generic behavior for comma-separated settings has been documented centrally under Installation Guide > Configuration > "Comma-separated configuration values". Keep this item here as a cross-reference. -->
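
Reader's sketch of the origin-handling rules listed in the release note above. This is a minimal illustration, not the `CorsFilter` code from this commit: the class `CorsOriginSketch` and the method `corsHeaders` are hypothetical names, origins are compared by exact string match, and a non-matching origin simply receives no CORS headers. Both of those behaviors are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the documented rules; not the actual CorsFilter. */
final class CorsOriginSketch {

    /**
     * Returns the CORS-related headers for one request, given the configured
     * origins (already split into tokens) and the request's Origin header.
     */
    static Map<String, String> corsHeaders(List<String> allowedOrigins, String requestOrigin) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (allowedOrigins == null || allowedOrigins.isEmpty()) {
            return headers; // setting not defined: no CORS headers are added
        }
        if (allowedOrigins.contains("*")) {
            // Any "*" in the list means wildcard mode: no echo, Vary untouched.
            headers.put("Access-Control-Allow-Origin", "*");
            return headers;
        }
        if (requestOrigin != null && allowedOrigins.contains(requestOrigin)) {
            // Echo the single matching origin and mark the response as origin-dependent.
            headers.put("Access-Control-Allow-Origin", requestOrigin);
            headers.put("Vary", "Origin");
        }
        // Assumption: an origin that is not listed gets no CORS headers.
        return headers;
    }

    public static void main(String[] args) {
        List<String> configured = List.of("https://libis.github.io", "https://gdcc.github.io");
        System.out.println(corsHeaders(configured, "https://gdcc.github.io"));
        System.out.println(corsHeaders(configured, "https://unlisted.example"));
        System.out.println(corsHeaders(List.of("*"), "https://gdcc.github.io"));
    }
}
```

The `Vary: Origin` header matters only in the per-origin branch, because that is the only case where the response body is shared but the header value differs between callers.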

doc/sphinx-guides/source/api/external-tools.rst

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ Introduction
 
 External tools are additional applications the user can access or open from your Dataverse installation to preview, explore, and manipulate data files and datasets. The term "external" is used to indicate that the tool is not part of the main Dataverse Software.
 
+.. note::
+  Browser-based tools must have CORS explicitly enabled via :ref:`dataverse.cors.origin <dataverse.cors.origin>`. List every origin that will host your tool (or use ``*`` when a wildcard is acceptable). If an origin is not listed, the browser will block that tool's API requests even if the tool page itself loads.
+
 Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct URLs that your tool needs to operate. For example, if you've deployed your tool to fabulousfiletool.com your tool might want the ID of a file and the siteUrl of the Dataverse installation like this: https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org
 
 In short, you will be creating a manifest in JSON format that describes not only how to construct URLs for your tool, but also what types of files your tool operates on, where it should appear in the Dataverse installation web interfaces, etc.

doc/sphinx-guides/source/developers/big-data-support.rst

Lines changed: 9 additions & 0 deletions
@@ -57,6 +57,15 @@ Allow CORS for S3 Buckets
 **IMPORTANT:** One additional step that is required to enable direct uploads via a Dataverse installation and for direct download to work with previewers and direct upload to work with dvwebloader (:ref:`folder-upload`) is to allow cross site (CORS) requests on your S3 store.
 The example below shows how to enable CORS rules (to support upload and download) on a bucket using the AWS CLI command line tool. Note that you may want to limit the AllowedOrigins and/or AllowedHeaders further. https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3 has some additional information about doing this.
 
+Dataverse itself will only emit the necessary ``Access-Control-*`` headers to browsers when CORS has been explicitly enabled via the JVM/MicroProfile setting :ref:`dataverse.cors.origin <dataverse.cors.origin>`. You must both:
+
+* Configure an appropriate ``dataverse.cors.origin`` value (single origin, comma-separated list, or ``*``) on the Dataverse application server; and
+* Configure a matching/compatible CORS policy on each S3 bucket (and any CDN/proxy in front of it) that will be used for direct upload or for redirect (download-redirect) operations consumed by previewers.
+
+If you specify multiple origins in ``dataverse.cors.origin``, Dataverse will echo back the requesting origin (when it matches) and will include ``Vary: Origin`` so that shared caches do not serve one origin's response to another. If you configure ``*``, Dataverse will respond with ``Access-Control-Allow-Origin: *`` (note that browsers will not allow credentialed requests with a wildcard).
+
+Make sure the bucket CORS configuration ``AllowedOrigins`` is at least as permissive as the origins you configure in ``dataverse.cors.origin``. If the bucket allows ``*`` but the Dataverse application only allows a subset, the browser will still enforce the more restrictive application response.
+
 If you'd like to check the CORS configuration on your bucket before making changes:
 
 ``aws s3api get-bucket-cors --bucket <BUCKET_NAME>``
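
Reader's sketch of the "at least as permissive" rule added in the hunk above. It is a simplified check under stated assumptions, not part of the commit: the class `BucketCorsCoverageSketch` and the method `bucketCovers` are hypothetical, both inputs are plain lists of origin strings, and partial wildcard patterns inside a single origin are deliberately not handled.

```java
import java.util.List;

/** Hypothetical helper; compares two origin lists by exact string match only. */
final class BucketCorsCoverageSketch {

    /**
     * Returns true when every origin allowed by the application is also
     * allowed by the bucket, so the bucket never blocks a request that the
     * application would accept.
     */
    static boolean bucketCovers(List<String> appOrigins, List<String> bucketAllowedOrigins) {
        if (bucketAllowedOrigins.contains("*")) {
            return true; // a wildcard bucket policy covers any application setting
        }
        if (appOrigins.contains("*")) {
            return false; // an application wildcard needs a wildcard on the bucket too
        }
        return bucketAllowedOrigins.containsAll(appOrigins);
    }

    public static void main(String[] args) {
        List<String> app = List.of("https://libis.github.io", "https://gdcc.github.io");
        System.out.println(bucketCovers(app, List.of("*")));
        System.out.println(bucketCovers(app, List.of("https://gdcc.github.io")));
    }
}
```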

doc/sphinx-guides/source/installation/config.rst

Lines changed: 28 additions & 17 deletions
@@ -10,6 +10,27 @@ Once you have finished securing and configuring your Dataverse installation, you
 .. contents:: |toctitle|
   :local:
 
+.. _comma-separated-config-values:
+
+Comma-separated configuration values
+------------------------------------
+
+Many configuration options (both MicroProfile/JVM settings and database settings) accept comma-separated lists. For all such settings, Dataverse applies consistent, lightweight parsing:
+
+- Whitespace immediately around commas is ignored (e.g., ``GET, POST`` is equivalent to ``GET,POST``).
+- Tokens are otherwise preserved exactly as typed. There is no quote parsing and no escape processing.
+- Embedded commas within a token are not supported.
+
+Examples include (but are not limited to):
+
+- :ref:`dataverse.cors.origin <dataverse.cors.origin>`
+- :ref:`dataverse.cors.methods <dataverse.cors.methods>`
+- :ref:`dataverse.cors.headers.allow <dataverse.cors.headers.allow>`
+- :ref:`dataverse.cors.headers.expose <dataverse.cors.headers.expose>`
+- :ref:`:UploadMethods`
+
+This behavior is implemented centrally and applies across all Dataverse settings that accept comma-separated values.
+
 .. _securing-your-installation:
 
 Securing Your Installation
@@ -3704,17 +3725,21 @@ The following settings control Cross-Origin Resource Sharing (CORS) for your Dat
 dataverse.cors.origin
 +++++++++++++++++++++
 
-Allowed origins for CORS requests. The default with no value set is to not include CORS headers. However, if the deprecated :AllowCors setting is explicitly set to true the default is "\*" (all origins).
-When the :AllowsCors setting is not used, you must set this setting to "\*" or a list of origins to enable CORS headers.
+Allowed origins for CORS requests. If this setting is not defined, CORS headers are not added. Set to ``*`` to allow all origins (note that browsers will not allow credentialed requests with ``*``) or provide a comma-separated list of explicit origins.
 
-Multiple origins can be specified as a comma-separated list.
+Multiple origins can be specified as a comma-separated list (whitespace is ignored):
 
 Example:
 
 ``./asadmin create-jvm-options '-Ddataverse.cors.origin=https://example.com,https://subdomain.example.com'``
 
 Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_CORS_ORIGIN``.
 
+Behavior:
+
+* When a list of origins is configured, Dataverse echoes the single matching request ``Origin`` value in ``Access-Control-Allow-Origin`` and adds ``Vary: Origin`` to support correct proxy/CDN caching.
+* When ``*`` is configured, ``Access-Control-Allow-Origin: *`` is sent and ``Vary`` is not modified.
+
 .. _dataverse.cors.methods:
 
 dataverse.cors.methods
@@ -5028,20 +5053,6 @@ This can be helpful in situations where multiple organizations are sharing one D
 or
 ``curl -X PUT -d '*' http://localhost:8080/api/admin/settings/:InheritParentRoleAssignments``
 
-:AllowCors (Deprecated)
-+++++++++++++++++++++++
-
-.. note::
-  This setting is deprecated. Please use the JVM settings above instead.
-  This legacy setting will only be used if the newer JVM settings are not set.
-
-Enable or disable support for Cross-Origin Resource Sharing (CORS) by setting ``:AllowCors`` to ``true`` or ``false``.
-
-``curl -X PUT -d true http://localhost:8080/api/admin/settings/:AllowCors``
-
-.. note::
-  New values for this setting will only be used after a server restart.
-
 :ChronologicalDateFacets
 ++++++++++++++++++++++++
 
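
Reader's sketch of the parsing rules that the new "Comma-separated configuration values" section in `config.rst` describes. It is not the `ListSplitUtil` class introduced by this commit, whose source is not shown on this page: `CommaListSketch.split` is a hypothetical stand-in, and dropping empty tokens is an assumption that the documentation above does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the documented behavior; not ListSplitUtil itself. */
final class CommaListSketch {

    /** Splits on commas and ignores whitespace around them; quotes are not interpreted. */
    static List<String> split(String value) {
        List<String> tokens = new ArrayList<>();
        if (value == null) {
            return tokens;
        }
        for (String token : value.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) { // assumption: empty tokens are dropped
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same tokens with or without spaces around the commas.
        System.out.println(split("GET, POST, OPTIONS"));
        System.out.println(split("GET,POST,OPTIONS"));
        // Quotes are ordinary characters, so an embedded comma still splits the token.
        System.out.println(split("\"a,b\", c"));
    }
}
```

Because there is no quote or escape handling, a value that needs a literal comma inside one token cannot be expressed, which matches the "embedded commas are not supported" rule.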

doc/sphinx-guides/source/user/dataset-management.rst

Lines changed: 3 additions & 0 deletions
@@ -175,6 +175,9 @@ File Previews
 
 Dataverse installations can add previewers for common file types uploaded by their research communities. The previews appear on the file page. If a preview tool for a specific file type is available, the preview will be created and will display automatically, after terms have been agreed to or a guestbook entry has been made, if necessary. File previews are not available for restricted files unless they are being accessed using a Preview URL. See also :ref:`previewUrl`. When the dataset license is not the default license, users will be prompted to accept the license/data use agreement before the preview is shown. See also :ref:`license-terms`.
 
+.. note::
+  Some previewers run purely in the browser and make direct (JavaScript) requests back to the Dataverse API endpoints to retrieve file contents, metadata, or signed URLs. For these previewers to function when hosted on a different origin (e.g., a CDN or a separate previewer service), the Dataverse installation must have CORS enabled via :ref:`dataverse.cors.origin <dataverse.cors.origin>`. Administrators should configure the list of allowed origins to include the host serving the previewers.
+
 Previewers are available for the following file types:
 
 - Text

src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java

Lines changed: 5 additions & 4 deletions
@@ -53,6 +53,7 @@
 import org.apache.http.protocol.HttpContext;
 import org.apache.http.util.EntityUtils;
 import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
+import edu.harvard.iq.dataverse.util.ListSplitUtil;
 
 /**
  *
@@ -908,12 +909,12 @@ public String getFieldLanguage(String languages, String localeCode) {
         // If the fields list of supported languages contains the current locale (e.g.
         // the lang of the UI, or the current metadata input/display lang (tbd)), use
         // that. Otherwise, return the first in the list
-        String[] langStrings = languages.split("\\s*,\\s*");
-        if (langStrings.length > 0) {
-            if (Arrays.asList(langStrings).contains(localeCode)) {
+        final List<String> langStrings = ListSplitUtil.split(languages);
+        if (!langStrings.isEmpty()) {
+            if (langStrings.contains(localeCode)) {
                 return localeCode;
             } else {
-                return langStrings[0];
+                return langStrings.get(0);
             }
         }
         return null;

src/main/java/edu/harvard/iq/dataverse/FileMetadata.java

Lines changed: 9 additions & 8 deletions
@@ -49,6 +49,7 @@
 import edu.harvard.iq.dataverse.datavariable.VarGroup;
 import edu.harvard.iq.dataverse.datavariable.VariableMetadata;
 import edu.harvard.iq.dataverse.util.DateUtil;
+import edu.harvard.iq.dataverse.util.ListSplitUtil;
 import edu.harvard.iq.dataverse.util.StringUtil;
 import java.util.HashSet;
 import java.util.Set;
@@ -605,18 +606,18 @@ public int compare(FileMetadata o1, FileMetadata o2) {
         }
     };
 
-    static Map<String,Long> categoryMap=null;
+    static Map<String, Long> categoryMap = null;
 
     public static void setCategorySortOrder(String categories) {
-        categoryMap=new HashMap<String, Long>();
-        long i=1;
-        for(String cat: categories.split(",\\s*")) {
-            categoryMap.put(cat.toUpperCase(), i);
-            i++;
-        }
+        categoryMap = new HashMap<String, Long>();
+        long i = 1;
+        for (String cat : ListSplitUtil.split(categories)) {
+            categoryMap.put(cat.toUpperCase(), i);
+            i++;
+        }
     }
 
-    public static Map<String,Long> getCategorySortOrder() {
+    public static Map<String, Long> getCategorySortOrder() {
         return categoryMap;
     }
 

src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java

Lines changed: 13 additions & 11 deletions
@@ -14,6 +14,7 @@
 import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
 import edu.harvard.iq.dataverse.settings.SettingsServiceBean.Key;
 import edu.harvard.iq.dataverse.util.BundleUtil;
+import edu.harvard.iq.dataverse.util.ListSplitUtil;
 import edu.harvard.iq.dataverse.util.StringUtil;
 import edu.harvard.iq.dataverse.util.SystemConfig;
 import edu.harvard.iq.dataverse.UserNotification.Type;
@@ -50,8 +51,7 @@
 public class SettingsWrapper implements java.io.Serializable {
 
     static final Logger logger = Logger.getLogger(SettingsWrapper.class.getCanonicalName());
-    public static final String COMMA_BETWEEN_OPTIONAL_WHITE_SPACE = "\\s*,\\s*";
-
+
     @EJB
     SettingsServiceBean settingsService;
 
@@ -393,10 +393,12 @@ public boolean isRsyncOnly() {
                 rsyncOnly = false;
             } else {
                 String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods);
-                if (uploadMethods==null){
+                if (uploadMethods == null) {
                     rsyncOnly = false;
                 } else {
-                    rsyncOnly = Arrays.asList(uploadMethods.toLowerCase().split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)).size() == 1 && uploadMethods.toLowerCase().equals(SystemConfig.FileUploadMethods.RSYNC.toString());
+                    String normalizedUploadMethods = uploadMethods.toLowerCase();
+                    rsyncOnly = ListSplitUtil.split(normalizedUploadMethods).size() == 1
+                            && normalizedUploadMethods.equals(SystemConfig.FileUploadMethods.RSYNC.toString());
                 }
             }
         }
@@ -424,11 +426,11 @@ public String getSupportTeamEmail() {
 
     public Integer getUploadMethodsCount() {
         if (uploadMethodsCount == null) {
-            String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods);
-            if (uploadMethods==null){
+            String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods);
+            if (uploadMethods == null) {
                 uploadMethodsCount = 0;
             } else {
-                uploadMethodsCount = Arrays.asList(uploadMethods.toLowerCase().split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)).size();
+                uploadMethodsCount = ListSplitUtil.split(uploadMethods).size();
             }
         }
         return uploadMethodsCount;
@@ -502,7 +504,7 @@ public boolean shouldBeAnonymized(DatasetField df) {
         if (anonymizedFieldTypes == null) {
             anonymizedFieldTypes = new ArrayList<String>();
             String names = get(SettingsServiceBean.Key.AnonymizedFieldTypeNames.toString(), "");
-            anonymizedFieldTypes.addAll(Arrays.asList(names.split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)));
+            anonymizedFieldTypes.addAll(ListSplitUtil.split(names));
         }
         return anonymizedFieldTypes.contains(df.getDatasetFieldType().getName());
     }
@@ -826,11 +828,11 @@ public String getMetricsUrl() {
     }
 
     private Boolean getUploadMethodAvailable(String method){
-        String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods);
-        if (uploadMethods==null){
+        String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods);
+        if (uploadMethods == null) {
             return false;
         } else {
-            return Arrays.asList(uploadMethods.toLowerCase().split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)).contains(method);
+            return ListSplitUtil.splitToLowerCaseSet(uploadMethods).contains(method);
         }
     }
 
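
Reader's sketch of the case-insensitive lookup that `getUploadMethodAvailable` performs in the hunk above. The real `ListSplitUtil.splitToLowerCaseSet` is not shown on this page, so `UploadMethodsSketch.toLowerCaseSet` below is a hypothetical equivalent written only from how it is called; the sample setting values are also made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical equivalent of the lookup; not the SettingsWrapper or ListSplitUtil code. */
final class UploadMethodsSketch {

    /** Splits on commas, trims each token, lower-cases it, and removes duplicates. */
    static Set<String> toLowerCaseSet(String value) {
        Set<String> tokens = new LinkedHashSet<>();
        if (value == null) {
            return tokens;
        }
        for (String token : value.split(",")) {
            String normalized = token.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) { // assumption: empty tokens are dropped
                tokens.add(normalized);
            }
        }
        return tokens;
    }

    /** The caller passes the method name in lower case, as in the diff above. */
    static boolean isMethodAvailable(String uploadMethods, String method) {
        return uploadMethods != null && toLowerCaseSet(uploadMethods).contains(method);
    }

    public static void main(String[] args) {
        System.out.println(isMethodAvailable("methodA, RSYNC", "rsync"));
        System.out.println(isMethodAvailable(null, "rsync"));
    }
}
```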

src/main/java/edu/harvard/iq/dataverse/api/Admin.java

Lines changed: 2 additions & 1 deletion
@@ -115,6 +115,7 @@
 import edu.harvard.iq.dataverse.util.ArchiverUtil;
 import edu.harvard.iq.dataverse.util.BundleUtil;
 import edu.harvard.iq.dataverse.util.FileUtil;
+import edu.harvard.iq.dataverse.util.ListSplitUtil;
 import edu.harvard.iq.dataverse.util.SystemConfig;
 import edu.harvard.iq.dataverse.util.URLTokenUtil;
 import edu.harvard.iq.dataverse.util.UrlSignerUtil;
@@ -2243,7 +2244,7 @@ public Response addRoleAssignementsToChildren(@Context ContainerRequestContext c
         boolean inheritAllRoles = false;
         String rolesString = settingsSvc.getValueForKey(SettingsServiceBean.Key.InheritParentRoleAssignments, "");
         if (rolesString.length() > 0) {
-            ArrayList<String> rolesToInherit = new ArrayList<String>(Arrays.asList(rolesString.split("\\s*,\\s*")));
+            ArrayList<String> rolesToInherit = new ArrayList<>(ListSplitUtil.split(rolesString));
             if (!rolesToInherit.isEmpty()) {
                 if (rolesToInherit.contains("*")) {
                     inheritAllRoles = true;

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

Lines changed: 4 additions & 2 deletions
@@ -5317,7 +5317,8 @@ public Response getPrivateUrlDatasetVersion(@PathParam("privateUrlToken") String
         }
         JsonObjectBuilder responseJson;
         if (isAnonymizedAccess) {
-            List<String> anonymizedFieldTypeNamesList = new ArrayList<>(Arrays.asList(anonymizedFieldTypeNames.split(SettingsWrapper.COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)));
+            // Use ListSplitUtil for consistent CSV parsing
+            List<String> anonymizedFieldTypeNamesList = new ArrayList<>(ListSplitUtil.split(anonymizedFieldTypeNames));
             responseJson = json(dsv, anonymizedFieldTypeNamesList, true, returnOwners);
         } else {
             responseJson = json(dsv, null, true, returnOwners);
@@ -5343,7 +5344,8 @@ public Response getPreviewUrlDatasetVersion(@PathParam("previewUrlToken") String
         }
         JsonObjectBuilder responseJson;
         if (isAnonymizedAccess) {
-            List<String> anonymizedFieldTypeNamesList = new ArrayList<>(Arrays.asList(anonymizedFieldTypeNames.split(SettingsWrapper.COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)));
+            // Use ListSplitUtil for consistent CSV parsing
+            List<String> anonymizedFieldTypeNamesList = new ArrayList<>(ListSplitUtil.split(anonymizedFieldTypeNames));
             responseJson = json(dsv, anonymizedFieldTypeNamesList, true, returnOwners);
         } else {
             responseJson = json(dsv, null, true, returnOwners);
