Commit 70b6e26

Merge pull request #12002 from IQSS/11996-fix-settings
Fix settings
2 parents cb1ad3d + 8ddd5b8 commit 70b6e26

File tree: 18 files changed (+789, -262 lines)


doc/release-notes/11639-db-opts-idempotency.md

Lines changed: 10 additions & 0 deletions
@@ -43,3 +43,13 @@ The following database settings were added to the official list within the c
4343
- `:LDNAnnounceRequiredFields`
4444
- `:LDNTarget`
4545
- `:WorkflowsAdminIpWhitelist` - formerly `WorkflowsAdmin#IP_WHITELIST_KEY`
46+
- `:PrePublishDatasetWorkflowId` - formerly `WorkflowServiceBean.WorkflowId:PrePublishDataset`
47+
- `:PostPublishDatasetWorkflowId` - formerly `WorkflowServiceBean.WorkflowId:PostPublishDataset`
48+
49+
### Important Considerations During Upgrade of Your Installation
50+
51+
1. Running a customized fork? Make sure to add any custom settings to the SettingsServiceBean.Key enum before deploying!
52+
2. Any database settings not contained in the `SettingsServiceBean.Key` enum will be removed from your database during each deployment cycle.
53+
3. As always when upgrading, make sure to back up your database beforehand!
54+
You can also use the existing API endpoint `/api/admin/settings` to retrieve all settings as JSON for a quick backup before upgrading.
55+
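A minimal sketch of such a pre-upgrade backup (host, port, and output file name are illustrative):

```bash
# Snapshot all database settings before upgrading; assumes the admin API is
# reachable on localhost:8080 and not blocked or protected differently.
curl -s http://localhost:8080/api/admin/settings > settings-backup-$(date +%F).json
```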

doc/sphinx-guides/source/developers/workflows.rst

Lines changed: 4 additions & 0 deletions
@@ -27,6 +27,8 @@ If a step in a workflow fails, the Dataverse installation makes an effort to rol
2727
provider offers two steps for sending and receiving customizable HTTP requests.
2828
*http/sr* and *http/authExt*, detailed below, with the latter able to use the API to make changes to the dataset being processed. (Both lock the dataset to prevent other processes from changing the dataset between the time the step is launched to when the external process responds to the Dataverse instance.)
2929

30+
.. _workflow_admin:
31+
3032
Administration
3133
~~~~~~~~~~~~~~
3234

@@ -36,6 +38,8 @@ At the moment, defining a workflow for each trigger is done for the entire insta
3638

3739
In order to prevent unauthorized resuming of workflows, the Dataverse installation maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration.
3840

41+
Note: these settings are also exposed and manageable via the Settings API.
42+
See :ref:`:WorkflowsAdminIpWhitelist`, :ref:`:PrePublishDatasetWorkflowId`, and :ref:`:PostPublishDatasetWorkflowId`.
3943
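A sketch of both routes (IP addresses are illustrative; the semicolon-separated payload is assumed to be the same for the dedicated endpoint and the setting):

```bash
# Update the resume-request whitelist via the dedicated workflows admin endpoint ...
curl -X PUT -d '127.0.0.1;::1;10.0.0.5' http://localhost:8080/api/admin/workflows/ip-whitelist

# ... or via the equivalent database setting exposed through the Settings API.
curl -X PUT -d '127.0.0.1;::1;10.0.0.5' http://localhost:8080/api/admin/settings/:WorkflowsAdminIpWhitelist
```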

4044
Available Steps
4145
~~~~~~~~~~~~~~~

doc/sphinx-guides/source/installation/config.rst

Lines changed: 45 additions & 1 deletion
@@ -2417,6 +2417,9 @@ The workflow id returned in this call (or available by doing a GET of /api/admin
24172417

24182418
Once these steps are taken, new publication requests will automatically trigger submission of an archival copy to the specified archiver, Chronopolis' DuraCloud component in this example. For Chronopolis, as when using the API, it is currently the admin's responsibility to snap-shot the DuraCloud space and monitor the result. Failure of the workflow, (e.g. if DuraCloud is unavailable, the configuration is wrong, or the space for this dataset already exists due to a prior publication action or use of the API), will create a failure message but will not affect publication itself.
24192419

2420+
Note: the default workflows can also be set via the Settings API.
2421+
See :ref:`:WorkflowsAdminIpWhitelist`, :ref:`:PrePublishDatasetWorkflowId`, and :ref:`:PostPublishDatasetWorkflowId`.
2422+
24202423
.. _bag-info.txt:
24212424

24222425
Configuring bag-info.txt
@@ -4536,7 +4539,11 @@ Using a JSON-based setting, you can set a global default and per-format limits f
45364539

45374540
(In previous releases of Dataverse, a colon-separated form was used to specify per-format limits, such as ``:TabularIngestSizeLimit:Rdata``, but this is no longer supported. Now JSON is used.)
45384541

4539-
The expected JSON is an object with key/value pairs like the following. Format names are case-insensitive, and all fields are optional. The size limits must be strings with double quotes around them (e.g. ``"10"``) rather than numbers (e.g. ``10``).
4542+
The expected JSON is an object with key/value pairs like the following.
4543+
Format names are case-insensitive, and all fields are optional (an empty JSON object means no restriction).
4544+
The size limits must be whole numbers, given either as strings in double quotes (e.g. ``"10"``) or as numeric values (e.g. ``10`` or ``10.0``).
4545+
Note that fractional numbers like ``10.5`` are invalid.
4546+
Any invalid setting will temporarily disable tabular ingest until corrected.
45404547

45414548
.. code:: json
45424549
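As an illustration, a per-format limit could be applied through the admin settings API like this (the values mirror the Javadoc example added elsewhere in this commit and are not recommendations):

```bash
# 512 MB default limit, ingest disabled for CSV, 1 MB limit for Rdata (illustrative values).
curl -X PUT -d '{"default": "536870912", "CSV": "0", "Rdata": "1000000"}' \
  http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit
```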
@@ -5155,6 +5162,43 @@ Number of errors to display to the user when creating DataFiles from a file uplo
51555162

51565163
``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:CreateDataFilesMaxErrorsToDisplay``
51575164

5165+
.. _:WorkflowsAdminIpWhitelist:
5166+
5167+
:WorkflowsAdminIpWhitelist
5168+
++++++++++++++++++++++++++
5169+
5170+
A semicolon-separated list of IP addresses from which workflow resume requests are honored.
5171+
By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``).
5172+
This setting helps prevent unauthorized resumption of workflows.
5173+
5174+
``curl -X PUT -d '127.0.0.1;::1;192.168.0.1' http://localhost:8080/api/admin/settings/:WorkflowsAdminIpWhitelist``
5175+
5176+
See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
5177+
5178+
.. _:PrePublishDatasetWorkflowId:
5179+
5180+
:PrePublishDatasetWorkflowId
5181+
++++++++++++++++++++++++++++
5182+
5183+
The identifier of the workflow to be executed prior to dataset publication.
5184+
This pre-publish workflow is useful for preparing a dataset for public access (e.g., moving files, checking metadata) or starting an approval process.
5185+
5186+
``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:PrePublishDatasetWorkflowId``
5187+
5188+
See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
5189+
5190+
.. _:PostPublishDatasetWorkflowId:
5191+
5192+
:PostPublishDatasetWorkflowId
5193+
+++++++++++++++++++++++++++++
5194+
5195+
The identifier of the workflow to be executed after a dataset has been successfully published.
5196+
This post-publish workflow is useful for actions such as sending notifications about the newly published dataset or archiving.
5197+
5198+
``curl -X PUT -d '2' http://localhost:8080/api/admin/settings/:PostPublishDatasetWorkflowId``
5199+
5200+
See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
5201+
51585202
.. _:BagItHandlerEnabled:
51595203

51605204
:BagItHandlerEnabled

modules/dataverse-parent/pom.xml

Lines changed: 2 additions & 1 deletion
@@ -152,6 +152,7 @@
152152
<payara.version>6.2025.10</payara.version>
153153
<postgresql.version>42.7.7</postgresql.version>
154154
<solr.version>9.8.0</solr.version>
155+
<postgresql.server.version>16</postgresql.server.version>
155156
<aws.version>2.33.0</aws.version>
156157
<google.library.version>26.30.0</google.library.version>
157158

@@ -168,7 +169,7 @@
168169
<gdcc.xoai.version>5.3.0</gdcc.xoai.version>
169170

170171
<!-- Testing dependencies -->
171-
<testcontainers.version>1.19.7</testcontainers.version>
172+
<testcontainers.version>2.0.2</testcontainers.version>
172173
<smallrye-mpconfig.version>3.7.1</smallrye-mpconfig.version>
173174
<junit.jupiter.version>5.10.2</junit.jupiter.version>
174175
<mockito.version>5.11.0</mockito.version>
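The new `postgresql.server.version` property appears intended to let container-based tests target a different PostgreSQL release; a hypothetical override (assuming standard Maven `-D` property precedence) would be:

```bash
# Run the build against PostgreSQL 17 test containers instead of the default 16.
mvn verify -Dpostgresql.server.version=17
```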

pom.xml

Lines changed: 14 additions & 11 deletions
@@ -20,7 +20,7 @@
2020
<properties>
2121
<skipUnitTests>false</skipUnitTests>
2222
<skipIntegrationTests>false</skipIntegrationTests>
23-
<it.groups>integration</it.groups>
23+
<it.groups>integration,migration</it.groups>
2424
<!-- Provide a fallback value that won't break things if JaCoCo prepare-agent steps don't set it. -->
2525
<!-- Note: you must use @{} style late variable binding in argLine, otherwise JaCoCo cannot inject the right settings! -->
2626
<surefire.jacoco.args>-Ddummy.jacoco.property=true</surefire.jacoco.args>
@@ -748,36 +748,36 @@
748748
</exclusion>
749749
</exclusions>
750750
</dependency>
751+
<dependency>
752+
<groupId>org.dbunit</groupId>
753+
<artifactId>dbunit</artifactId>
754+
<version>3.0.0</version>
755+
<scope>test</scope>
756+
</dependency>
751757
<dependency>
752758
<groupId>org.testcontainers</groupId>
753759
<artifactId>testcontainers</artifactId>
754760
<scope>test</scope>
755-
<exclusions>
756-
<exclusion>
757-
<groupId>junit</groupId>
758-
<artifactId>junit</artifactId>
759-
</exclusion>
760-
</exclusions>
761761
</dependency>
762762
<dependency>
763763
<groupId>org.testcontainers</groupId>
764-
<artifactId>junit-jupiter</artifactId>
764+
<artifactId>testcontainers-junit-jupiter</artifactId>
765765
<scope>test</scope>
766766
</dependency>
767767
<dependency>
768768
<groupId>org.testcontainers</groupId>
769-
<artifactId>postgresql</artifactId>
769+
<artifactId>testcontainers-postgresql</artifactId>
770770
<scope>test</scope>
771771
</dependency>
772772
<dependency>
773773
<groupId>com.github.dasniko</groupId>
774774
<artifactId>testcontainers-keycloak</artifactId>
775-
<version>3.6.0</version>
775+
<version>4.0.0</version>
776776
<scope>test</scope>
777777
</dependency>
778778
<dependency>
779779
<groupId>org.testcontainers</groupId>
780-
<artifactId>localstack</artifactId>
780+
<artifactId>testcontainers-localstack</artifactId>
781781
<scope>test</scope>
782782
</dependency>
783783
<!--
@@ -1070,6 +1070,9 @@
10701070
-->
10711071
<argLine>@{failsafe.jacoco.args} ${argLine}</argLine>
10721072
<skip>${skipIntegrationTests}</skip>
1073+
<systemPropertyVariables>
1074+
<postgresql.server.version>${postgresql.server.version}</postgresql.server.version>
1075+
</systemPropertyVariables>
10731076
</configuration>
10741077
<executions>
10751078
<execution>
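With `migration` added to `it.groups` and the PostgreSQL version forwarded to failsafe, a focused run of only the new migration tests might look like this sketch (the `-D` overrides are an assumption based on the properties defined at the top of this POM):

```bash
# Run only the "migration"-tagged integration tests and skip unit tests.
mvn verify -Dit.groups=migration -DskipUnitTests=true
```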

src/main/java/edu/harvard/iq/dataverse/flyway/SettingsCleanupCallback.java

Lines changed: 21 additions & 8 deletions
@@ -11,7 +11,9 @@
1111
import java.sql.ResultSet;
1212
import java.sql.SQLException;
1313
import java.util.ArrayList;
14+
import java.util.HashMap;
1415
import java.util.List;
16+
import java.util.Map;
1517
import java.util.logging.Level;
1618
import java.util.logging.Logger;
1719

@@ -40,6 +42,7 @@ public boolean canHandleInTransaction(Event event, Context context) {
4042

4143
@Override
4244
public void handle(Event event, Context context) {
45+
// Failsafe - we only run _after_ all migrations are done.
4346
if (event != Event.AFTER_MIGRATE) {
4447
return;
4548
}
@@ -61,10 +64,19 @@ public String getCallbackName() {
6164
return "SettingsCleanup";
6265
}
6366

67+
/**
68+
* Cleans up invalid settings from the database by identifying and removing
69+
* rows in the `setting` table where the `name` attribute does not correspond
70+
* to a valid SettingsServiceBean.Key.
71+
*
72+
* @param connection the database connection to use for querying and updating the `setting` table
73+
* @throws SQLException if a database access error occurs or the query fails
74+
*/
6475
private void cleanupInvalidSettings(Connection connection) throws SQLException {
65-
// Collect IDs of rows to delete
66-
List<Long> idsToDelete = new ArrayList<>();
76+
// Collect IDs of rows to delete, together with the setting's "name" attribute.
77+
Map<Long, String> entriesToDelete = new HashMap<>();
6778

79+
// IMPORTANT: as we cannot use JPQL mid-Flyway, this query needs to be carefully aligned with the Setting class!
6880
String selectSql = "SELECT id, name FROM setting";
6981
try (PreparedStatement ps = connection.prepareStatement(selectSql);
7082
ResultSet rs = ps.executeQuery()) {
@@ -77,24 +89,25 @@ private void cleanupInvalidSettings(Connection connection) throws SQLException {
7789
// to a SettingsServiceBean.Key is considered invalid and will be removed.
7890
SettingsServiceBean.Key key = SettingsServiceBean.Key.parse(name);
7991
if (key == null) {
80-
idsToDelete.add(id);
92+
entriesToDelete.put(id, name);
8193
}
8294
}
8395
}
8496

85-
if (idsToDelete.isEmpty()) {
97+
if (entriesToDelete.isEmpty()) {
8698
logger.fine("Settings cleanup: no invalid settings found");
8799
return;
88100
}
89101

90-
logger.info(() -> "Settings cleanup: found " + idsToDelete.size()
91-
+ " invalid settings; deleting them");
102+
logger.info(() -> "Settings cleanup: found " + entriesToDelete.size()
103+
+ " invalid/obsolete settings; deleting them.");
92104

93105
String deleteSql = "DELETE FROM setting WHERE id = ?";
94106
try (PreparedStatement delete = connection.prepareStatement(deleteSql)) {
95-
for (Long id : idsToDelete) {
96-
delete.setLong(1, id);
107+
for (Map.Entry<Long, String> entry : entriesToDelete.entrySet()) {
108+
delete.setLong(1, entry.getKey());
97109
delete.addBatch();
110+
logger.info("Settings cleanup: deleting \"" + entry.getValue() + "\"");
98111
}
99112
int[] counts = delete.executeBatch();
100113
logger.info(() -> "Settings cleanup: deleted " + counts.length + " rows with invalid keys");
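Because any row whose `name` does not parse to a `SettingsServiceBean.Key` is deleted at deployment time, administrators may want to inspect the `setting` table first; a sketch (database name and connection details are illustrative):

```bash
# List the setting names currently stored, to spot custom keys that are not in
# SettingsServiceBean.Key before the cleanup removes them.
psql -d dvndb -c "SELECT DISTINCT name FROM setting ORDER BY name;"
```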

src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java

Lines changed: 20 additions & 7 deletions
@@ -177,6 +177,18 @@ public enum Key {
177177
*/
178178
WorkflowsAdminIpWhitelist,
179179

180+
/**
181+
* Represents the workflow identifier for the "pre-publish dataset" operation.
182+
* This identifier is used to manage and define the specific workflow
183+
* triggered before a dataset is published within the application.
184+
*/
185+
PrePublishDatasetWorkflowId,
186+
/**
187+
* Represents the configuration key for specifying the workflow identifier that
188+
* will be executed after a dataset has been published.
189+
*/
190+
PostPublishDatasetWorkflowId,
191+
180192
/**
181193
* A special secret that, if set, needs to be given when trying to manage internal users.
182194
* This key was formerly known as "BuiltinUsers.KEY", which never was a setting name aligning with the others.
@@ -291,13 +303,14 @@ public enum Key {
291303
*/
292304
@Deprecated(since = "6.2", forRemoval = true)
293305
SystemEmail,
294-
/* size limit for Tabular data file ingests */
295-
/* (can be set separately for specific ingestable formats; in which
296-
case the actual stored option will be TabularIngestSizeLimit:{FORMAT_NAME}
297-
where {FORMAT_NAME} is the format identification tag returned by the
298-
getFormatName() method in the format-specific plugin; "sav" for the
299-
SPSS/sav format, "RData" for R, etc.
300-
for example: :TabularIngestSizeLimit:RData */
306+
307+
/**
308+
<p>Size limit (in bytes) for tabular file ingest. Accepts either a single numeric value or JSON for per-format control.</p>
309+
<p>Values: -1 (or absent) = no limit, 0 = disable ingest, >0 = byte threshold, or JSON object.</p>
310+
<p>A JSON object allows setting a "default" (same as a single byte value) and per-format override limits for: CSV, DTA, POR, Rdata, SAV, XLSX.
311+
Example: <code>{"default": "536870912", "CSV": "0", "Rdata": "1000000"}</code></p>
312+
<p>Format names are case-insensitive. Invalid settings disable ingest until corrected.</p>
313+
*/
301314
TabularIngestSizeLimit,
302315
/* Validate physical files in the dataset when publishing, if the dataset size less than the threshold limit */
303316
DatasetChecksumValidationSizeLimit,
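After deployment, the new keys can be spot-checked in the full settings dump returned by the existing admin API (a sketch; host and port are illustrative):

```bash
# Confirm the new workflow settings survived the cleanup.
curl -s http://localhost:8080/api/admin/settings | grep -E 'PrePublishDatasetWorkflowId|PostPublishDatasetWorkflowId'
```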
