Commit 6275ea9

Merge pull request #11485 from vera/mpconfig-personororg
feat: migrate personOrOrg settings to MPConfig
2 parents c021fcf + e37fbce commit 6275ea9

6 files changed, +71 -43 lines changed
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+The settings `dataverse.personOrOrg.assumeCommaInPersonName` and `dataverse.personOrOrg.orgPhraseArray` now support configuration via MicroProfile Config.
+
+They have been renamed to `dataverse.person-or-org.assume-comma-in-person-name` and `dataverse.person-or-org.org-phrase-array` for consistency with naming conventions.
+
+In addition to the existing `asadmin` JVM option method, any [supported MicroProfile Config API source](https://docs.payara.fish/community/docs/Technical%20Documentation/MicroProfile/Config/Overview.html) can now be used to set their values.
+
+For backwards compatibility, `dataverse.personOrOrg.assumeCommaInPersonName` is still supported. However, `dataverse.personOrOrg.orgPhraseArray` is not, due to a change in the expected value format. `dataverse.person-or-org.org-phrase-array` now expects a comma-separated list of phrases as a value instead of a JsonArray of strings. Please update both the name and value format if using the old setting.
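
To make the value-format change in this release note concrete, here is a minimal sketch of how a comma-separated value can become a `String[]`. It follows the MicroProfile Config convention that `,` separates array elements and `\,` escapes a literal comma. The class and method names are hypothetical, and whitespace handling is not addressed; the real conversion is done by the MicroProfile Config implementation, not by code like this.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Dataverse or MicroProfile Config.
final class OrgPhraseValueSketch {

    // Splits on commas that are not preceded by a backslash, then unescapes "\,".
    static String[] splitPhrases(String value) {
        return Arrays.stream(value.split("(?<!\\\\),"))
                .map(part -> part.replace("\\,", ","))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Old format (no longer accepted): ["Portable", "GmbH"]
        // New format: Portable,GmbH
        System.out.println(Arrays.toString(splitPhrases("Portable,GmbH")));
    }
}
```

Under this convention, a phrase that itself contains a comma would be written with the comma escaped, for example `Foo\, Inc.,Project`.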

doc/sphinx-guides/source/admin/metadataexport.rst

Lines changed: 2 additions & 2 deletions
@@ -65,5 +65,5 @@ Two exporters - Schema.org JSONLD and OpenAire - use an algorithm to determine w
 
 The Dataverse software implements two jvm-options that can be used to tune the algorithm:
 
-- :ref:`dataverse.personOrOrg.assumeCommaInPersonName` - boolean, default false. If true, Dataverse will assume any name without a comma must be an organization. This may be most useful for curated Dataverse instances that enforce the "family name, given name" convention.
-- :ref:`dataverse.personOrOrg.orgPhraseArray` - a JsonArray of strings. Any name that contains one of the strings is assumed to be an organization. For example, "Project" is a word that is not otherwise associated with being an organization.
+- :ref:`dataverse.person-or-org.assume-comma-in-person-name` - boolean, default false. If true, Dataverse will assume any name without a comma must be an organization. This may be most useful for curated Dataverse instances that enforce the "family name, given name" convention.
+- :ref:`dataverse.person-or-org.org-phrase-array` - a comma-separated list of strings. Any name that contains one of the strings is assumed to be an organization. For example, "Project" is a word that is not otherwise associated with being an organization.
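
The effect of the two tuning options described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual `PersonOrOrgUtil` logic: the class and method names are hypothetical, the base inference algorithm is left out entirely, and case-sensitive matching is an assumption the diff does not confirm.

```java
import java.util.List;

// Hypothetical sketch; the real logic lives in PersonOrOrgUtil and does more.
final class OrgHintSketch {

    // Returns true when either tuning option marks the name as an organization.
    static boolean forcedOrganization(String name, boolean assumeCommaInPersonName,
                                      List<String> orgPhrases) {
        // Any configured phrase contained in the name marks it as an organization.
        for (String phrase : orgPhrases) {
            if (name.contains(phrase)) {
                return true;
            }
        }
        // With the comma assumption on, a name without a comma is not a person.
        return assumeCommaInPersonName && !name.contains(",");
    }

    public static void main(String[] args) {
        System.out.println(forcedOrganization("John Smith Project", false, List.of("Project")));
        System.out.println(forcedOrganization("Smith, John", true, List.of()));
    }
}
```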

doc/sphinx-guides/source/installation/config.rst

Lines changed: 16 additions & 7 deletions
@@ -3163,27 +3163,36 @@ This setting is useful in cases such as running your Dataverse installation behi
 "HTTP_VIA",
 "REMOTE_ADDR"
 
-.. _dataverse.personOrOrg.assumeCommaInPersonName:
+.. _dataverse.person-or-org.assume-comma-in-person-name:
 
-dataverse.personOrOrg.assumeCommaInPersonName
-+++++++++++++++++++++++++++++++++++++++++++++
+dataverse.person-or-org.assume-comma-in-person-name
++++++++++++++++++++++++++++++++++++++++++++++++++++
 
 Please note that this setting is experimental.
 
 The Schema.org metadata and OpenAIRE exports and the Schema.org metadata included in DatasetPages try to infer whether each entry in the various fields (e.g. Author, Contributor) is a Person or Organization. If you are sure that
 users are following the guidance to add people in the recommended family name, given name order, with a comma, you can set this true to always assume entries without a comma are for Organizations. The default is false.
 
-.. _dataverse.personOrOrg.orgPhraseArray:
+``./asadmin create-jvm-options '-Ddataverse.person-or-org.assume-comma-in-person-name=true'``
 
-dataverse.personOrOrg.orgPhraseArray
-++++++++++++++++++++++++++++++++++++
+Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_PERSON_OR_ORG_ASSUME_COMMA_IN_PERSON_NAME``.
+
+**Note:** This setting was previously called `dataverse.personOrOrg.assumeCommaInPersonName`, which is still available as an alias for backwards compatibility.
+
+.. _dataverse.person-or-org.org-phrase-array:
+
+dataverse.person-or-org.org-phrase-array
+++++++++++++++++++++++++++++++++++++++++
 
 Please note that this setting is experimental.
 
 The Schema.org metadata and OpenAIRE exports and the Schema.org metadata included in DatasetPages try to infer whether each entry in the various fields (e.g. Author, Contributor) is a Person or Organization.
 If you have examples where an organization name is being inferred to belong to a person, you can use this setting to force it to be recognized as an organization.
-The value is expected to be a JsonArray of strings. Any name that contains one of the strings is assumed to be an organization. For example, "Project" is a word that is not otherwise associated with being an organization.
+The value is expected to be a comma-separated list of strings. Any name that contains one of the strings is assumed to be an organization. For example, "Project" is a word that is not otherwise associated with being an organization.
+
+Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_PERSON_OR_ORG_ORG_PHRASE_ARRAY``.
 
+**Note:** This setting was previously called `dataverse.personOrOrg.orgPhraseArray` and expected a JsonArray of strings. Please update both the name and value format if using the old setting.
 
 .. _dataverse.api.signature-secret:
 
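
The environment variable names quoted in this file follow the MicroProfile Config mapping rule for environment variables: every character that is not alphanumeric or `_` is replaced with `_`, and the result is upper-cased. A minimal sketch of that rule, with a hypothetical class and method name, looks like this:

```java
import java.util.Locale;

// Hypothetical helper illustrating the property-name to env-var-name mapping.
final class EnvNameSketch {

    // Replaces every character outside [A-Za-z0-9_] with '_' and upper-cases the result.
    static String toEnvName(String propertyName) {
        return propertyName.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("dataverse.person-or-org.org-phrase-array"));
        System.out.println(toEnvName("dataverse.person-or-org.assume-comma-in-person-name"));
    }
}
```

Both dots and hyphens map to underscores, which is why `person-or-org.org-phrase-array` yields the doubled `ORG_ORG` in `DATAVERSE_PERSON_OR_ORG_ORG_PHRASE_ARRAY`.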

src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java

Lines changed: 7 additions & 2 deletions
@@ -75,7 +75,7 @@ public enum JvmSettings {
     // INDEX CONCURENCY
     SCOPE_SOLR_CONCURENCY(SCOPE_SOLR, "concurrency"),
     MAX_ASYNC_INDEXES(SCOPE_SOLR_CONCURENCY, "max-async-indexes"),
-
+
     // RSERVE CONNECTION
     SCOPE_RSERVE(PREFIX, "rserve"),
     RSERVE_HOST(SCOPE_RSERVE, "host"),
@@ -279,7 +279,12 @@ public enum JvmSettings {
     //CSL CITATION SETTINGS
     SCOPE_CSL(PREFIX, "csl"),
     CSL_COMMON_STYLES(SCOPE_CSL, "common-styles"),
-
+
+    // PersonOrOrgUtil SETTINGS
+    SCOPE_PERSONORORG(PREFIX, "person-or-org"),
+    ASSUME_COMMA_IN_PERSON_NAME(SCOPE_PERSONORORG, "assume-comma-in-person-name", "dataverse.personOrOrg.assumeCommaInPersonName"),
+    ORG_PHRASE_ARRAY(SCOPE_PERSONORORG, "org-phrase-array"),
+
     // CORS SETTINGS
     SCOPE_CORS(PREFIX, "cors"),
     CORS_ORIGIN(SCOPE_CORS, "origin"),
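
In the hunk above, `ASSUME_COMMA_IN_PERSON_NAME` passes the old property name as a third argument while `ORG_PHRASE_ARRAY` does not, which matches the release note: only the former keeps a backwards-compatible alias. The fallback idea can be sketched as below. This uses a plain `Map` as a stand-in for a config source; the class, the method, and the rule that the new name wins when both are set are assumptions for illustration, not the `JvmSettings` implementation.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "new key first, then old aliases"; not the JvmSettings code.
final class AliasLookupSketch {

    // Looks up the current key, then each old alias in order.
    static Optional<String> lookup(Map<String, String> source, String key, String... oldAliases) {
        String value = source.get(key);
        if (value != null) {
            return Optional.of(value);
        }
        for (String alias : oldAliases) {
            String aliased = source.get(alias);
            if (aliased != null) {
                return Optional.of(aliased);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> legacy = Map.of("dataverse.personOrOrg.assumeCommaInPersonName", "true");
        System.out.println(lookup(legacy,
                "dataverse.person-or-org.assume-comma-in-person-name",
                "dataverse.personOrOrg.assumeCommaInPersonName"));
    }
}
```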

src/main/java/edu/harvard/iq/dataverse/util/PersonOrOrgUtil.java

Lines changed: 8 additions & 19 deletions
@@ -4,12 +4,10 @@
 import java.util.List;
 import java.util.logging.Logger;
 
-import jakarta.json.JsonArray;
+import edu.harvard.iq.dataverse.settings.JvmSettings;
 import jakarta.json.JsonObject;
 import jakarta.json.JsonObjectBuilder;
-import jakarta.json.JsonString;
 
-import edu.harvard.iq.dataverse.util.json.JsonUtil;
 import edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder;
 
 /**
@@ -42,8 +40,8 @@ public class PersonOrOrgUtil {
     static List<String> orgPhrases;
 
     static {
-        setAssumeCommaInPersonName(Boolean.parseBoolean(System.getProperty("dataverse.personOrOrg.assumeCommaInPersonName", "false")));
-        setOrgPhraseArray(System.getProperty("dataverse.personOrOrg.orgPhraseArray", null));
+        setAssumeCommaInPersonName(JvmSettings.ASSUME_COMMA_IN_PERSON_NAME.lookupOptional(Boolean.class).orElse(false));
+        setOrgPhraseArray(JvmSettings.ORG_PHRASE_ARRAY.lookupOptional(String[].class).orElse(new String[]{}));
     }
 
     /**
@@ -137,25 +135,16 @@ public static JsonObject getPersonOrOrganization(String name, boolean organizati
     }
 
     // Public for testing
-    public static void setOrgPhraseArray(String phraseArray) {
-        orgPhrases = new ArrayList<String>();
-        if (!StringUtil.isEmpty(phraseArray)) {
-            try {
-                JsonArray phrases = JsonUtil.getJsonArray(phraseArray);
-                phrases.forEach(val -> {
-                    JsonString strVal = (JsonString) val;
-                    orgPhrases.add(strVal.getString());
-                });
-            } catch (Exception e) {
-                logger.warning("Could not parse Org phrase list");
-            }
+    public static void setOrgPhraseArray(String[] phraseArray) {
+        if (phraseArray == null) {
+            orgPhrases = new ArrayList<>();
+        } else {
+            orgPhrases = List.of(phraseArray);
         }
-
     }
 
     // Public for testing
     public static void setAssumeCommaInPersonName(boolean assume) {
         assumeCommaInPersonName = assume;
     }
-
 }
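
One detail of the new `setOrgPhraseArray` worth noting: `List.of(...)` returns an unmodifiable list and rejects `null` elements, whereas the `null` branch assigns a mutable `ArrayList`. The sketch below reproduces only that branching in a hypothetical standalone class to show the difference; it is not a copy of `PersonOrOrgUtil`, and whether any caller relies on mutability is not something the diff shows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone reproduction of the null/non-null branching.
final class PhraseListSketch {

    // Mirrors the shape of the new setter: empty mutable list for null, List.of otherwise.
    static List<String> toPhraseList(String[] phraseArray) {
        if (phraseArray == null) {
            return new ArrayList<>();
        }
        return List.of(phraseArray);
    }

    // Probes whether the list accepts modification.
    static boolean isModifiable(List<String> list) {
        try {
            list.add("probe");
            list.remove("probe");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isModifiable(toPhraseList(null)));
        System.out.println(isModifiable(toPhraseList(new String[]{"Portable", "GmbH"})));
    }
}
```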

src/test/java/edu/harvard/iq/dataverse/util/PersonOrOrgUtilTest.java

Lines changed: 31 additions & 13 deletions
@@ -1,13 +1,17 @@
 package edu.harvard.iq.dataverse.util;
 
+import edu.harvard.iq.dataverse.settings.JvmSettings;
 import edu.harvard.iq.dataverse.util.json.JsonUtil;
 
+import edu.harvard.iq.dataverse.util.testing.JvmSetting;
+import edu.harvard.iq.dataverse.util.testing.LocalJvmSettings;
 import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;
 import static org.junit.jupiter.api.Assertions.*;
 
 import jakarta.json.JsonObject;
 
+@LocalJvmSettings
 public class PersonOrOrgUtilTest {
 
     public PersonOrOrgUtilTest() {
@@ -26,27 +30,41 @@ public void testOrganizationCOMPLEXName() {
         verifyIsOrganization("The Ford Foundation");
         verifyIsOrganization("United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP)");
         verifyIsOrganization("Michael J. Fox Foundation for Parkinson's Research");
-        // The next example is one known to be asserted to be a Person without an entry
-        // in the OrgWordArray
-        // So we test with it in the array and then when the array is empty to verify
-        // the array works, resetting the array works, and the problem still exists in
+        // The next examples are known to be asserted to be a Person without an entry in the OrgWordArray
+        // So we test when no array is set via JvmSetting to verify the problem still exists in
         // the underlying algorithm
-        PersonOrOrgUtil.setOrgPhraseArray("[\"Portable\"]");
-        verifyIsOrganization("Portable Antiquities of the Netherlands");
-        PersonOrOrgUtil.setOrgPhraseArray(null);
         JsonObject obj = PersonOrOrgUtil.getPersonOrOrganization("Portable Antiquities of the Netherlands", false, false);
         assertTrue(obj.getBoolean("isPerson"));
+        JsonObject obj2 = PersonOrOrgUtil.getPersonOrOrganization("Max Mustermann GmbH", false, false);
+        assertTrue(obj2.getBoolean("isPerson"));
+    }
+
+    @Test
+    public void testOrganizationWithOrgPhraseArray() {
+        PersonOrOrgUtil.setOrgPhraseArray(new String[]{"Portable", "GmbH"});
+        // The next examples are known to be asserted to be a Person without an entry in the OrgWordArray
+        // So we test with the array set via JvmSetting to verify the array works
+        verifyIsOrganization("Portable Antiquities of the Netherlands");
+        verifyIsOrganization("Max Mustermann GmbH");
+        PersonOrOrgUtil.setOrgPhraseArray(null);
     }
 
     @Test
     public void testOrganizationAcademicName() {
+        verifyIsOrganization("John Smith Center");
+        verifyIsOrganization("John Smith Group");
+        // An example the base algorithm doesn't handle:
+        JsonObject obj = PersonOrOrgUtil.getPersonOrOrganization("John Smith Project", false, false);
+        assertTrue(obj.getBoolean("isPerson"));
+    }
 
-        verifyIsOrganization("John Smith Center");
-        verifyIsOrganization("John Smith Group");
-        //An example the base algorithm doesn't handle:
-        PersonOrOrgUtil.setAssumeCommaInPersonName(true);
-        verifyIsOrganization("John Smith Project");
-        PersonOrOrgUtil.setAssumeCommaInPersonName(false);
+    @Test
+    public void testOrganizationAcademicNameWithAssumeComma() {
+        PersonOrOrgUtil.setAssumeCommaInPersonName(true);
+        verifyIsOrganization("John Smith Center");
+        verifyIsOrganization("John Smith Group");
+        verifyIsOrganization("John Smith Project");
+        PersonOrOrgUtil.setAssumeCommaInPersonName(false);
     }
 
 
