
Commit bf08caf

Merge pull request #11767 from IQSS/11766-new-io.gdcc.dataverse-spi
Extended io.gdcc.dataverse-spi (ExportDataProvider) interface
2 parents 154466c + 97e3803 commit bf08caf

File tree

8 files changed (+185 −15 lines)


.github/workflows/spi_release.yml

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ jobs:
         with:
           java-version: '17'
           distribution: 'adopt'
-          server-id: ossrh
+          server-id: central
           server-username: MAVEN_USERNAME
           server-password: MAVEN_PASSWORD
       - uses: actions/cache@v4
@@ -80,7 +80,7 @@ jobs:
         with:
           java-version: '17'
           distribution: 'adopt'
-          server-id: ossrh
+          server-id: central
           server-username: MAVEN_USERNAME
           server-password: MAVEN_PASSWORD
           gpg-private-key: ${{ secrets.DATAVERSEBOT_GPG_KEY }}
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+The ExportDataProvider framework in the dataverse-spi package has been extended, adding some extra options for developers of metadata exporter plugins.
+See the [documentation](https://guides.dataverse.org/en/latest/developers/metadataexport.html#building-an-exporter) in the Metadata Export guide for details.

doc/sphinx-guides/source/developers/making-library-releases.rst

Lines changed: 26 additions & 0 deletions
@@ -36,6 +36,32 @@ Releasing a Snapshot Version to Maven Central
 
 That is to say, to make a snapshot release, you only need to get one or more commits into the default branch.
 
+It's possible, of course, to make snapshot releases outside of GitHub Actions, from environments such as your laptop. Generally, you'll want to look at the GitHub Action and try to do the equivalent. You'll need a file set up locally at ``~/.m2/settings.xml`` with the following (contact a core developer for the redacted bits):
+
+.. code-block:: xml
+
+    <settings>
+      <servers>
+        <server>
+          <id>central</id>
+          <username>REDACTED</username>
+          <password>REDACTED</password>
+        </server>
+      </servers>
+    </settings>
+
+Then, study the GitHub Action and perform similar commands from your local environment. For example, as of this writing, for the dataverse-spi project, you can run the following commands, substituting the suffix you need:
+
+``mvn -f modules/dataverse-spi -Dproject.version.suffix="2.1.0-PR11767-SNAPSHOT" verify``
+
+``mvn -f modules/dataverse-spi -Dproject.version.suffix="2.1.0-PR11767-SNAPSHOT" deploy``
+
+This will upload the snapshot here, for example: https://central.sonatype.com/repository/maven-snapshots/io/gdcc/dataverse-spi/2.1.02.1.0-PR11767-SNAPSHOT/dataverse-spi-2.1.02.1.0-PR11767-20250827.182026-1.jar
+
+Before OSSRH was retired, you could browse through snapshot jars you published at https://s01.oss.sonatype.org/content/repositories/snapshots/io/gdcc/dataverse-spi/2.0.0-PR9685-SNAPSHOT/, for example. Now, even though you may see the URL of the jar as shown above during the "deploy" step, if you try to browse the various snapshot jars at https://central.sonatype.com/repository/maven-snapshots/io/gdcc/dataverse-spi/2.1.02.1.0-PR11767-SNAPSHOT/ you'll see "This maven2 hosted repository is not directly browseable at this URL. Please use the browse or HTML index views to inspect the contents of this repository." Sadly, the "browse" and "HTML index" links don't work, as noted in a `question <https://community.sonatype.com/t/this-maven2-group-repository-is-not-directly-browseable-at-this-url/8991>`_ on the Sonatype Community forum. One way to confirm that the jar was uploaded properly is to use Maven to copy the jar to a local directory and then compare checksums:
+
+``mvn dependency:copy -DrepoUrl=https://central.sonatype.com/repository/maven-snapshots/ -Dartifact=io.gdcc:dataverse-spi:2.1.02.1.0-PR11767-SNAPSHOT -DoutputDirectory=.``
+
 Releasing a Release (Non-Snapshot) Version to Maven Central
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

modules/dataverse-spi/pom.xml

Lines changed: 7 additions & 3 deletions
@@ -13,7 +13,7 @@
 
     <groupId>io.gdcc</groupId>
     <artifactId>dataverse-spi</artifactId>
-    <version>2.0.0${project.version.suffix}</version>
+    <version>2.1.0${project.version.suffix}</version>
     <packaging>jar</packaging>
 
     <name>Dataverse SPI Plugin API</name>
@@ -64,11 +64,13 @@
 
     <distributionManagement>
        <snapshotRepository>
-          <id>ossrh</id>
-          <url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
+          <id>central</id>
+          <url>https://central.sonatype.com/repository/maven-snapshots/</url>
        </snapshotRepository>
        <repository>
+          <!--TODO: change this from ossrh to central?-->
           <id>ossrh</id>
+          <!--TODO: change this url?-->
           <url>https://s01.oss.sonatype.org/service/local/staging/deploy/maven2/</url>
        </repository>
     </distributionManagement>
@@ -110,7 +112,9 @@
            <artifactId>nexus-staging-maven-plugin</artifactId>
            <extensions>true</extensions>
            <configuration>
+               <!--TODO: change this from ossrh to central?-->
                <serverId>ossrh</serverId>
+               <!--TODO: change this URL?-->
                <nexusUrl>https://s01.oss.sonatype.org</nexusUrl>
                <autoReleaseAfterClose>true</autoReleaseAfterClose>
            </configuration>
modules/dataverse-spi/src/main/java/io/gdcc/spi/export/ExportDataContext.java

Lines changed: 61 additions & 0 deletions

@@ -0,0 +1,61 @@
+package io.gdcc.spi.export;
+
+/**
+ *
+ * @author landreev
+ * Provides an optional mechanism for defining various data retrieval options
+ * for the export subsystem in a way that should allow us to add support for
+ * more options going forward with minimal or no changes to the already
+ * implemented export plugins.
+ */
+public class ExportDataContext {
+    private boolean datasetMetadataOnly = false;
+    private boolean publicFilesOnly = false;
+    private Integer offset = null;
+    private Integer length = null;
+
+    private ExportDataContext() {
+
+    }
+
+    public static ExportDataContext context() {
+        ExportDataContext context = new ExportDataContext();
+        return context;
+    }
+
+    public ExportDataContext withDatasetMetadataOnly() {
+        this.datasetMetadataOnly = true;
+        return this;
+    }
+
+    public ExportDataContext withPublicFilesOnly() {
+        this.publicFilesOnly = true;
+        return this;
+    }
+
+    public ExportDataContext withOffset(Integer offset) {
+        this.offset = offset;
+        return this;
+    }
+
+    public ExportDataContext withLength(Integer length) {
+        this.length = length;
+        return this;
+    }
+
+    public boolean isDatasetMetadataOnly() {
+        return datasetMetadataOnly;
+    }
+
+    public boolean isPublicFilesOnly() {
+        return publicFilesOnly;
+    }
+
+    public Integer getOffset() {
+        return offset;
+    }
+
+    public Integer getLength() {
+        return length;
+    }
+}
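The new class above is a simple builder: chain the `with*` methods to set options, then query them with the getters. A self-contained sketch of how an exporter might construct and inspect a context (the `ExportDataContext` here is a trimmed local copy of the class above so the snippet compiles on its own):

```java
// Trimmed local copy of the SPI's ExportDataContext (same fields and methods
// as the commit above), included so this snippet is self-contained.
class ExportDataContext {
    private boolean datasetMetadataOnly = false;
    private boolean publicFilesOnly = false;
    private Integer offset = null;
    private Integer length = null;

    private ExportDataContext() {
    }

    public static ExportDataContext context() {
        return new ExportDataContext();
    }

    public ExportDataContext withDatasetMetadataOnly() {
        this.datasetMetadataOnly = true;
        return this;
    }

    public ExportDataContext withPublicFilesOnly() {
        this.publicFilesOnly = true;
        return this;
    }

    public ExportDataContext withOffset(Integer offset) {
        this.offset = offset;
        return this;
    }

    public ExportDataContext withLength(Integer length) {
        this.length = length;
        return this;
    }

    public boolean isDatasetMetadataOnly() { return datasetMetadataOnly; }
    public boolean isPublicFilesOnly() { return publicFilesOnly; }
    public Integer getOffset() { return offset; }
    public Integer getLength() { return length; }
}

public class ContextDemo {
    public static void main(String[] args) {
        // Ask for dataset-level metadata only, e.g. for a DC-style export
        // that has no use for per-file metadata.
        ExportDataContext metadataOnly = ExportDataContext.context()
                .withDatasetMetadataOnly();

        // Page through public files in batches of 100, starting at offset 0.
        ExportDataContext firstBatch = ExportDataContext.context()
                .withPublicFilesOnly()
                .withOffset(0)
                .withLength(100);

        System.out.println(metadataOnly.isDatasetMetadataOnly()); // prints "true"
        System.out.println(firstBatch.getLength());               // prints "100"
    }
}
```

Because each `with*` method returns `this`, new options can be added later without breaking existing call sites, which matches the class's stated design goal.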
modules/dataverse-spi/src/main/java/io/gdcc/spi/export/ExportDataOption.java

Lines changed: 51 additions & 0 deletions

@@ -0,0 +1,51 @@
+package io.gdcc.spi.export;
+
+/**
+ *
+ * @author landreev
+ * Provides a mechanism for defining various data retrieval options for the
+ * export subsystem in a way that should allow us to add support for more
+ * options going forward with minimal or no changes to the existing code in
+ * export plugins.
+ */
+@Deprecated
+public class ExportDataOption {
+
+    public enum SupportedOptions {
+        DatasetMetadataOnly,
+        PublicFilesOnly;
+    }
+
+    private SupportedOptions optionType;
+
+    /*public static ExportDataOption addOption(String option) {
+        ExportDataOption ret = new ExportDataOption();
+
+        for (SupportedOptions supported : SupportedOptions.values()) {
+            if (supported.toString().equals(option)) {
+                ret.optionType = supported;
+            }
+        }
+        return ret;
+    }*/
+
+    public static ExportDataOption addDatasetMetadataOnly() {
+        ExportDataOption ret = new ExportDataOption();
+        ret.optionType = SupportedOptions.DatasetMetadataOnly;
+        return ret;
+    }
+
+    public static ExportDataOption addPublicFilesOnly() {
+        ExportDataOption ret = new ExportDataOption();
+        ret.optionType = SupportedOptions.PublicFilesOnly;
+        return ret;
+    }
+
+    public boolean isDatasetMetadataOnly() {
+        return SupportedOptions.DatasetMetadataOnly.equals(optionType);
+    }
+
+    public boolean isPublicFilesOnly() {
+        return SupportedOptions.PublicFilesOnly.equals(optionType);
+    }
+}

modules/dataverse-spi/src/main/java/io/gdcc/spi/export/ExportDataProvider.java

Lines changed: 36 additions & 9 deletions
@@ -21,8 +21,14 @@ public interface ExportDataProvider {
      * OAI_ORE export are the only two that provide 'complete'
      * dataset-level metadata along with basic file metadata for each file
      * in the dataset.
+     * @param context - supplies optional parameters. Needs to support
+     *                  context.isDatasetMetadataOnly(). In a situation where we
+     *                  need to generate a format like DC that has no use for the
+     *                  file-level metadata, it makes sense to skip retrieving and
+     *                  formatting it, since there can be a very large number of
+     *                  files in a dataset.
      */
-    JsonObject getDatasetJson();
+    JsonObject getDatasetJson(ExportDataContext... context);
 
     /**
      *
@@ -32,24 +38,42 @@ public interface ExportDataProvider {
      * @apiNote - This, and the JSON format are the only two that provide complete
      * dataset-level metadata along with basic file metadata for each file
      * in the dataset.
+     * @param context - supplies optional parameters.
      */
-    JsonObject getDatasetORE();
+    JsonObject getDatasetORE(ExportDataContext... context);
 
     /**
      * Dataverse is capable of extracting DDI-centric metadata from tabular
      * datafiles. This detailed metadata, which is only available for successfully
      * "ingested" tabular files, is not included in the output of any other methods
      * in this interface.
      *
      * @return - a JSONArray with one entry per ingested tabular dataset file.
      * @apiNote - there is no JSON schema available for this output and the format
      * is not well documented. Implementers may wish to explore the @see
      * edu.harvard.iq.dataverse.export.DDIExporter and the @see
      * edu.harvard.iq.dataverse.util.json.JSONPrinter classes where this
      * output is used/generated (respectively).
+     * @param context - supplies optional parameters.
      */
-    JsonArray getDatasetFileDetails();
+    JsonArray getDatasetFileDetails(ExportDataContext... context);
 
+    /**
+     * Similar to the above, but
+     * a) retrieves the information for the ingested/tabular data files _only_
+     * b) provides an option for retrieving this information in batches
+     * c) provides an option for skipping restricted/embargoed etc. files.
+     * Intended for datasets with massive numbers of tabular files and datavariables.
+     * @param context - supplies optional parameters.
+     * current (2.1.0) known use cases:
+     * context.isPublicFilesOnly();
+     * context.getOffset();
+     * context.getLength();
+     * @return json array containing the datafile/filemetadata->datatable->datavariable metadata
+     * @throws ExportException
+     */
+    JsonArray getTabularDataDetails(ExportDataContext... context) throws ExportException;
 
     /**
      *
      * @return - the subset of metadata conforming to the schema.org standard as
@@ -58,8 +82,9 @@ public interface ExportDataProvider {
      * @apiNote - as this metadata export is not complete, it should only be used as
      * a starting point for an Exporter if it simplifies your exporter
      * relative to using the JSON or OAI_ORE exports.
+     * @param context - supplies optional parameters.
      */
-    JsonObject getDatasetSchemaDotOrg();
+    JsonObject getDatasetSchemaDotOrg(ExportDataContext... context);
 
     /**
      *
@@ -68,8 +93,9 @@ public interface ExportDataProvider {
      * @apiNote - as this metadata export is not complete, it should only be used as
      * a starting point for an Exporter if it simplifies your exporter
      * relative to using the JSON or OAI_ORE exports.
+     * @param context - supplies optional parameters.
      */
-    String getDataCiteXml();
+    String getDataCiteXml(ExportDataContext... context);
 
     /**
      * If an Exporter has specified a prerequisite format name via the
@@ -88,9 +114,10 @@ public interface ExportDataProvider {
      * malfunction, e.g. if you depend on format "ddi" and a third party
      * Exporter is configured to replace the internal ddi Exporter in
      * Dataverse.
+     * @param context - supplies optional parameters.
      */
-    default Optional<InputStream> getPrerequisiteInputStream() {
+    default Optional<InputStream> getPrerequisiteInputStream(ExportDataContext... context) {
         return Optional.empty();
     }
 
 }
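Because every method now takes `ExportDataContext... context`, existing implementations and callers compile unchanged while new call sites can pass a single context. A self-contained sketch of the unpacking pattern an implementation might use (`firstOrDefault` is a hypothetical helper, not part of the SPI, and the nested `Context` class is a trimmed stand-in for `ExportDataContext`):

```java
public class VarargsContextDemo {
    // Stand-in for the SPI's ExportDataContext, trimmed to one flag.
    static class Context {
        boolean datasetMetadataOnly;
    }

    // Hypothetical helper: varargs lets callers omit the context entirely,
    // so we fall back to a default (all options off) when none is supplied.
    static Context firstOrDefault(Context... context) {
        return (context != null && context.length > 0 && context[0] != null)
                ? context[0]
                : new Context();
    }

    // Sketch of a provider method body honoring the optional context.
    static String getDatasetJson(Context... context) {
        Context ctx = firstOrDefault(context);
        return ctx.datasetMetadataOnly
                ? "{dataset metadata only}"
                : "{dataset metadata + file metadata}";
    }

    public static void main(String[] args) {
        // Pre-2.1.0-style call sites keep working with no arguments...
        System.out.println(getDatasetJson()); // prints "{dataset metadata + file metadata}"

        // ...while new call sites can opt in to the lighter output.
        Context ctx = new Context();
        ctx.datasetMetadataOnly = true;
        System.out.println(getDatasetJson(ctx)); // prints "{dataset metadata only}"
    }
}
```

This is the backward-compatibility property the commit's javadoc aims for: options can be added to the context later with no signature changes to already-implemented plugins.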

modules/dataverse-spi/src/main/java/io/gdcc/spi/export/Exporter.java

Lines changed: 0 additions & 1 deletion
@@ -85,7 +85,6 @@ default Optional<String> getPrerequisiteFormatName() {
         return Optional.empty();
     }
 
-
     /**
      * Harvestable Exporters will be available as options in Dataverse's Harvesting mechanism.
      * @return true to make this exporter available as a harvesting option.
