Publication fails with file PIDs enabled for some files #11546

@qqmyers

What steps does it take to reproduce the issue?
As first reported during QA testing of #11494, publication can fail for some datasets. The full use case appears to involve instances where file PIDs are enabled and there are files that were created, and were still in draft, before #7334 / v6.4 (and thus did not yet have a global id) but were created before #10790 / v6.6.

The root cause is related to

    if (dvObject.getSeparator() == null) {
        dvObject.setSeparator(getSeparator());
    } else {
        if (!dvObject.getSeparator().equals(getSeparator())) {
            logger.warning("The separator of the DvObject (" + dvObject.getSeparator()
                    + ") does not match the configured separator (" + getSeparator() + ")");
            throw new IllegalArgumentException("The separator of the DvObject (" + dvObject.getSeparator()
                    + ") doesn't match that of the provider, id: " + getId());
        }
    }

which fails if the separator is non-null and not the same as that of the PidProvider being used, and

    ALTER TABLE dvobject ADD COLUMN IF NOT EXISTS separator character varying(255) DEFAULT '';

which set all the existing null separators to the non-null empty string.

In retrospect, that should probably have been limited to entries using protocol 'perma'.
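To make the failure mode concrete, here is a minimal standalone sketch of the check. It is not the actual Dataverse code: the class `SeparatorCheckSketch` and the method `reconcile` are hypothetical stand-ins that take the two separators as plain strings, so it can be run without a Dataverse installation.

```java
// Hypothetical stand-in for the separator check; not the real AbstractPidProvider.
class SeparatorCheckSketch {

    // Returns the separator the object would end up with, or throws on a mismatch.
    static String reconcile(String objectSeparator, String providerSeparator) {
        if (objectSeparator == null) {
            // A null separator is simply adopted from the provider.
            return providerSeparator;
        }
        if (!objectSeparator.equals(providerSeparator)) {
            // An empty string is non-null, so it lands here when the provider uses "/".
            throw new IllegalArgumentException("The separator of the DvObject (" + objectSeparator
                    + ") does not match the configured separator (" + providerSeparator + ")");
        }
        return objectSeparator;
    }

    public static void main(String[] args) {
        // Separator still null: adopts the provider's value.
        System.out.println(reconcile(null, "/"));
        try {
            // Separator set to '' by the column default: the check throws.
            reconcile("", "/");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a provider separator of `/`, the null case passes and the empty-string case produces the same message text as the log line reported in this issue.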

  • When does this issue occur?
    Attempts to publish datasets meeting the criteria above.

  • What happens?
    Publication fails and the log shows: The separator of the DvObject () does not match the configured separator (/)

Which version of Dataverse are you using?
v6.6+

What would fix this?
A work-around would be to change the db so that the separator is null. I think something like

    update dvobject set separator=null where separator='' and identifier is null;

should work, but it's possible that there are cases related to permalinks where that may be too broad.

This could potentially be added as a fix in v6.7 without much work.

A more thorough fix would be to change the code in the AbstractPidProvider. Minimally, the separator check could/should only be done if the authority or identifier were non-null. More broadly, this code may not be needed, as indicated by the comment before the method: it was used before multiple pid support was added (with the separator part added in #10790 following the example of the earlier code). Otherwise, it should at least be refactored to only allow a new PID to be based on the existing values when authority and protocol are not null.
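As a rough illustration of the minimal variant described above, the sketch below only runs the separator comparison when the object already carries an authority or an identifier. This is an assumption about how such a guard might look, not a proposed patch: `GuardedSeparatorSketch` and `resolveSeparator` are hypothetical names, and the real method works on a DvObject rather than on plain strings.

```java
// Hypothetical sketch of a guarded separator check; not the real AbstractPidProvider.
class GuardedSeparatorSketch {

    // Returns the separator to use when minting a PID for an object with these existing values.
    static String resolveSeparator(String authority, String identifier,
                                   String objectSeparator, String providerSeparator) {
        // Only treat the stored separator as meaningful if other PID parts are present.
        boolean hasPidParts = authority != null || identifier != null;
        if (!hasPidParts || objectSeparator == null) {
            // No existing PID information: ignore any stored separator
            // (including the '' column default) and use the provider's.
            return providerSeparator;
        }
        if (!objectSeparator.equals(providerSeparator)) {
            throw new IllegalArgumentException("The separator of the DvObject (" + objectSeparator
                    + ") does not match the configured separator (" + providerSeparator + ")");
        }
        return objectSeparator;
    }

    public static void main(String[] args) {
        // Draft file with no authority or identifier and separator '': no longer fails.
        System.out.println(resolveSeparator(null, null, "", "/"));
    }
}
```

Under this guard, a draft file with no authority or identifier would pick up the provider's separator even if the column default left it as an empty string, while an object that already has an authority and a mismatched separator would still be rejected.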

My (possibly wrong) understanding of the purpose of the original code was to handle cases where one PID account could mint identifiers with more than one authority and you'd want something like a dependent file to use the authority of the parent dataset if that were possible. With multiple pid providers, I don't think this is necessary any more - one account could be configured with two pid providers, each with a different authority. Whether there's really any case where the code currently sends in a dvobject that has a non-null protocol and authority (but doesn't already have an identifier/PID itself) is TBD - I didn't see any looking quickly.

Are you thinking about creating a pull request for this issue?
Help is always welcome, is this bug something you or your organization plan to fix?
At some point - I see the issue at QDR at least. My guess is doing a little research to confirm the work-around sql change and getting that (or whatever the final version is) into 6.7 would be worthwhile. I might have time for that but would be happy to hand it off to someone else. If anyone wants to look into the more thorough fix, I would be even happier to hand it off.

Metadata

Labels

FY25 Sprint 26 (2025-06-18 - 2025-07-02); Size: 3 (a percentage of a sprint, 2.1 hours); Type: Bug (a defect)
