Duplicate detection during external source import should escape metadata #3712

@aprilherron

Describe the bug

When importing an item from an external source (e.g. OpenAIRE), DSpace now checks whether the item already exists locally and alerts the user. However, the check is simply a Solr query on the name metadata. This works in most cases, but it breaks when the title contains Solr query syntax characters that are not escaped, e.g. a colon (:).

This means any titles with colons will necessarily create duplicates every time they are imported.

I suspect these other characters may present issues as well: +, --, -, &&, ||, !, (, ), ", ~, *, ?, :
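
As a possible direction for a fix, here is a minimal sketch of escaping a metadata value before it goes into the Solr query. The class name SolrQueryEscaper is hypothetical and is not existing DSpace code. The character set follows what SolrJ's ClientUtils.escapeQueryChars escapes, and calling that utility directly is probably the simpler option wherever SolrJ is already on the classpath.

```java
// Hypothetical helper, not part of DSpace: a dependency-free sketch that
// backslash-escapes Lucene/Solr query syntax characters in a raw value.
final class SolrQueryEscaper {

    // Characters the standard query parser treats as syntax
    // (same set that SolrJ's ClientUtils.escapeQueryChars handles).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    private SolrQueryEscaper() { }

    /** Returns the value with every syntax character and whitespace escaped. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With this in place, the lookup would build its query from SolrQueryEscaper.escape(title) rather than from the raw title, so the colon in "Adriatic Perspectives: Memory and Identity on a Transnational European Periphery" is matched literally instead of being parsed as a field separator.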

To Reproduce

  1. Set up OpenAIRE to be used during Publication submission to attach a related Project entity.
  2. During submission, do a lookup using the "Funding OpenAIRE API" tab with the query 655609. You should see an item with the title "Adriatic Perspectives: Memory and Identity on a Transnational European Periphery".
  3. Import this item and ensure the Project is created and installed as a DSpace entity.
  4. Ensure the Solr core is up to date.
  5. Repeat step 2. The first Project should appear in the "Select a local match" section, but the section is blank: the colon is not escaped, so the query returns no results.

Expected behavior

During step 5, the original Project should be detected and displayed for the user to select.

Related work

TBD

    Labels

    bug; component: configurable entities; help wanted; integration: OpenAIRE; tools: import-sources
