E-ARK SIP parse ignores CONTENTINFORMATIONTYPE and misreads TYPE/OTHERTYPE for contentType #372

@ThomasEdvardsen

Summary

Parsing an E-ARK SIP via EARKSIP.parse(...) has two related issues:

  1. contentInformationType is never populated from METS, so it stays at default MIXED.
  2. contentType is derived from CONTENTINFORMATIONTYPE / OTHERCONTENTINFORMATIONTYPE instead of TYPE / OTHERTYPE.

Steps to Reproduce

  1. Create a SIP with root METS attributes:
    • TYPE="Other"
    • csip:OTHERTYPE="Moving images - on tangible media"
    • csip:CONTENTINFORMATIONTYPE="OTHER"
    • csip:OTHERCONTENTINFORMATIONTYPE="MOVINGIMAGES-PROFILE-1.0"
  2. Parse it with new EARKSIP().parse(path).
  3. Inspect results:
    • sip.getContentType().asString() → incorrect (taken from the content information fields)
    • sip.getContentInformationType().asString() → remains MIXED

Expected

  • contentType should come from TYPE / OTHERTYPE
  • contentInformationType should come from csip:CONTENTINFORMATIONTYPE / csip:OTHERCONTENTINFORMATIONTYPE
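
To make the expected mapping concrete, here is a minimal sketch that reads the four root attributes from the reproduction case with the JDK DOM parser only. It does not use commons-ip: the class MetsRootTypes, its fields, and the resolve helper are hypothetical, the namespace URI in the sample is a placeholder, and the rule "use the OTHER* attribute when the main attribute says Other" is an assumption about how the two attributes combine.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, not part of commons-ip: shows which METS root
// attribute is expected to feed which field.
class MetsRootTypes {
    // Root element from the reproduction steps; the xmlns:csip URI is a placeholder.
    static final String SAMPLE =
        "<mets xmlns:csip=\"urn:example:csip-placeholder\""
        + " TYPE=\"Other\""
        + " csip:OTHERTYPE=\"Moving images - on tangible media\""
        + " csip:CONTENTINFORMATIONTYPE=\"OTHER\""
        + " csip:OTHERCONTENTINFORMATIONTYPE=\"MOVINGIMAGES-PROFILE-1.0\"/>";

    final String contentType;
    final String contentInformationType;

    MetsRootTypes(String contentType, String contentInformationType) {
        this.contentType = contentType;
        this.contentInformationType = contentInformationType;
    }

    // Assumption: when the main attribute is "Other" (any case), the
    // companion OTHER* attribute carries the effective value.
    static String resolve(String main, String other) {
        return "OTHER".equalsIgnoreCase(main) ? other : main;
    }

    static MetsRootTypes fromMets(String xml) {
        try {
            // The default factory is not namespace-aware, so attributes are
            // looked up by their qualified names, e.g. "csip:OTHERTYPE".
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            String contentType = resolve(
                root.getAttribute("TYPE"),
                root.getAttribute("csip:OTHERTYPE"));
            String contentInformationType = resolve(
                root.getAttribute("csip:CONTENTINFORMATIONTYPE"),
                root.getAttribute("csip:OTHERCONTENTINFORMATIONTYPE"));
            return new MetsRootTypes(contentType, contentInformationType);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read METS root attributes", e);
        }
    }

    public static void main(String[] args) {
        MetsRootTypes t = fromMets(SAMPLE);
        System.out.println(t.contentType);            // Moving images - on tangible media
        System.out.println(t.contentInformationType); // MOVINGIMAGES-PROFILE-1.0
    }
}
```

For the reproduction SIP, these are the two values one would expect the parsed SIP to report, independent of how the library formats them in asString().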

Actual

  • contentType incorrectly uses content information attributes
  • contentInformationType remains default MIXED

Likely Cause

In EARKUtils.setIPContentType(...), the value is taken from CONTENTINFORMATIONTYPE / OTHERCONTENTINFORMATIONTYPE rather than TYPE / OTHERTYPE.
Also, neither processMainMets(...) nor the representation parsing calls a setter for contentInformationType.
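
The difference between the reported and the expected behaviour can be sketched over a plain attribute map. This is an illustration only, not the commons-ip code: ContentTypeMappingSketch, pick, reported, and expected are hypothetical names, and the handling of "Other" is an assumption.

```java
import java.util.Map;

// Hypothetical model of the mapping; each method returns
// { contentType, contentInformationType }.
class ContentTypeMappingSketch {
    // Assumption: "Other" (any case) in the main attribute defers to the OTHER* attribute.
    static String pick(Map<String, String> attrs, String main, String other) {
        String value = attrs.getOrDefault(main, "");
        return "OTHER".equalsIgnoreCase(value) ? attrs.getOrDefault(other, "") : value;
    }

    // Behaviour described under "Actual": contentType is read from the content
    // information attributes, and contentInformationType keeps its default.
    static String[] reported(Map<String, String> attrs) {
        return new String[] {
            pick(attrs, "csip:CONTENTINFORMATIONTYPE", "csip:OTHERCONTENTINFORMATIONTYPE"),
            "MIXED"
        };
    }

    // Behaviour described under "Expected": each field reads its own attribute pair,
    // falling back to MIXED only when no content information type is present.
    static String[] expected(Map<String, String> attrs) {
        String cit = pick(attrs, "csip:CONTENTINFORMATIONTYPE", "csip:OTHERCONTENTINFORMATIONTYPE");
        return new String[] {
            pick(attrs, "TYPE", "csip:OTHERTYPE"),
            cit.isEmpty() ? "MIXED" : cit
        };
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of(
            "TYPE", "Other",
            "csip:OTHERTYPE", "Moving images - on tangible media",
            "csip:CONTENTINFORMATIONTYPE", "OTHER",
            "csip:OTHERCONTENTINFORMATIONTYPE", "MOVINGIMAGES-PROFILE-1.0");
        System.out.println(String.join(" | ", reported(attrs)));
        System.out.println(String.join(" | ", expected(attrs)));
    }
}
```

A regression test for the fix could assert the second pair of values against a SIP built from the reproduction steps.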