Add support for document-level key-value metadata #1156

@reckart

Add support for document-level key-value metadata. I imagine something like this:

=== Variant 1

// Simplest option only allowing String key-value pairs

MetaDataEntry extends Annotation {
  String: key
  String: value
}

=== Variant 2

// Option allowing only basic typed key-value pairs, with values represented as strings
// The type would be set only if the value is not a string, e.g. to `int` or `bool`

MetaDataEntry extends Annotation  {
  String: key
  String: value
  String: type
}
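
To illustrate how a consumer might read a Variant 2 entry, here is a minimal sketch in plain Java. It is not UIMA code: `TypedEntryDemo`, `Entry` and `decode` are hypothetical stand-ins, and the assumption that an unset type is represented as `null` is mine. Only the type tags `int` and `bool` mentioned above are handled.

```java
// Hypothetical plain-Java stand-in for a Variant 2 entry; not a UIMA type.
final class TypedEntryDemo {
    static final class Entry {
        final String key;
        final String value;
        final String type; // assumption: null means "plain string"

        Entry(String key, String value, String type) {
            this.key = key;
            this.value = value;
            this.type = type;
        }
    }

    // Decode the string value according to the optional type tag.
    static Object decode(Entry e) {
        if (e.type == null) {
            return e.value; // no type set: the value is a plain string
        }
        switch (e.type) {
            case "int":
                return Integer.parseInt(e.value);
            case "bool":
                return Boolean.parseBoolean(e.value);
            default:
                throw new IllegalArgumentException("Unsupported type: " + e.type);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new Entry("pages", "12", "int")));
        System.out.println(decode(new Entry("title", "Example", null)));
    }
}
```

Keeping the value as a string keeps the type system small, at the cost of pushing parsing and error handling to every reader of the entry.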

=== Variant 3

// Rather have everything in one FS; either value or ref would be set, but not both
// If ref is set, then values would be retrieved from the linked FS (key-values again)

MetaDataEntry extends Annotation  {
  String: key
  String: value
  FeatureStructure: ref
  String: type
}

=== Variant 4

// Full support for all kinds of structures, even nested entries - basically "schemaless"

MetaDataEntry extends Annotation {
  String: key
}

PrimitiveMetaDataEntry extends MetaDataEntry  {
  String: value
  String: type
}

MetaDataEntryGroup extends MetaDataEntry  {
  MetaDataEntry[]: items
}
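
The nesting in Variant 4 could be sketched in plain Java as follows. This is not UIMA code: the classes `Entry`, `Primitive` and `Group` are hypothetical stand-ins for `MetaDataEntry`, `PrimitiveMetaDataEntry` and `MetaDataEntryGroup`, and flattening nested keys into dotted paths is only an assumption made for the example, not part of the proposal.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the Variant 4 hierarchy; not UIMA types.
final class NestedMetaDataDemo {
    static abstract class Entry {
        final String key;

        Entry(String key) {
            this.key = key;
        }
    }

    static final class Primitive extends Entry {
        final String value;
        final String type;

        Primitive(String key, String value, String type) {
            super(key);
            this.value = value;
            this.type = type;
        }
    }

    static final class Group extends Entry {
        final List<Entry> items;

        Group(String key, Entry... items) {
            super(key);
            this.items = Arrays.asList(items);
        }
    }

    // Walk the tree depth-first and record each primitive value under its dotted path.
    static void flatten(Entry e, String prefix, Map<String, String> out) {
        String path = prefix.isEmpty() ? e.key : prefix + "." + e.key;
        if (e instanceof Group) {
            for (Entry child : ((Group) e).items) {
                flatten(child, path, out);
            }
        } else {
            out.put(path, ((Primitive) e).value);
        }
    }

    public static void main(String[] args) {
        Entry source = new Group("source",
                new Primitive("title", "Example", null),
                new Group("license", new Primitive("name", "CC-BY", null)));
        Map<String, String> flat = new LinkedHashMap<>();
        flatten(source, "", flat);
        System.out.println(flat);
    }
}
```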

Instead of adding the MetaDataEntry to a view, it could be added to a list of MetaDataEntry on DocumentMetaData:

DocumentMetaData extends DocumentAnnotation {
   // ... all the stuff we already have in DocumentMetaData ...
   MetaDataEntry[]: entries
}

An alternative to extending Annotation would be to extend TOP and then add the MetaDataEntry only to DocumentMetaData and not to the CAS view directly. That would mean that the MetaDataEntry could not be retrieved via the annotation index or via offsets. However, if MetaDataEntry were an Annotation, its offsets would be expected to always cover the whole document anyway. That could be a problem and require special handling if the annotations are added before the text is materialized: the respective code would have to know that all the MetaDataEntry annotations need to be updated to match the materialized text in the end. UIMA handles this automatically for us for the DocumentAnnotation.
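
The special handling mentioned for the Annotation-based variant could look roughly like the following plain-Java sketch. It is not UIMA code: `SpanUpdateDemo`, `Entry` and `coverDocument` are hypothetical names, and the entries are simple objects with `begin` and `end` fields. In an actual CAS, changing the offsets of an already indexed annotation would likely also involve index handling, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for offset-bearing metadata entries; not UIMA annotations.
final class SpanUpdateDemo {
    static final class Entry {
        final String key;
        final String value;
        int begin; // 0 until the document text is known
        int end;   // 0 until the document text is known

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Once the text is materialized, stretch every entry over the whole document.
    static void coverDocument(List<Entry> entries, String text) {
        for (Entry e : entries) {
            e.begin = 0;
            e.end = text.length();
        }
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("source", "example corpus")); // added before any text exists
        String text = "Document text that is set later.";
        coverDocument(entries, text);
        System.out.println(entries.get(0).begin + "-" + entries.get(0).end);
    }
}
```

With the TOP-based alternative, no such update step would be needed because the entries carry no offsets.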
