Feature: content/metadata file splitter #3

@Uinelj

Description

Since the new pipeline outputs a single .jsonl file containing both textual content and metadata, working exclusively with textual data becomes less convenient:

  1. Forced download of metadata that will be discarded
  2. Forced usage of a jsonl-compatible parser in order to extract content

For now, we should provide a tool that splits a given OSCAR Schema v2 file into an OSCAR Schema v1.2-compatible file.
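
A minimal sketch of what such a splitter could look like, to make the request concrete. It is not an existing tool. It assumes each line of the input is a JSON object whose text lives under a `content` key, and that everything else on the record counts as metadata. The function name `split_jsonl`, the `content_key` parameter, the blank-line separator between documents, and the output file names are all hypothetical, and the sketch does not try to reproduce the exact OSCAR Schema v1.2 layout.

```python
import json
from pathlib import Path


def split_jsonl(src: Path, text_out: Path, meta_out: Path,
                content_key: str = "content") -> int:
    """Split a JSONL file into a plain-text file and a metadata-only JSONL file.

    Assumption: every record is a JSON object and its text is stored under
    `content_key`. Returns the number of records processed.
    """
    count = 0
    with src.open(encoding="utf-8") as fin, \
            text_out.open("w", encoding="utf-8") as ftext, \
            meta_out.open("w", encoding="utf-8") as fmeta:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the input
            record = json.loads(line)
            # Remove the text from the record; what remains is metadata.
            content = record.pop(content_key, "")
            # Assumed convention: one blank line separates documents.
            ftext.write(content + "\n\n")
            # One metadata object per line, in the same order as the text.
            fmeta.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Called as `split_jsonl(Path("en_part_1.jsonl"), Path("en_part_1.txt"), Path("en_part_1_meta.jsonl"))` (file names are placeholders), this would let users who only need text download and read the `.txt` file without a JSON parser, which addresses both points listed above. Records are streamed line by line, so memory use stays flat regardless of file size.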
