
Dump raw training data for the LLM-jp-3 series #46

@hkiyomaru

Description


Dump the raw training data for the LLM-jp-3 series. Each training instance should include at least the following fields:

  • token_ids: A list of token IDs for the training instance
  • training_step: Training step at which the training instance was processed
  • dataset: Name of the dataset from which the instance was sourced
  • document_ids: IDs of the documents associated with the training instance
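A minimal sketch of one possible record layout, assuming the dump is written as JSON Lines (one JSON object per training instance). Only the four field names come from this issue. The `TrainingInstance` class, the `dump_instances`/`load_instances` helpers, the JSON Lines format, and the choice of string document IDs are hypothetical illustrations, not the project's actual tooling. `document_ids` is kept as a list because a single packed training sequence may span several source documents.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import IO, Iterable, List


@dataclass
class TrainingInstance:
    """One training instance with the minimum fields requested in the issue."""
    token_ids: List[int]      # token IDs making up this training instance
    training_step: int        # step at which the instance was processed
    dataset: str              # name of the dataset the instance came from
    document_ids: List[str]   # IDs of source documents (str is an assumption)


def dump_instances(instances: Iterable[TrainingInstance], fp: IO[str]) -> int:
    """Write instances as JSON Lines (assumed format); return how many were written."""
    count = 0
    for inst in instances:
        fp.write(json.dumps(asdict(inst), ensure_ascii=False) + "\n")
        count += 1
    return count


def load_instances(fp: IO[str]) -> List[TrainingInstance]:
    """Read a JSON Lines stream back into TrainingInstance objects."""
    return [TrainingInstance(**json.loads(line)) for line in fp if line.strip()]


if __name__ == "__main__":
    # Hypothetical example record; the dataset name and IDs are placeholders.
    buf = io.StringIO()
    dump_instances([TrainingInstance([1, 2, 3], 0, "example_dataset", ["doc-0"])], buf)
    buf.seek(0)
    print(load_instances(buf))
```

Keeping one instance per line lets the dump be appended to step by step during training and streamed back without loading the whole file.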
