Hi venti07!

How can I convert the individual dataframes or jsonl files to .spacy format?

You can check the spaCy documentation on serialization. In summary, you convert your texts into spaCy Doc objects, collect them in a DocBin object, and save that to disk. How you build each Doc depends on what kind of data you already have.
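As a rough sketch of those three steps, assuming each record is a dict with a `text` field and an optional `spans` list of character offsets (roughly what an NER annotation export looks like; your columns may differ). The helper names `records_to_docbin` and `save_docbin` are made up for illustration:

```python
from pathlib import Path

import spacy
from spacy.tokens import DocBin


def records_to_docbin(records, lang="en"):
    """Build a DocBin from dicts shaped like {"text": ..., "spans": [...]}.

    The record layout is an assumption; adapt the keys to your own data.
    """
    nlp = spacy.blank(lang)  # blank pipeline: only the tokenizer is needed
    doc_bin = DocBin()
    for record in records:
        doc = nlp.make_doc(record["text"])
        ents = []
        for span in record.get("spans", []):
            # char_span returns None if the offsets do not line up with
            # token boundaries, so those spans are skipped here.
            ent = doc.char_span(span["start"], span["end"], label=span["label"])
            if ent is not None:
                ents.append(ent)
        # Note: doc.ents raises an error if spans overlap.
        doc.ents = ents
        doc_bin.add(doc)
    return doc_bin


def save_docbin(doc_bin, path):
    """Write the DocBin to disk, e.g. as train.spacy."""
    doc_bin.to_disk(Path(path))
```

For a dataframe, something like `records_to_docbin(df.to_dict("records"))` should work as long as the columns match the assumed keys. For a jsonl file, parse each line with `json.loads` first.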

Since you annotated with Prodigy, you can export your dataset directly to spaCy's format by using the data-to-spacy command.

How can I create a loop to train with the individual datasets? Spacy can be used primarily via CLI.

You have the following options:

  1. Write a bash script that loops through your datasets and calls spacy train once for each.
  2. In Python, you…
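The loop in option 1 can also be driven from Python by shelling out to the CLI. This is only a sketch, not necessarily what either option above had in mind. It assumes one sub-directory per dataset containing `train.spacy` and `dev.spacy`, plus a shared `config.cfg`. The directory layout and the function names are hypothetical:

```python
import subprocess
import sys
from pathlib import Path


def build_train_command(config, train_path, dev_path, output_dir):
    """Return the `spacy train` invocation for one dataset as an argv list."""
    return [
        sys.executable, "-m", "spacy", "train", str(config),
        "--output", str(output_dir),
        # Config overrides: point the corpus readers at this dataset.
        "--paths.train", str(train_path),
        "--paths.dev", str(dev_path),
    ]


def train_all(corpus_root, config="config.cfg", output_root="output"):
    """Run one training job per dataset directory under corpus_root."""
    for dataset_dir in sorted(Path(corpus_root).iterdir()):
        if not dataset_dir.is_dir():
            continue
        cmd = build_train_command(
            config,
            dataset_dir / "train.spacy",
            dataset_dir / "dev.spacy",
            Path(output_root) / dataset_dir.name,
        )
        # check=True stops the loop as soon as one training run fails.
        subprocess.run(cmd, check=True)
```

Calling `train_all("corpus")` would then train the datasets one after another and write each model to its own folder under `output`.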

Answer selected by venti07
Labels
feat / training Feature: Training utils, Example, Corpus and converters