Data set conversion and cross validation with spacy #11308
-
I used prodi.gy to annotate a NER dataset and then exported the data as a .jsonl file. I now want to create datasets so that I can perform cross validation. To do this, I convert the file to a dataframe and then back to .jsonl. I have a question about this: how can I convert the individual dataframes or .jsonl files to the .spacy format? Many thanks for your help!
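A minimal sketch of the splitting step described above (the file name and the 5-fold setup are only placeholders):

```python
import pandas as pd
from sklearn.model_selection import KFold

# Each line of the Prodigy export is one JSON record with "text" and "spans".
df = pd.read_json("ner_annotations.jsonl", lines=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, dev_idx) in enumerate(kf.split(df)):
    # One train/dev pair per fold; these still need to be converted to .spacy files.
    df.iloc[train_idx].to_json(f"train_fold{fold}.jsonl", orient="records", lines=True)
    df.iloc[dev_idx].to_json(f"dev_fold{fold}.jsonl", orient="records", lines=True)
```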
-
Hi venti07!

You can check this documentation on how to perform serialization. In summary, you need to convert your texts into a spaCy `Doc` object, then collate them into a `DocBin` object, then save it to disk. How you convert to `Doc` depends on what kind of data you already have.

Since you annotated via Prodigy, you can export your dataset straight to spaCy by using the `data-to-spacy` command.

You have the following options:
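As a rough illustration of the manual route, this is what converting annotated texts into `Doc` objects, collecting them in a `DocBin`, and saving to disk can look like (a sketch; the example text, entity offsets, and file name are invented):

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")  # a blank pipeline is enough, only the tokenizer is used

# Example annotations: (text, [(start_char, end_char, label), ...])
annotations = [
    ("Apple is looking at buying a U.K. startup.", [(0, 5, "ORG"), (29, 33, "GPE")]),
]

doc_bin = DocBin()
for text, spans in annotations:
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in spans:
        span = doc.char_span(start, end, label=label)
        if span is None:
            # The character span does not align with token boundaries; skip or fix it.
            continue
        ents.append(span)
    doc.ents = ents
    doc_bin.add(doc)

doc_bin.to_disk("train.spacy")  # this file can be passed to spacy train
```

If you go through Prodigy's `data-to-spacy` instead, it produces the `.spacy` files (plus a training config) directly from your annotated dataset, so you don't have to write this conversion yourself.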
-
I have implemented the whole thing. The input is a dataframe that was created from the Prodigy .jsonl export. Maybe you can take another look and check whether it looks good to you? I was able to use the .spacy files to train a model, but I am unsure whether I have thought of everything and would appreciate feedback.
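A rough sketch of what such a dataframe-to-`DocBin` conversion can look like (assuming columns following Prodigy's export schema, i.e. "text", "spans" and "answer"; adjust the names to your data):

```python
import spacy
from spacy.tokens import DocBin


def df_to_docbin(df, lang="en"):
    """Convert a dataframe of Prodigy NER records into a DocBin."""
    nlp = spacy.blank(lang)
    doc_bin = DocBin()
    for _, row in df.iterrows():
        if row.get("answer", "accept") != "accept":
            continue  # keep only annotations that were accepted in Prodigy
        doc = nlp.make_doc(row["text"])
        spans = row["spans"] if isinstance(row.get("spans"), list) else []
        ents = []
        for span in spans:
            ent = doc.char_span(span["start"], span["end"], label=span["label"])
            if ent is not None:  # skip spans that don't align with token boundaries
                ents.append(ent)
        doc.ents = ents
        doc_bin.add(doc)
    return doc_bin


# One .spacy file per cross-validation fold, for example:
# df_to_docbin(train_df).to_disk("train_fold0.spacy")
# df_to_docbin(dev_df).to_disk("dev_fold0.spacy")
```

Each fold's files can then be passed to `spacy train` via the `--paths.train` and `--paths.dev` overrides.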