Is there a way to create models without Prodigy - i.e. use spaCy without Prodigy? #10285
-
I'm trying to create an Entity Linker model using my own texts and identifiers. It seems that the only way to create the metadata for the model is to use Prodigy. Prodigy isn't available for free, and all the documentation seems to point to Prodigy as the only way to create models or build your own custom Entity Linker. Am I missing the documentation on how to do this without Prodigy?
Replies: 2 comments 2 replies
-
Prodigy provides a UI for creating data, but you can also use another tool, create the data using a script, or create it manually by writing files directly. The code for actually training a model is all in spaCy. What documentation were you looking at that made you think Prodigy was required? Prodigy has some tools to help automate an iterative annotation and training workflow, but it's not required for anything in spaCy, including entity linking.
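As a rough illustration of the "create the data using a script" route, here is a minimal sketch that turns hand-written annotations into the dictionary shape spaCy's training examples use for entity links (`"entities"` plus a `"links"` mapping from character offsets to identifier probabilities). The helper name `to_spacy_annotation`, the sample texts, and the identifiers `ID_001` and `ID_002` are all hypothetical, and the sketch assumes one annotated mention per text. It uses only the standard library; building `Doc` objects and a `.spacy` file from the result is left to spaCy itself.

```python
# Hypothetical hand-written annotations: (text, mention, label, identifier).
# The identifiers are placeholders for whatever IDs your own knowledge base uses.
RAW_ANNOTATIONS = [
    ("Emerson wrote many essays.", "Emerson", "PERSON", "ID_001"),
    ("Emerson won the tournament.", "Emerson", "PERSON", "ID_002"),
]


def to_spacy_annotation(text, mention, label, kb_id):
    """Convert one hand-written annotation into a (text, annotations) pair.

    Assumes the first occurrence of `mention` in `text` is the annotated one.
    """
    start = text.find(mention)
    if start == -1:
        raise ValueError(f"mention {mention!r} not found in text {text!r}")
    end = start + len(mention)
    annotations = {
        # Character offsets and label of the entity span.
        "entities": [(start, end, label)],
        # The gold identifier for that span, with probability 1.0.
        "links": {(start, end): {kb_id: 1.0}},
    }
    return text, annotations


training_data = [to_spacy_annotation(*row) for row in RAW_ANNOTATIONS]
```

Each pair in `training_data` should then be usable with a spaCy pipeline along the lines of `Example.from_dict(nlp.make_doc(text), annotations)`; treat that last step as something to check against the spaCy documentation rather than as tested code.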
-
I experienced a similar issue. I do have access to Prodigy, but in some cases my data has been annotated some other way. The documentation may not say that Prodigy is required for training an entity linker, but the tutorial and examples make it easy to infer that. I used the https://github.com/explosion/projects/tree/v3/tutorials/nel_emerson repo as my source of information.

For the entity linker example, the annotated JSONL that comes from Prodigy looks something like the first bit of JSONL below. The bulk of it consists of artifacts from the Prodigy annotation session, such as all the options you were able to choose from. Most of these are not needed for training an entity linker, but without Prodigy it is hard to know how the output is structured, and you then also have to understand how the code in the example repo processes those items. For example, any observation whose answer is not accept is not carried over from the annotated JSONL to the .spacy file used for training (line 18 here).

The second bit of JSON below is all that is really needed to train this component, but it took some trial and error to work out what the training data is expected to contain. It is the same observation as the first, stripped of everything Prodigy adds, most of which never makes it into the .spacy file. There is not really an easy way to read the .spacy file to see what the examples look like, nor a definition anywhere of what the component needs in order to train.
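To make the stripping step described above concrete, here is a minimal sketch (not the poster's actual samples) that reduces Prodigy-style records to a handful of fields and drops anything that was not accepted. The field names `text`, `spans`, `accept`, and `answer` are an assumption about what the example repo's conversion script reads, and `strip_record` and `strip_jsonl` are hypothetical helpers. Like the example, it assumes one annotated span and one chosen identifier per record.

```python
import json


def strip_record(record):
    """Return a minimal copy of an accepted record, or None if it should be skipped.

    Assumed fields: "text", "spans", "accept" (chosen option IDs), "answer".
    """
    if record.get("answer") != "accept" or not record.get("accept"):
        return None
    span = record["spans"][0]  # assumption: one annotated span per record
    return {
        "text": record["text"],
        "spans": [
            {"start": span["start"], "end": span["end"], "label": span["label"]}
        ],
        "accept": [record["accept"][0]],  # assumption: one identifier per span
        "answer": "accept",
    }


def strip_jsonl(lines):
    """Strip every line of a JSONL export, keeping only accepted records."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        minimal = strip_record(json.loads(line))
        if minimal is not None:
            kept.append(json.dumps(minimal))
    return kept
```

The same minimal shape can be written by hand or by any other annotation tool, which is the point: nothing in it depends on Prodigy.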