LLM models in spaCy requiring OpenAI key #13080
-
The following code will throw the error (marked below):

import spacy

spaCy Error:

Why is this defaulting to the OpenAI model? Is there a way to bypass this so that other models from Hugging Face (e.g. Dolly) or spaCy's own LLM models can be used for NER? Thanks for your help.

My Environment:
spaCy version 3.7.2
Replies: 4 comments 2 replies
-
Hi @rshahrabani,
Sorry that this has been confusing. Basically there are two ways to instantiate an llm component for a spaCy nlp pipeline:

1. Specifying a config file. If you want to run various experiments and test different things, I'd highly recommend you get familiar with the config file, as it gives you a lot of flexibility and power. You can find more information here <https://github.com/explosion/spacy-llm/tree/main#using-a-config-file>, and various examples can be found in our usage_examples folder <https://github.com/explosion/spacy-llm/tree/main/usage_examples>.

Example:

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION
examples = null

2. Using the built-in factories like "llm_ner". This is a shortcut designed specifically to let users run quick experiments directly in Python, and it indeed uses the GPT-3.5 model from OpenAI by default. Using the same mechanism as the config from point 1, you can customize this though, e.g.:

llm_ner = nlp.add_pipe("llm_ner", config={"model": {"@llm_models": "spacy.Dolly.v1", "name": "dolly-v2-3b"}})

(Note that this will download the model from HF and will attempt to run it on your local machine!) We'll update the docs to make this more clear: #13082
-
Hi Sofie,
Thanks for your reply. It appears that Dolly requires the usage of a GPU -
is that correct?
I also had a question with regard to LLMs vs. training the regular spaCy
large English model to make the NER recognize new entities using Prodigy.
The question is: will this be more efficient (both in terms of speed and
accuracy) than using an LLM on machines that may or may not have a GPU?
What are the tradeoffs in using one over the other? Also, realistically
speaking, if we only have several hundred examples that we can train the
model on, can we expect accuracy in predictions?
Thanks for your help.
Ronny
On Mon, Oct 23, 2023 at 6:18 AM Sofie Van Landeghem wrote:
Hi @rshahrabani <https://github.com/rshahrabani>,
Sorry that this has been confusing. Basically there are two ways to
instantiate an llm component for a spaCy nlp pipeline:
1. Specifying a config file. If you want to run various experiments
and test different things, I'd highly recommend you get familiar with the
config file, as it gives you a lot of flexibility and power. You can find
more information here
<https://github.com/explosion/spacy-llm/tree/main#using-a-config-file>
and various examples can be found in our examples
<https://github.com/explosion/spacy-llm/tree/main/usage_examples>
folder.
Example:

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION
examples = null
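[Editor's note: the following sketch shows how a config like the one above can be loaded with spacy-llm's assemble helper. The [nlp] block and the assemble() call are assumptions based on the spacy-llm README, not something shown in this thread, and building the pipeline will download dolly-v2-3b from Hugging Face.]

```python
# Sketch: building the Dolly NER pipeline from a spacy-llm config file,
# so that no OpenAI key is involved. Assumes `pip install spacy-llm`.
from pathlib import Path

CONFIG = """\
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION
"""

def write_config(path: str = "config.cfg") -> Path:
    """Write the config fragment to disk so assemble() can read it."""
    p = Path(path)
    p.write_text(CONFIG, encoding="utf8")
    return p

def build_pipeline(path: str = "config.cfg"):
    """Heavyweight: downloads dolly-v2-3b from Hugging Face on first use."""
    from spacy_llm.util import assemble  # spacy-llm's config-loading helper
    return assemble(str(write_config(path)))
```

With a pipeline assembled this way, the entities predicted by Dolly would land in `doc.ents` as usual, labelled PERSON, ORGANISATION or LOCATION.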
2. Using the built-in factories like "llm_ner". This is a shortcut
designed specifically to let users run quick experiments directly in
Python, and it indeed uses the GPT-3.5 model from OpenAI by default. Using
the same mechanism as the config from point 1, you can customize this
though, e.g.:
llm_ner = nlp.add_pipe("llm_ner", config={"model": {"@llm_models": "spacy.Dolly.v1", "name": "dolly-v2-3b"}})
(Note that this will download the model from HF and will attempt to run it
on your local machine!)
We'll update the docs to make this more clear.
-
Hi Sofie, thanks for the info.
I had a few more questions; if you could answer them, it would be a great help:
a) What is the recommended approach to splitting training versus evaluation
data? 50%-50% or some other variant?
b) I imagine train.json is the annotated file used for training the model -
what is the dev.json file in the assets folder of the ner_demo project?
c) Given the requirement of finding the plaintiff and defendant in text,
and the necessity of understanding the context of the language, is it
sufficient to simply train the NER pipeline for the new entities (PLAINTIFF
& DEFENDANT), or is training other components (dependencies, etc.) also
necessary for greater accuracy? Here is an example of the text we might
encounter (yellow is the plaintiff in this case and blue is the defendant):
ABC Corporation (“Plaintiff”) today announced that it has submitted a claim
in Delaware Chancery Court (the "Claim") against XYZ Company (“XYZ”), an
entity owned by certain funds managed by 123 Capital Partners LLC ("123")
in partnership with 456, a leading alternative investment firm specializing
in infrastructure and real assets, pursuant to which the Plaintiff has
asked for a court order and injunction to block XYZ from further emissions
in its HMZ plant operations until such time as a proper environmental
assessment has taken place.
d) The ner_demo project creates ner, tok2vec and vocab folders - do we use
the model file from the ner folder for evaluating the performance, or some
other combination thereof?
e) The workflows section in the project.yml file runs the train command,
while the train-with-vectors command is commented out. Can you explain the
difference between the two and when I should use one or the other?

workflows:
  all:
    ...
    - train
    # - train-with-vectors
    ...
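[Editor's note on question e): the two commands typically differ in whether the training config initializes the pipeline with pretrained static vectors. The fragment below is an assumption based on the standard spaCy project templates, not taken from this thread - the train-with-vectors variant usually points the config at a vector package such as en_core_web_lg.]

```ini
; Hypothetical configs/config_with_vectors.cfg - the train-with-vectors
; command would pass a config that adds static vectors like this:
[initialize]
vectors = "en_core_web_lg"

[components.tok2vec.model.embed]
include_static_vectors = true
```

Plain train starts the tok2vec layer from scratch; train-with-vectors gives it pretrained word vectors to draw on, which often helps when training data is limited.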
Thanks,
Ronny
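[Editor's note on question a): a 50/50 split is unusual. A common convention - an assumption here, since the thread itself does not answer this - is to hold out roughly 10-20% of the annotated examples as the dev/evaluation set. A minimal sketch:]

```python
import random

def train_dev_split(examples, dev_fraction=0.2, seed=0):
    """Shuffle annotated examples and hold out a fraction for evaluation."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]  # (train, dev)
```

In the ner_demo layout, the two resulting portions would correspond to train.json and dev.json respectively.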
On Wed, Oct 25, 2023 at 5:04 AM Sofie Van Landeghem wrote:
Hi Ronny,
Yes - to run an LLM you'll need a proper GPU.
While LLMs can be quite powerful, for efficiency reasons it's often
recommended to train a smaller, task-specific supervised model instead. For
some of our company's thinking around this topic, you can have a look at
some of our recent talks <https://explosion.ai/events> or blog post
<https://explosion.ai/blog/against-llm-maximalism>.
-
Sofie,
With regards to item c) above, I have a corrected text as an example -
please use this in your analysis:
ABC Corporation (“ABC”) today announced that it has submitted a claim in
Delaware Chancery Court (the "Claim") against XYZ Company (“XYZ”), an
entity owned by certain funds managed by 123 Capital Partners LLC ("123")
in partnership with 456, a leading alternative investment firm specializing
in infrastructure and real assets, pursuant to which the Plaintiff has
asked for a court order and injunction to block XYZ from further emissions
in its HMZ plant operations until such time as a proper environmental
assessment has taken place.
Thanks,
Ronny