LLM models in spaCy requiring OpenAI key #13080
-
The following code will throw the error (marked below):

import spacy

spaCy Error:

Why is this defaulting to the OpenAI model? Is there a way to bypass this so that other models from Hugging Face (e.g. Dolly) or spaCy's own LLM models can be used for NER? Thanks for your help.

My Environment:
spaCy version 3.7.2
Replies: 4 comments 2 replies
-
Hi @rshahrabani,
Sorry that this has been confusing. Basically there are two ways to instantiate an llm component for a spaCy nlp pipeline:

1. Specifying a config file. If you want to run various experiments and test different things, I'd highly recommend you get familiar with the config file, as it gives you a lot of flexibility and power. You can find more information here <https://github.com/explosion/spacy-llm/tree/main#using-a-config-file>, and various examples can be found in our usage_examples folder <https://github.com/explosion/spacy-llm/tree/main/usage_examples>.

Example:

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION
examples = null

2. Using the built-in factories like "llm_ner". This is a shortcut designed specifically to let users run quick experiments directly in Python, and it indeed uses the GPT-3.5 model from OpenAI by default. Using the same mechanism as the config from point 1, you can customize this though, e.g.:

llm_ner = nlp.add_pipe("llm_ner", config={"model": {"@llm_models": "spacy.Dolly.v1", "name": "dolly-v2-3b"}})

(Note that this will download the model from HF and will attempt to run it on your local machine!) We'll update the docs to make this more clear: #13082
-
Hi Sofie,
Thanks for your reply. It appears that Dolly requires the usage of a GPU -
is that correct?
I also had a question with regard to LLMs vs. training the regular spaCy
large English model to make the NER recognize new entities using Prodigy.
The question is: will this be more efficient (both in terms of speed and
accuracy) than using an LLM on machines that may or may not have a GPU?
What are the tradeoffs in using one over the other? Also, realistically
speaking, if we only have several hundred examples that we can train the
model on, can we expect accuracy in predictions?
Thanks for your help.
Ronny
On Mon, Oct 23, 2023 at 6:18 AM Sofie Van Landeghem wrote:
Hi @rshahrabani <https://github.com/rshahrabani>,
Sorry that this has been confusing. Basically there are two ways to
instantiate an llm component for a spaCy nlp pipeline:
1. Specifying a config file. If you want to run various experiments
and test different things, I'd highly recommend you get familiar with the
config file, as it gives you a lot of flexibility and power. You can find
more information here
<https://github.com/explosion/spacy-llm/tree/main#using-a-config-file>
and various examples can be found in our examples
<https://github.com/explosion/spacy-llm/tree/main/usage_examples>
folder.
Example:

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION
examples = null
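[Editor's note: the following sketch shows how a config like the one above can be loaded with spacy-llm's assemble helper. The [nlp] block and the assemble() call are assumptions based on the spacy-llm README, not something shown in this thread, and building the pipeline will download dolly-v2-3b from Hugging Face.]

```python
# Sketch: building the Dolly NER pipeline from a spacy-llm config file,
# so that no OpenAI key is involved. Assumes `pip install spacy-llm`.
from pathlib import Path

CONFIG = """\
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION
"""

def write_config(path: str = "config.cfg") -> Path:
    """Write the config fragment to disk so assemble() can read it."""
    p = Path(path)
    p.write_text(CONFIG, encoding="utf8")
    return p

def build_pipeline(path: str = "config.cfg"):
    """Heavyweight: downloads dolly-v2-3b from Hugging Face on first use."""
    from spacy_llm.util import assemble  # spacy-llm's config-loading helper
    return assemble(str(write_config(path)))
```

With a pipeline assembled this way, the entities predicted by Dolly would land in `doc.ents` as usual, labelled PERSON, ORGANISATION or LOCATION.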
2. Using the built-in factories like "llm_ner". This is a shortcut
designed specifically to let users run quick experiments directly in
Python, and it indeed uses the GPT-3.5 model from OpenAI by default. Using
the same mechanism as the config from point 1, you can customize this
though, e.g.:
llm_ner = nlp.add_pipe("llm_ner", config={"model": {"@llm_models": "spacy.Dolly.v1", "name": "dolly-v2-3b"}})
(Note that this will download the model from HF and will attempt to run it
on your local machine!)
We'll update the docs to make this more clear.
-
Hi Sofie, thanks for the info.
I had a few more questions; if you could answer them, it would be a great help:
a) What is the recommended approach to splitting training versus evaluation
data? 50%-50% or some other variant?
b) I imagine train.json is the annotated file used for training the model -
what is the dev.json file in the assets folder of the ner_demo project?
c) Given the requirement of finding the plaintiff and defendant in text,
and the necessity of understanding the context of the language, is it
sufficient to simply train the NER pipeline for the new entities (PLAINTIFF
& DEFENDANT), or is training other components (dependencies, etc.) also
necessary for greater accuracy? Here is an example of the text we might
encounter (yellow is the plaintiff in this case and blue is the defendant):
ABC Corporation (“Plaintiff”) today announced that it has submitted a claim
in Delaware Chancery Court (the "Claim") against XYZ Company (“XYZ”), an
entity owned by certain funds managed by 123 Capital Partners LLC ("123")
in partnership with 456, a leading alternative investment firm specializing
in infrastructure and real assets, pursuant to which the Plaintiff has
asked for a court order and injunction to block XYZ from further emissions
in its HMZ plant operations until such time as a proper environmental
assessment has taken place.
d) The ner_demo project creates ner, tok2vec and vocab folders - do we use
the model file from the ner folder for evaluating the performance, or some
other combination thereof?
e) The workflows section in the project.yml file runs the train command,
while the train-with-vectors command is commented out. Can you explain the
difference between the two and when I should use one or the other?

workflows:
  all:
    ...
    - train
    # - train-with-vectors
    ...
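[Editor's note on question e): the two commands typically differ in whether the training config initializes the pipeline with pretrained static vectors. The fragment below is an assumption based on the standard spaCy project templates, not taken from this thread - the train-with-vectors variant usually points the config at a vector package such as en_core_web_lg.]

```ini
; Hypothetical configs/config_with_vectors.cfg - the train-with-vectors
; command would pass a config that adds static vectors like this:
[initialize]
vectors = "en_core_web_lg"

[components.tok2vec.model.embed]
include_static_vectors = true
```

Plain train starts the tok2vec layer from scratch; train-with-vectors gives it pretrained word vectors to draw on, which often helps when training data is limited.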
Thanks,
Ronny
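[Editor's note on question a): a 50/50 split is unusual. A common convention - an assumption here, since the thread itself does not answer this - is to hold out roughly 10-20% of the annotated examples as the dev/evaluation set. A minimal sketch:]

```python
import random

def train_dev_split(examples, dev_fraction=0.2, seed=0):
    """Shuffle annotated examples and hold out a fraction for evaluation."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]  # (train, dev)
```

In the ner_demo layout, the two resulting portions would correspond to train.json and dev.json respectively.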
On Wed, Oct 25, 2023 at 5:04 AM Sofie Van Landeghem wrote:
Hi Ronny,
Yes - to run an LLM you'll need a proper GPU.
While LLMs can be quite powerful, for efficiency reasons it's often
recommended to train a smaller, task-specific supervised model instead. For
some of our company's thinking around this topic, you can have a look at
some of our recent talks <https://explosion.ai/events> or blog post
<https://explosion.ai/blog/against-llm-maximalism>.
-
Sofie,
With regards to item c) above, I have a corrected text as an example -
please use this in your analysis:
ABC Corporation (“ABC”) today announced that it has submitted a claim in
Delaware Chancery Court (the "Claim") against XYZ Company (“XYZ”), an
entity owned by certain funds managed by 123 Capital Partners LLC ("123")
in partnership with 456, a leading alternative investment firm specializing
in infrastructure and real assets, pursuant to which the Plaintiff has
asked for a court order and injunction to block XYZ from further emissions
in its HMZ plant operations until such time as a proper environmental
assessment has taken place.
Thanks,
Ronny