NER of spacy 3.0.6 does not recognize GPE which was recognized in version 2.2.4 #8478
-
Using doc = nlp('haifa, israel') and checking the entities, version 2.2.4 recognized both as GPE, whereas version 3.0.6 recognizes only 'israel'. I tested it on Colab and on PythonAnywhere, with the same results. For the time being I have gone back to version 2.2.4. My name is Zvi.
Replies: 6 comments 2 replies
-
In the v2 models we used data augmentation to reduce sensitivity to wrong case usage, like in your example text. We accidentally didn't use that for the 3.0 models, but will be adding it back the next time we release the models. Unfortunately this does mean that at the moment the v3 models are overly sensitive to case changes. If you have a specific list of entities you want to recognize, you can work around this by using an EntityRuler. In extremely short text like your example there's no useful context, so that's going to be a difficult case in general, and if you know the place names you expect to see, using a simple list can be very effective.
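The EntityRuler workaround mentioned above could look roughly like the sketch below. It assumes spaCy v3 (where components are added by name with nlp.add_pipe) and uses a blank English pipeline so that no trained model is required; the place_names list is a hypothetical example, not something from this thread. Each pattern matches on the LOWER token attribute, so the match does not depend on how the text is capitalized.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained NER (assumption made
# so the sketch runs without downloading a model).
nlp = spacy.blank("en")

# In spaCy v3 the EntityRuler is added by its registered name.
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical list of single-word place names you expect to see.
place_names = ["Haifa", "Israel", "Poland", "Toronto", "Canada"]

# One token pattern per name; LOWER compares against the lowercased token
# text, so "haifa", "Haifa" and "HAIFA" all match.
ruler.add_patterns(
    [{"label": "GPE", "pattern": [{"LOWER": name.lower()}]} for name in place_names]
)

doc = nlp("haifa, israel")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)  # [('haifa', 'GPE'), ('israel', 'GPE')]
```

With a trained v3 pipeline, the ruler would typically be added with nlp.add_pipe("entity_ruler", before="ner") so that its matches take precedence over the statistical predictions. Multi-word names need one token dictionary per word in the pattern.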
-
I do not think it is ‘wrong case usage’.
It does not recognize ‘poland’ alone.
It does recognize ‘israel’ alone, but not ‘, israel’ with a space in front of ‘israel’.
All these cases work fine in version 2.2.4.
(In reply to polm's message sent Wednesday, 23 June 2021, 08:38.)
-
It also does not recognize “Poland” or “, Israel”, nor “Israel” in the sentence “what is the weather in summer, Israel”.
polm wrote on Wednesday, 23 June 2021, 09:54:
When I say "wrong case usage" I mean that you have words that would normally be written in upper case (like "Israel", "Poland") rather than lower case. Since spaCy's training data is based largely on newspapers and other relatively formal texts, they never have words like that written in lower case in their training data. Without augmentation the model has never seen those words in lower case and won't recognize them as entities.
We know this is a regression and are working on fixing it, but updating the models is a big release so it's not something we can do overnight.
On “It does recognize ‘israel’ alone but not ‘, israel’ with a space in front of ‘israel’”: that is strange, but behavior with short texts is less predictable anyway. I understand this is frustrating, but keep in mind that debugging individual cases is difficult; see #3052.
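To illustrate the idea of case augmentation described in this reply, here is a minimal sketch in plain Python. It is not spaCy's actual augmenter; the function name augment_with_lowercase, the rate parameter and the (text, spans) data layout are all assumptions made for the example. It adds lowercased copies of training texts so that a model trained on the result would also see names like "israel" in lower case.

```python
import random


def augment_with_lowercase(examples, rate=0.5, seed=0):
    """Return the examples plus lowercased copies of a random subset.

    examples: list of (text, spans), where spans are (start, end, label)
    character offsets (hypothetical layout for this sketch).
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for text, spans in examples:
        lowered = text.lower()
        # Skip texts that are already lower case, and the rare Unicode cases
        # where lowercasing changes the length and would shift the offsets.
        if lowered == text or len(lowered) != len(text):
            continue
        if rng.random() < rate:
            augmented.append((lowered, spans))
    return augmented


train = [
    ("The weather in Haifa, Israel is warm.", [(15, 20, "GPE"), (22, 28, "GPE")]),
]
result = augment_with_lowercase(train, rate=1.0)
for text, spans in result:
    print(text, [text[start:end] for start, end, _ in spans])
```

Because the entity offsets are reused unchanged, the lowercased copy keeps the same labels on "haifa" and "israel" as the original has on "Haifa" and "Israel".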
-
Hi,
Why is the lemma_ of ‘clouds’ given as ‘clouds’ in the case of "rain, clouds, weather in toronto, canada",
while in the sentence "clouds, weather in toronto, canada" the lemma_ is properly ‘cloud’?
The spaCy version I am using is 2.2.4.
Zvi
-
Hi,
Still waiting for a response
Zvi
(Regarding my message sent Wednesday, 30 June 2021, 01:28, subject: Lemma_ of 'clouds' not giving 'cloud' sometimes.)
-
Hi
Using version 2.2.4
Why is that?
Zvi