Part of Speech Tagger: "this/that/these/those" Pronouns / Determiners distinction not made #7769
-
Hi Joseph! Thanks for your kind words :-) While the trained pipelines that we provide achieve relatively high performance scores on general-purpose text, no Machine Learning model will ever be 100% accurate. Some lexical constructs may, for instance, be underrepresented in the training data, causing the ML models to make errors on them at prediction time. If you find a structural problem with some of the predictions of the provided models on your specific texts, it's a good idea to provide them with more training data. This section in the docs details the process of training for spaCy v3: https://spacy.io/usage/training, and specifically you'll want to start from the pretrained model by sourcing the correct component. When you're retraining/finetuning the model, it's important to realise that the ML algorithms will start adjusting themselves to the new data you're feeding them, meaning that eventually they may forget the old patterns they learned initially. This is what we call the problem of "catastrophic forgetting". To prevent this from happening, you typically want to make sure that the new training data consists of a mixture of new data + old data. Happy to discuss in more detail if you have specific questions!
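To make the "mixture of new data + old data" idea concrete, here is a minimal sketch in plain Python. It is not spaCy's API: the helper name `mix_training_data`, the `old_fraction` parameter, and its default of 0.5 are assumptions for illustration, and the right proportion for a real corpus has to be found empirically.

```python
import random


def mix_training_data(new_examples, old_examples, old_fraction=0.5, seed=0):
    """Combine all new examples with a random sample of old ones.

    old_fraction is the target share of old examples in the result
    (hypothetical parameter; 0 <= old_fraction < 1).
    """
    if not 0 <= old_fraction < 1:
        raise ValueError("old_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of old examples needed so that they make up old_fraction of the mix.
    n_old = round(len(new_examples) * old_fraction / (1 - old_fraction))
    # Never ask for more old examples than are available.
    n_old = min(n_old, len(old_examples))
    mixed = list(new_examples) + rng.sample(list(old_examples), n_old)
    rng.shuffle(mixed)
    return mixed
```

The mixed list would then be converted to whatever format the training run expects; that conversion step is left out here.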
-
At first I thought this might be something to do with this usage being rare in the training data, but taking a look at it, it seems that it's a quirk of the annotation scheme, and "this" is always annotated as a determiner, even in sentences like "This is Japan". This doesn't seem to be an error. From the Penn Treebank Annotation guidelines on the DT tag:
The Penn Treebank doesn't seem to have any other suitable tag for "this", as the pronoun tags are only personal, possessive, or wh- pronouns (including "that"). PRON is used in UD corpora. I think you would need a different training corpus to get a different tag for non-modifier instances of "this".
-
Thank you for your answers! Much appreciated. You are right that it's characteristic of the corpus and hence learned. I will have to adjust.
It seems to be a historical accident in training corpora (BTW, the CHILDES corpus also uses PRON). Very weird: the semantics of "this" as a determiner don't add up in the pronoun uses. I wonder how lambda calculus would evaluate the sentence, but that's another question. :)
I will supply the training corpus with the correct tags or add a post-processing step to restore the correct semantics.
Thank you,
Joseph
…On Wed, 14 Apr 2021, 08:36 Sofie Van Landeghem, ***@***.***> wrote:
Ah, interesting, so the ML algorithms are actually doing it "right"
considering the data they saw...
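The post-processing step mentioned in this comment could be sketched roughly as follows. This is a heuristic in plain Python, not anything spaCy provides: the function name `retag_demonstratives` and the tag list `NP_CONTINUATION_TAGS` are assumptions for illustration. It relabels a demonstrative tagged DT as PRON unless the next token's Penn Treebank tag suggests that a noun phrase continues.

```python
DEMONSTRATIVES = {"this", "that", "these", "those"}
# Assumed list of Penn Treebank tags that can continue a noun phrase
# directly after a determiner (nouns, adjectives, cardinal numbers).
NP_CONTINUATION_TAGS = {"NN", "NNS", "NNP", "NNPS", "JJ", "JJR", "JJS", "CD"}


def retag_demonstratives(tagged_tokens):
    """Relabel demonstratives tagged DT as "DET" or "PRON".

    tagged_tokens is a list of (text, penn_treebank_tag) pairs.
    All other tokens keep their original tag.
    """
    result = []
    for i, (text, tag) in enumerate(tagged_tokens):
        if text.lower() in DEMONSTRATIVES and tag == "DT":
            # Look one token ahead; a sentence-final demonstrative has no successor.
            next_tag = tagged_tokens[i + 1][1] if i + 1 < len(tagged_tokens) else None
            label = "DET" if next_tag in NP_CONTINUATION_TAGS else "PRON"
            result.append((text, label))
        else:
            result.append((text, tag))
    return result
```

With a spaCy `Doc`, the input pairs could be built as `[(t.text, t.tag_) for t in doc]`. The one-token lookahead is deliberately simple and will mislabel cases such as "this very nice book", where an adverb follows the determiner; a dependency-based check would likely be more robust.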
-
Dear spaCy,
I hope you are well. Your product is fantastic!
Running your default English POS tagger, I find that in sentences such as "this tastes nice", "this" is tagged as DET.
This seems incorrect: a determiner modifies the referent of a following noun.
However, "this" can also be referential in itself (and very often is). You can easily see this by substituting "it": "It tastes nice". In this position, "this" serves as a pronoun, and more precisely as a demonstrative pronoun.
For instance:
(1) This is a nice book - PRON AUX DET ADJ NOUN
(1a) It is a nice book
(2) This book is nice - DET NOUN AUX ADJ
(2a) *It book is nice
Can you please explain how I would change the default tagging behaviour regarding demonstrative pronouns? Is there a way to teach it to spaCy?
Kind Regards,
Joseph