Pos tag bug when processing contiguous emojis #10276
Answered by polm
charx7 asked this question in Help: Other Questions
Problem
After loading the
How to reproduce the behaviour
import spacy
import es_core_news_sm
# init spacy nlp
nlp = es_core_news_sm.load()
s1 = "que tenga un excelente fin de año líder y que este y los que sigan sean mejores 👏🙏"
s2 = "que tenga un excelente fin de año líder y que este y los que sigan sean mejores 👏🙏🙏"
doc = nlp(s1)
for word in doc:
    print(f"The word: {word} has a pos tag {word.pos_}")
# >> observe that the POS tag of the 🙏 is PUNCT
doc = nlp(s2)
for word in doc:
    print(f"The word: {word} has a pos tag {word.pos_}")
# >> observe that the POS tag of the 🙏 is PROPN

We can observe that the POS tag changes depending on the number of contiguous 🙏 emojis :(

Your Environment
Answered by
polm
Feb 14, 2022
Please see #3052 - the statistical models make mistakes sometimes, and we can't address individual errors. Also note that spaCy v2.2.4 is pretty old at this point - v3 has been out over a year, and I'd recommend you give it a look.
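Since individual tagging errors are not going to be fixed one by one, a possible workaround is to post-process the tags yourself. The following is only a minimal sketch, not something from the spaCy API: the helper names `is_emoji_like` and `normalize_emoji_pos` are hypothetical, the choice of "SYM" as the replacement tag is an assumption, and the Unicode-category check is a rough heuristic that will also match non-emoji symbols such as ©. The sample pairs are made up for illustration and are not actual model output.

```python
import unicodedata

# Zero-width joiner and variation selector-16 often appear inside emoji sequences.
_IGNORED = {"\u200d", "\ufe0f"}


def is_emoji_like(text):
    """Heuristic: True if every visible character is in Unicode category 'So'."""
    chars = [ch for ch in text if ch not in _IGNORED]
    return bool(chars) and all(unicodedata.category(ch) == "So" for ch in chars)


def normalize_emoji_pos(tagged, emoji_pos="SYM"):
    """Return (text, pos) pairs with emoji-like tokens forced to one tag."""
    return [
        (text, emoji_pos if is_emoji_like(text) else pos)
        for text, pos in tagged
    ]


# Hypothetical (text, pos) pairs, standing in for tagger output.
sample = [("mejores", "ADJ"), ("👏", "PUNCT"), ("🙏", "PROPN"), ("🙏", "PROPN")]
print(normalize_emoji_pos(sample))
```

With a spaCy doc, the input pairs could be built as `[(word.text, word.pos_) for word in doc]`. This leaves the model's predictions for ordinary words untouched and only makes the emoji tags consistent.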
Answer selected by
svlandeg