The lemmas depend on the part-of-speech (POS) tag, so whether the tagger thinks a word is a common noun (NOUN) or a proper noun (PROPN) makes a difference in the lemmatization. The lemmas of proper nouns are left unchanged, while common nouns are lowercased and converted to singular.

The provided models are more likely to tag capitalized words as proper nouns, but they may also tag lowercase words like london as proper nouns, because the training data is augmented with some lowercased text to improve the results for more informal texts.

Examine the doc for the sentence and you can hopefully track down why particular words aren't matching:

# textLine is the Doc for the sentence
for t in textLine:
    print(t.text, t.pos_, t.lemma_)
...
cert…
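To make that inspection a bit easier to reuse, here is a minimal sketch that collects the text, POS and lemma of each token and lists the tokens tagged PROPN, since those are the ones whose lemmas are left unchanged. The names `TokenInfo`, `token_report`, `proper_nouns` and `demo` are made up for illustration, and `demo` assumes the `en_core_web_sm` pipeline is installed; the example sentence is arbitrary and the tags you get will depend on the model.

```python
from typing import Iterable, List, NamedTuple


class TokenInfo(NamedTuple):
    """Hypothetical record holding the three attributes worth comparing."""
    text: str
    pos_: str
    lemma_: str


def token_report(tokens: Iterable) -> List[TokenInfo]:
    """Collect (text, POS, lemma) for anything with .text, .pos_ and .lemma_."""
    return [TokenInfo(t.text, t.pos_, t.lemma_) for t in tokens]


def proper_nouns(tokens: Iterable) -> List[str]:
    """Return the texts of tokens tagged PROPN, whose lemmas stay unchanged."""
    return [t.text for t in tokens if t.pos_ == "PROPN"]


def demo(text: str = "I visited london and two Museums.") -> None:
    """Print a small table for one sentence (assumes en_core_web_sm is installed)."""
    import spacy  # imported here so the helpers above work without spaCy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    for info in token_report(doc):
        print(f"{info.text:<12}{info.pos_:<8}{info.lemma_}")
    print("Tagged PROPN:", proper_nouns(doc))
```

Calling `demo()` with the sentence that fails to match shows which words the tagger treated as proper nouns. If a word you expected to match by lemma appears in the PROPN list, its lemma is the original text rather than the lowercased singular form.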

Answer selected by ines
Labels
feat / matcher Feature: Token, phrase and dependency matcher
4 participants