The principled way to do this, and the one that will cause the least overall weirdness, is to train a truecasing model (a model that can tell you what the case of words should be) and use it to process text before passing it to spaCy.
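As a rough illustration of where that step would sit, here's a minimal sketch. The `truecase` function and its tiny casing table are stand-ins for a real trained truecasing model, and `en_core_web_sm` is just an example pipeline:

```python
import spacy

# Stand-in for a trained truecasing model (assumption): a real truecaser
# would be a statistical or neural model; this toy version only restores
# casing for a handful of known words.
KNOWN_CASING = {"paris": "Paris", "berlin": "Berlin", "i": "I"}

def truecase(text: str) -> str:
    return " ".join(KNOWN_CASING.get(w.lower(), w.lower()) for w in text.split())

nlp = spacy.load("en_core_web_sm")  # example pipeline
raw = "PARIS IS LOVELY IN THE SPRING"
doc = nlp(truecase(raw))  # normalize casing first, then run spaCy as usual
print([(t.text, t.lemma_, t.pos_) for t in doc])
```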

There should also be a less principled way to do this: change the lemmatizer to treat all proper nouns as normal nouns and lowercase them before lookup. That would require a bit of work with the Lemmatizer implementation, though. Maybe look at rule_lemmatize and implement something similar in a subclass, say special_lemmatize. Then you can use your own class and pass mode = "special" via the config to use it (see the sketch below).
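A minimal sketch of what that subclass could look like, assuming spaCy v3's Lemmatizer, which dispatches a custom mode to a method named `<mode>_lemmatize`. The `SpecialLemmatizer` / `special_lemmatizer` names and the PROPN handling are illustrative, not a worked-out implementation:

```python
from typing import List, Optional

from thinc.api import Model
from spacy.language import Language
from spacy.pipeline import Lemmatizer
from spacy.tokens import Token


class SpecialLemmatizer(Lemmatizer):
    """Treats proper nouns like common nouns, lowercasing before lookup."""

    def special_lemmatize(self, token: Token) -> List[str]:
        if token.pos_ == "PROPN":
            # Look up the lowercased form in the lookup table; fall back
            # to the lowercased text itself if it isn't in the table.
            lookup_table = self.lookups.get_table("lemma_lookup", {})
            lowered = token.text.lower()
            return [lookup_table.get(lowered, lowered)]
        # Everything else keeps the normal rule-based behaviour.
        return self.rule_lemmatize(token)


# Register a factory so the subclass can be selected from the config.
@Language.factory(
    "special_lemmatizer",
    default_config={"model": None, "mode": "special", "overwrite": False},
)
def make_special_lemmatizer(
    nlp: Language,
    name: str,
    model: Optional[Model],
    mode: str,
    overwrite: bool,
) -> SpecialLemmatizer:
    return SpecialLemmatizer(nlp.vocab, model, name, mode=mode, overwrite=overwrite)
```

With something like that registered, the lemmatizer entry in your config can point at `factory = "special_lemmatizer"` with `mode = "special"`, or you can swap it in when building the pipeline in code (e.g. with nlp.replace_pipe). The usual lookup tables still need to be loaded via nlp.initialize() for the table lookups to work.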

It's unfortunate that this is kind of involved, it's…
