Hi, the is_stop attribute is designed to work without any additional pipeline components such as a tagger or a lemmatizer. By default, it currently only checks whether the lowercase form of the token is in the stop word list. You can extend the stop word list by customizing the language defaults before loading the model, or by creating a custom language. See: https://spacy.io/usage/linguistic-features#language-subclass
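To make the default behaviour concrete, here is a minimal plain-Python sketch of the check described above. It is a simplified model, not spaCy's actual code: `DEFAULT_STOP_WORDS` and the `is_stop` function are hypothetical names made up for illustration, and the word list is a tiny stand-in for a real one.

```python
# Simplified model of the check described above; this is NOT spaCy's code.
# DEFAULT_STOP_WORDS and is_stop are hypothetical names for illustration.
DEFAULT_STOP_WORDS = {"a", "is", "not", "the"}


def is_stop(token_text, stop_words):
    """Return True if the lowercase form of the token is in the list."""
    return token_text.lower() in stop_words


# Casing does not matter, because the lowercase form is looked up.
print(is_stop("The", DEFAULT_STOP_WORDS))  # True

# A word that is not in the list is not a stop word.
print(is_stop("btw", DEFAULT_STOP_WORDS))  # False

# Extending the list changes the result for the new entry.
custom_stop_words = DEFAULT_STOP_WORDS | {"btw"}
print(is_stop("BTW", custom_stop_words))  # True
```

Since the lookup only sees the token text, any form you want flagged has to be in the list itself, which is why extending the list is the suggested fix.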

In general, for modern NLP techniques, it's not helpful to remove stop words, so you may not need this step at all. Some relevant discussions: #7228 (comment), #7637 (comment)

Answer selected by svlandeg
Labels
feat / doc Feature: Doc, Span and Token objects
This discussion was converted from issue #8230 on May 31, 2021 13:08.