Thanks for your helpful codebase!
I am a bit confused about stop words filtering.
The release code removes a document if its stop words ratio is below a certain cutoff:
cond = stopwords_ratio >= stopwords_min_cutoff
But in the notebook, section 2.5 states:
If the stop words ratio for a document is higher than a certain cutoff, it is removed.
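To make the two readings concrete, here is a minimal sketch of how I understand them. The stop word set, the whitespace tokenization, and the function names are my own assumptions for illustration, not the actual code from the repository.

```python
# Hypothetical stop word list, for illustration only.
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in"}


def stopwords_ratio(text):
    """Fraction of whitespace-separated words that are stop words."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in STOPWORDS for word in words) / len(words)


def keep_release_code(text, stopwords_min_cutoff):
    # Reading of the release code: the document is kept when the
    # ratio is at least the cutoff, so low-ratio documents are removed.
    return stopwords_ratio(text) >= stopwords_min_cutoff


def keep_notebook(text, cutoff):
    # Reading of notebook section 2.5: the document is removed when the
    # ratio is higher than the cutoff, so low-ratio documents are kept.
    return stopwords_ratio(text) <= cutoff


natural = "the cat is in the house"        # 4 of 6 words are stop words
keywords = "buy cheap watches online now"  # no stop words
```

With a cutoff of 0.3, the first reading keeps `natural` and drops `keywords`, while the second does the opposite, so the two descriptions cannot both match the same filter.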
I am wondering which one is more useful in your practice.
Thanks in advance!