Description
The documentation for the Preprocess Text widget doesn't show or explain the POS tags option in the Filtering dialogue at all. This is especially troublesome because 1) it's the only filtering option that requires you to specify the things you'd like to keep rather than the things you'd like to eliminate, and 2) the POS taggers included as preprocessors use Penn Treebank tags (NN, NNS, NNP, VB, etc.), not the generalized spaCy-like tags (NOUN, VERB, etc.) that are pre-populated in the options.
In practice, adding either of the POS taggers included in Orange3-text and then checking the "filter POS" box with the presets eliminates your whole corpus, because every token carries Penn Treebank suffixes such as "_NNS" and "_VB", not "_NOUN" and "_VERB". I just downloaded a clean install of Orange 3.39.0 and Text 1.16.3 from the website, so this is out-of-the-box behavior.
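To illustrate the mismatch, here is a minimal sketch. It is not Orange3-text's actual filtering code. It assumes tokens carry their tag as a `_TAG` suffix (e.g. `dogs_NNS`), and `filter_by_pos` is a hypothetical helper that keeps a token only when its suffix exactly matches one of the selected tags.

```python
# Hypothetical sketch, not Orange3-text's implementation.
# Assumes each token looks like "word_TAG" after POS tagging.

def filter_by_pos(tokens, keep_tags):
    """Keep only tokens whose suffix after the last '_' is in keep_tags."""
    kept = []
    for token in tokens:
        _, sep, tag = token.rpartition("_")
        # Tokens without a "_TAG" suffix are dropped.
        if sep and tag in keep_tags:
            kept.append(token)
    return kept


tagged = ["dogs_NNS", "bark_VBP", "loudly_RB", "house_NN"]

# Universal-style presets never match Penn Treebank suffixes.
print(filter_by_pos(tagged, {"NOUN", "VERB"}))       # []

# Penn Treebank tags do match.
print(filter_by_pos(tagged, {"NN", "NNS", "VBP"}))   # ['dogs_NNS', 'bark_VBP', 'house_NN']
```

With exact matching, the NOUN/VERB presets keep nothing, which is consistent with the whole corpus disappearing.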
It would be nice to have whitelist/blacklist options, or some built-in shortcut for "all verbs" / "all nouns". At least for now, though, I think the default presets should be updated to Penn Treebank tags, and the option should be described in the preprocessing documentation with a link to the full tagset.
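One way the "all nouns / all verbs" shortcuts and a whitelist/blacklist switch could work is sketched below. This is a hypothetical illustration, not an existing Orange3-text feature. `PENN_PREFIXES`, `matches`, and `pos_filter` are made-up names, and the prefix mapping is a coarse assumption: Penn noun tags start with `NN`, verb tags with `VB`, adjectives with `JJ`, and adverbs with `RB`.

```python
# Hypothetical sketch: coarse POS shortcuts over Penn Treebank tags,
# with a keep (whitelist) / remove (blacklist) mode. Not Orange3-text's API.

PENN_PREFIXES = {
    "NOUN": ("NN",),  # NN, NNS, NNP, NNPS
    "VERB": ("VB",),  # VB, VBD, VBG, VBN, VBP, VBZ
    "ADJ": ("JJ",),   # JJ, JJR, JJS
    "ADV": ("RB",),   # RB, RBR, RBS
}


def matches(tag, categories):
    """True if a Penn Treebank tag starts with a prefix of any chosen category."""
    return any(
        tag.startswith(prefix)
        for category in categories
        for prefix in PENN_PREFIXES.get(category, ())
    )


def pos_filter(tokens, categories, mode="keep"):
    """Keep or remove "word_TAG" tokens by coarse POS category.

    Tokens without a "_TAG" suffix never match, so they are dropped
    in "keep" mode and retained in "remove" mode.
    """
    if mode not in ("keep", "remove"):
        raise ValueError("mode must be 'keep' or 'remove'")
    result = []
    for token in tokens:
        _, sep, tag = token.rpartition("_")
        hit = bool(sep) and matches(tag, categories)
        if hit == (mode == "keep"):
            result.append(token)
    return result


tagged = ["dogs_NNS", "bark_VBP", "loudly_RB", "Rex_NNP"]
print(pos_filter(tagged, {"NOUN", "VERB"}))         # ['dogs_NNS', 'bark_VBP', 'Rex_NNP']
print(pos_filter(tagged, {"NOUN"}, mode="remove"))  # ['bark_VBP', 'loudly_RB']
```

Prefix matching lets a user pick "NOUN" once instead of listing NN, NNS, NNP, and NNPS separately. The "remove" mode would also bring the POS option in line with the other filters, which specify what to eliminate.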