POS tag filtering needs documentation #1149

@lhami

The documentation for the Preprocess Text widget doesn't show or explain the POS tags option in the Filtering dialogue at all. This is especially troublesome because 1) it's the only filtering option that requires you to specify the things you'd like to keep rather than the things you'd like to eliminate, and 2) the POS taggers that are included as preprocessors use Penn Treebank tags (NN, NNS, NNP, VB, etc.), not the generalized spaCy-like tags that are pre-populated in the options (NOUN, VERB, etc.).

That is, simply throwing in either of the POS taggers included in Orange3-text and then checking the "filter POS" box with the presets will eliminate your whole corpus because everything has "_NNS" and "_VB" tags, not "_NOUN" and "_VERB". I just downloaded a clean install of Orange 3.39.0 and Text 1.16.3 from the website, so this is out-of-the-box behavior.
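To illustrate the mismatch outside of Orange, here is a minimal sketch in plain Python. The tagged tokens are hand-written in Penn Treebank style, and `keep_tags` is a hypothetical helper that only mimics a keep-list filter; none of this is the widget's actual code.

```python
# Hand-written (token, tag) pairs in Penn Treebank style.
# Assumption: illustrative data only, not real Orange3-text output.
tagged = [("dogs", "NNS"), ("chase", "VBP"), ("cats", "NNS"), ("quickly", "RB")]


def keep_tags(tagged_tokens, allowed):
    """Hypothetical keep-list filter: retain tokens whose tag is in `allowed`."""
    return [(tok, tag) for tok, tag in tagged_tokens if tag in allowed]


# The pre-populated generalized tags match nothing, so every token is dropped.
preset_result = keep_tags(tagged, {"NOUN", "VERB"})

# Penn Treebank tags match what the taggers actually emit.
penn_result = keep_tags(tagged, {"NNS", "VBP"})

print(preset_result)
print(penn_result)
```

With the generalized presets the result is empty, while the Penn Treebank tags keep the nouns and the verb.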

It would be nice to have whitelist/blacklist options or some built-in shortcut for selecting all verbs or all nouns, but at least for now I think the default presets should be updated to Penn Treebank tags, and the POS tags option should be added to the preprocessing documentation with a link to the full tagset.
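As a rough sketch of what such shortcuts could look like, the snippet below expands generalized names into Penn Treebank tags and supports both keep and drop modes. All names here (`PENN_GROUPS`, `expand_tags`, `filter_by_pos`) are hypothetical and not part of Orange3-text, and folding proper nouns (NNP, NNPS) into NOUN is my own assumption about what "all nouns" should mean.

```python
# Assumed grouping of Penn Treebank tags under generalized shortcut names.
PENN_GROUPS = {
    "NOUN": {"NN", "NNS", "NNP", "NNPS"},
    "VERB": {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
    "ADJ": {"JJ", "JJR", "JJS"},
    "ADV": {"RB", "RBR", "RBS"},
}


def expand_tags(requested):
    """Expand shortcut names into Penn tags; unknown names pass through as-is."""
    expanded = set()
    for tag in requested:
        expanded |= PENN_GROUPS.get(tag, {tag})
    return expanded


def filter_by_pos(tagged_tokens, requested, mode="keep"):
    """Keep (whitelist) or drop (blacklist) tokens whose tag matches `requested`."""
    if mode not in ("keep", "drop"):
        raise ValueError("mode must be 'keep' or 'drop'")
    tags = expand_tags(requested)
    keep = mode == "keep"
    return [(tok, tag) for tok, tag in tagged_tokens if (tag in tags) == keep]
```

For example, `filter_by_pos(tagged, ["NOUN", "VERB"])` would keep any token tagged NNS or VBP, and `mode="drop"` would remove those tokens instead.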
