Check for Completely Correlated/Redundant Features and Remove Them #366

@xehu

Description

Although it's acceptable to have features that are 'redundant' in the sense of taking different approaches to quantify the same idea (e.g., sentiment), it's not acceptable to have repeated features that are completely identical. We should remove features that are perfect copies of each other.

To close this issue:

  • Before returning the features to the user, check them for pairs with high or perfect correlation (e.g., >0.99).
  • When features are perfectly correlated copies of one another, remove the copies so that only one remains.
  • Modify feature_dict and other objects that track feature information so that they reflect the features actually included in the pipeline.
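
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes the features live in a pandas DataFrame with one column per feature, and the helper names `drop_redundant_features` and `prune_feature_dict` are hypothetical. The real `feature_dict` may be structured differently; here it is assumed to be a plain dict keyed by feature column name.

```python
import pandas as pd


def drop_redundant_features(features: pd.DataFrame, threshold: float = 0.99):
    """Drop numeric columns whose correlation with an already-kept column exceeds threshold.

    Hypothetical helper. Returns the reduced DataFrame and the list of dropped names.
    """
    # Pairwise Pearson correlation between numeric columns only.
    corr = features.select_dtypes(include="number").corr()

    kept, dropped = [], []
    for col in corr.columns:
        # Constant columns produce NaN correlations; NaN > threshold is False,
        # so such columns are never dropped by this check.
        if any(corr.loc[col, other] > threshold for other in kept):
            dropped.append(col)  # a near-copy of a column we already kept
        else:
            kept.append(col)  # first occurrence wins

    return features.drop(columns=dropped), dropped


def prune_feature_dict(feature_dict: dict, dropped: list) -> dict:
    """Remove dropped features from a tracking dict.

    ASSUMPTION: feature_dict is keyed by feature column name.
    """
    dropped_names = set(dropped)
    return {name: info for name, info in feature_dict.items() if name not in dropped_names}


# Small demonstration: "b" is a linear copy of "a"; "c" is unrelated.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [3, 5, 7, 9, 11],
    "c": [5, 1, 4, 2, 3],
})
reduced, dropped = drop_redundant_features(demo)
tracking = prune_feature_dict({"a": {}, "b": {}, "c": {}}, dropped)
```

Keeping the first column in each correlated group is an arbitrary choice made for this sketch. Note also that Pearson correlation flags rescaled copies as well as exact duplicates; if only byte-identical columns should be removed, an equality check would be needed instead. Calling `.abs()` on the correlation matrix would additionally treat negated copies as redundant.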
