ENH: df.corr().top_correlated_features(N): a function that returns the top N feature pairs of a DataFrame, ordered by correlation strength #59639

@vahidnikougoftar

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When I compute a df.corr() matrix, I wish there were a function that returns only the top N pairs of features with the strongest correlation. This is doable in a few lines of code, but a one-stop function might be warranted given how often this is needed.

Feature Description

Add a new method, callable on the DataFrame returned by df.corr(), that returns the top N pairs of features sorted by correlation strength. Something like:
df.corr().top_correlated_features(top=5)

Results:
colA, colC, 0.952
colB, colE, 0.921
...
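A minimal sketch of the requested behaviour, written as a standalone function because no such method exists in pandas today. The name `top_correlated_features`, its `top` and `method` parameters, and the output columns `feature_1`, `feature_2`, `correlation` are all hypothetical, chosen to mirror the proposal. It ranks pairs by absolute correlation but reports the signed coefficient, and it assumes pandas 1.5 or later (for `numeric_only` in `DataFrame.corr`).

```python
import numpy as np
import pandas as pd


def top_correlated_features(
    df: pd.DataFrame, top: int = 5, method: str = "pearson"
) -> pd.DataFrame:
    """Hypothetical helper (not an existing pandas API): return the `top`
    feature pairs of `df` with the strongest absolute correlation."""
    # Correlate numeric columns only; non-numeric columns are skipped.
    corr = df.corr(method=method, numeric_only=True)
    cols = corr.columns
    # Row/column indices of the strict upper triangle (k=1 excludes the
    # diagonal), so each unordered pair of features appears exactly once.
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.DataFrame(
        {
            "feature_1": cols[i],
            "feature_2": cols[j],
            "correlation": corr.to_numpy()[i, j],
        }
    )
    # Rank by magnitude so strong negative correlations are not missed,
    # but keep the signed coefficient in the output. NaN values sort last.
    pairs = pairs.sort_values(
        "correlation", key=lambda s: s.abs(), ascending=False, kind="stable"
    )
    return pairs.head(top).reset_index(drop=True)
```

For example, `top_correlated_features(df, top=5)` would return at most five rows, one per feature pair. The sketch takes the raw data and calls `df.corr()` itself; a method on the correlation matrix, as proposed, would skip that step and operate on the matrix directly.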

Alternative Solutions

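Thanks to this post on Stack Overflow, here is an easy solution that could be wrapped in a method with a top_N argument:
import numpy as np
corr_matrix = df.corr().abs()  # absolute values, so strong negative correlations rank too
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))  # keep the strict upper triangle: no diagonal, no mirrored pairs
upper.stack().sort_values(ascending=False).head(N)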
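The `np.triu(..., k=1)` mask in the snippet above is what removes duplicate pairs: a correlation matrix is symmetric and its diagonal holds each feature's correlation with itself (always 1), so only the strict upper triangle contains distinct pairs. A minimal illustration of the mask for three features:

```python
import numpy as np

# Boolean mask for the strict upper triangle of a 3x3 matrix;
# k=1 shifts the triangle one step above the main diagonal.
mask = np.triu(np.ones((3, 3)), k=1).astype(bool)
print(mask)
# [[False  True  True]
#  [False False  True]
#  [False False False]]
```

Applied with `DataFrame.where`, every `False` cell becomes NaN, so each of the three pairs survives exactly once.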

Additional Context

No response

Metadata

Assignees: No one assigned
Labels: Enhancement, Needs Triage (Issue that has not been reviewed by a pandas team member)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests