Added feature importance methods based on cluster differences #102

x-tabdeveloping · 2025-06-03T11:21:13Z

I had already started working on a fighting-words-based feature importance estimation method in #75.
Since then I have found a better way to measure the semantic difference between clusters, based on linear classifiers.
I decided to use `LinearDiscriminantAnalysis`, since it is orders of magnitude faster than other methods. Speed matters here: when reducing the number of topics, feature importance has to be estimated for every level of the hierarchy, which otherwise takes forever.
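
The description does not show how the classifier is turned into term importances. Below is a minimal sketch of one plausible reading, assuming document and vocabulary embeddings live in the same space: fit scikit-learn's `LinearDiscriminantAnalysis` on document embeddings with cluster labels, then score each term embedding with `decision_function`. The helper `linear_term_importance` and the synthetic data are hypothetical and not turftopic's API.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def linear_term_importance(doc_embeddings, labels, vocab_embeddings):
    """Hypothetical helper: score each vocabulary term for each cluster.

    Fits an LDA classifier that separates clusters in embedding space, then
    evaluates its decision function on the term embeddings. Returns
    (importance, classes) where importance has shape (n_clusters, n_terms).
    """
    lda = LinearDiscriminantAnalysis()
    lda.fit(doc_embeddings, labels)
    scores = lda.decision_function(vocab_embeddings)
    if scores.ndim == 1:
        # Binary case: a single column where positive favours classes_[1]
        scores = np.stack([-scores, scores], axis=1)
    return scores.T, lda.classes_


# Synthetic stand-in for real embeddings: three well-separated clusters
rng = np.random.default_rng(0)
centers = 5 * rng.normal(size=(3, 16))
labels = np.repeat([0, 1, 2], 50)
doc_embeddings = centers[labels] + rng.normal(size=(150, 16))

# Toy "vocabulary" whose embeddings sit next to the cluster centers
vocab = np.array(["a0", "a1", "b0", "b1", "c0", "c1"])
vocab_embeddings = centers[[0, 0, 1, 1, 2, 2]] + 0.1 * rng.normal(size=(6, 16))

importance, classes = linear_term_importance(doc_embeddings, labels, vocab_embeddings)
for cluster, row in zip(classes, importance):
    # Highest-scoring terms first
    print(cluster, list(vocab[np.argsort(-row)[:2]]))
```

With more than two clusters, `decision_function` returns one column per class (the log-posterior up to a constant), so each row of `importance` ranks terms for one cluster. Whether the PR scores embeddings this way or fits the classifier on bag-of-words features is not stated here.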

I am also planning to add supervised topic modelling based on these feature importance methods in the near future.


docs/clustering.md

turftopic/models/cluster.py


turftopic/models/decomp.py

x-tabdeveloping added 9 commits June 3, 2025 10:49

Added feature importance with linear classifier

0d0d22f

Added linear importance to clustering models

b5bcd1f

Added fighting words feature importance

d579036

Removed unnecessary imports

42ef21c

Added fighting words to clustering models and made linkage more numerically stable

faba545

Added semi-supervised modelling for clustering models, and made BERTopic and Top2Vec more flexible

ebe1b3a

Added proper label factorization for semi-supervised models

6b7fa4a

Added docs for feature importance and semi-supervision

410b20a

Added supervised S^3

78a5a2a

x-tabdeveloping requested a review from KennethEnevoldsen June 17, 2025 08:41

KennethEnevoldsen reviewed Jun 17, 2025


x-tabdeveloping added 2 commits June 19, 2025 11:40

Rephrased documentation and readded formulae

6f78628

Rephrased warning message

d38bcfe

x-tabdeveloping requested a review from KennethEnevoldsen June 19, 2025 09:41

Merge branch 'main' into logreg_clustering

059c656

x-tabdeveloping merged commit 57e7d02 into main Jun 23, 2025
2 checks passed

3 participants

Added feature importance methods based on cluster differences #102

Added feature importance methods based on cluster differences #102

Uh oh!

Conversation

x-tabdeveloping commented Jun 3, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants