@x-tabdeveloping
Owner

Added the following features:

Automatic detection of n_components

KeyNMF and GMM can now automatically detect the number of topics using the Bayesian Information Criterion (BIC).
The update also includes methods for optimizing this quantity efficiently instead of relying on grid search.

from turftopic import KeyNMF, GMM

model = KeyNMF("auto")
model = GMM("auto")
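For readers unfamiliar with BIC-based selection, here is a minimal sketch of the idea using scikit-learn's `GaussianMixture` rather than turftopic itself. It is a plain scan over candidate values, i.e. the grid-search baseline this update is meant to avoid; the optimizer actually used by the library is not shown here. The helper name `select_n_components_by_bic` and the synthetic data are assumptions made for illustration only.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def select_n_components_by_bic(X, candidates):
    """Fit one mixture per candidate and return (best_n, scores by candidate)."""
    scores = {}
    for n in candidates:
        gmm = GaussianMixture(n_components=n, n_init=3, random_state=0)
        gmm.fit(X)
        # Lower BIC is better: it rewards fit and penalizes extra parameters.
        scores[n] = gmm.bic(X)
    best_n = min(scores, key=scores.get)
    return best_n, scores


# Synthetic stand-in for document embeddings: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10.0, 0.0], [0.0, 10.0], [10.0, -5.0]],
    cluster_std=0.5,
    random_state=0,
)
best_n, scores = select_n_components_by_bic(X, candidates=range(1, 7))
print(best_n)
```

Each candidate requires a full model fit, which is why a smarter search over the number of components pays off on real corpora.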

[BETA] Contextualized Chunk Embeddings

You can now extract contextualized chunk embeddings from documents with sentence-transformers using encode_chunks. This can sometimes improve the performance of clustering topic models, since they get access to smaller chunks of documents. More functionality coming soon.

from sentence_transformers import SentenceTransformer
from turftopic.encoders.utils import encode_chunks

encoder = SentenceTransformer("all-MiniLM-L6-v2")
# `sentences` is your list of documents (strings)
chunk_embeddings, chunks = encode_chunks(encoder, sentences, return_chunks=True)
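This description does not say how encode_chunks builds its embeddings. One common way to get contextualized chunk embeddings is to compute token-level embeddings over the whole document and then pool them per chunk, so each chunk vector still reflects its surrounding context. The sketch below assumes that approach; `pool_chunks` and `boundaries` are hypothetical names, and a random array stands in for an encoder's token-level output.

```python
import numpy as np


def pool_chunks(token_embeddings, boundaries):
    """Mean-pool token embeddings over [start, end) spans, one vector per chunk."""
    return np.stack(
        [token_embeddings[start:end].mean(axis=0) for start, end in boundaries]
    )


# Hypothetical token-level output for one 6-token document, 4 dimensions each.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 4))

# Hypothetical chunk spans expressed as token index ranges.
boundaries = [(0, 2), (2, 5), (5, 6)]

chunk_vectors = pool_chunks(token_embeddings, boundaries)
print(chunk_vectors.shape)  # (3, 4)
```

Because pooling happens after the full document has been encoded, a chunk's vector can differ from what you would get by encoding that chunk's text in isolation.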

[BETA] Topeax

Added a new topic model, which detects clusters based on density peaks in document embedding space.
More details coming soon.

from turftopic import Topeax

model = Topeax()
model.fit(corpus)

@x-tabdeveloping x-tabdeveloping merged commit dfd7fac into main Oct 30, 2025
2 checks passed