Commit b3aa266

Author: Maarten Grootendorst
v0.9.2 (#239)
* Update default embedding model from 'paraphrase' to 'all'
* Fix probability mapping
* Optimize cTFIDF topic extraction
* Fix algorithm image, update documentation, fix spelling, etc.
* Fix #258
* Update README with visualization example
1 parent 0b32167 commit b3aa266

18 files changed, with 269 additions and 129 deletions

.github/workflows/testing.yml

Lines changed: 1 addition & 1 deletion
@@ -26,6 +26,6 @@ jobs:
 - name: Install dependencies
   run: |
     python -m pip install --upgrade pip
-    pip install -e ".[dev]"
+    pip install -e ".[test]"
 - name: Run Checking Mechanisms
   run: make check

README.md

Lines changed: 40 additions & 25 deletions
@@ -111,6 +111,20 @@ topic_model.visualize_topics()
 
 <img src="images/topic_visualization.gif" width="60%" height="60%" align="center" />
 
+We can create an overview of the most frequent topics in a way that they are easily interpretable.
+Horizontal barcharts typically convey information rather well and allow for an intuitive representation
+of the topics:
+
+```python
+topic_model.visualize_barchart()
+```
+
+<img src="images/topics.png" width="70%" height="70%" align="center" />
+
+
+Find all possible visualizations with interactive examples in the documentation
+[here](https://maartengr.github.io/BERTopic/tutorial/visualization/visualization.html).
+
 ## Embedding Models
 BERTopic supports many embedding models that can be used to embed the documents and words:
 * Sentence-Transformers
@@ -119,12 +133,12 @@ BERTopic supports many embedding models that can be used to embed the documents
 * Gensim
 * USE
 
-[**Sentence-Transformers**]() is typically used as it has shown great results embedding documents
+[**Sentence-Transformers**](https://github.com/UKPLab/sentence-transformers) is typically used as it has shown great results embedding documents
 meant for semantic similarity. Simply select any from their documentation
 [here](https://www.sbert.net/docs/pretrained_models.html) and pass it to BERTopic:
 
 ```python
-topic_model = BERTopic(embedding_model="paraphrase-MiniLM-L6-v2")
+topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")
 ```
 
 [**Flair**](https://github.com/flairNLP/flair) allows you to choose almost any 🤗 transformers model. Simply
@@ -185,34 +199,35 @@ For quick access to common functions, here is an overview of BERTopic's main met
 
 | Method | Code |
 |-----------------------|---|
-| Fit the model | `BERTopic().fit(docs)` |
-| Fit the model and predict documents | `BERTopic().fit_transform(docs)` |
-| Predict new documents | `BERTopic().transform([new_doc])` |
-| Access single topic | `BERTopic().get_topic(topic=12)` |
-| Access all topics | `BERTopic().get_topics()` |
-| Get topic freq | `BERTopic().get_topic_freq()` |
-| Get all topic information| `BERTopic().get_topic_info()` |
-| Get topics per class | `BERTopic().topics_per_class(docs, topics, classes)` |
-| Dynamic Topic Modeling | `BERTopic().topics_over_time(docs, topics, timestamps)` |
-| Update topic representation | `BERTopic().update_topics(docs, topics, n_gram_range=(1, 3))` |
-| Reduce nr of topics | `BERTopic().reduce_topics(docs, topics, nr_topics=30)` |
-| Find topics | `BERTopic().find_topics("vehicle")` |
-| Save model | `BERTopic().save("my_model")` |
+| Fit the model | `.fit(docs)` |
+| Fit the model and predict documents | `.fit_transform(docs)` |
+| Predict new documents | `.transform([new_doc])` |
+| Access single topic | `.get_topic(topic=12)` |
+| Access all topics | `.get_topics()` |
+| Get topic freq | `.get_topic_freq()` |
+| Get all topic information| `.get_topic_info()` |
+| Get representative docs per topic | `.get_representative_docs()` |
+| Get topics per class | `.topics_per_class(docs, topics, classes)` |
+| Dynamic Topic Modeling | `.topics_over_time(docs, topics, timestamps)` |
+| Update topic representation | `.update_topics(docs, topics, n_gram_range=(1, 3))` |
+| Reduce nr of topics | `.reduce_topics(docs, topics, nr_topics=30)` |
+| Find topics | `.find_topics("vehicle")` |
+| Save model | `.save("my_model")` |
 | Load model | `BERTopic.load("my_model")` |
-| Get parameters | `BERTopic().get_params()` |
+| Get parameters | `.get_params()` |
 
 For an overview of BERTopic's visualization methods:
 
 | Method | Code |
 |-----------------------|---|
-| Visualize Topics | `BERTopic().visualize_topics()` |
-| Visualize Topic Hierarchy | `BERTopic().visualize_hierarchy()` |
-| Visualize Topic Terms | `BERTopic().visualize_barchart()` |
-| Visualize Topic Similarity | `BERTopic().visualize_heatmap()` |
-| Visualize Term Score Decline | `BERTopic().visualize_term_rank()` |
-| Visualize Topic Probability Distribution | `BERTopic().visualize_distribution(probs[0])` |
-| Visualize Topics over Time | `BERTopic().visualize_topics_over_time(topics_over_time)` |
-| Visualize Topics per Class | `BERTopic().visualize_topics_per_class(topics_per_class)` |
+| Visualize Topics | `.visualize_topics()` |
+| Visualize Topic Hierarchy | `.visualize_hierarchy()` |
+| Visualize Topic Terms | `.visualize_barchart()` |
+| Visualize Topic Similarity | `.visualize_heatmap()` |
+| Visualize Term Score Decline | `.visualize_term_rank()` |
+| Visualize Topic Probability Distribution | `.visualize_distribution(probs[0])` |
+| Visualize Topics over Time | `.visualize_topics_over_time(topics_over_time)` |
+| Visualize Topics per Class | `.visualize_topics_per_class(topics_per_class)` |
 
 ## Citation
 To cite BERTopic in your work, please use the following bibtex reference:
@@ -223,7 +238,7 @@ To cite BERTopic in your work, please use the following bibtex reference:
 title = {BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.},
 year = 2020,
 publisher = {Zenodo},
-version = {v0.7.0},
+version = {v0.9.2},
 doi = {10.5281/zenodo.4381785},
 url = {https://doi.org/10.5281/zenodo.4381785}
 }
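
Reviewer note: the shortened entries in the README tables (`.fit_transform(docs)`, `.get_topic_info()`, `.visualize_barchart()`) read most naturally as calls on a single model that has already been fitted. The sketch below illustrates that reading with the new default embedding model from this commit. The helper name `summarize_topics` and its return shape are hypothetical and are not part of BERTopic. It also assumes `docs` is a corpus large enough for topics to be found.

```python
from typing import List


def summarize_topics(docs: List[str], embedding_model: str = "all-MiniLM-L6-v2"):
    """Fit BERTopic once, then call the table's methods on that fitted instance.

    Hypothetical helper for illustration; only the BERTopic calls come from the README.
    """
    # Imported lazily so that defining this helper does not itself require bertopic.
    from bertopic import BERTopic

    # Pass the sentence-transformers model name, as the README diff does.
    topic_model = BERTopic(embedding_model=embedding_model)

    # fit_transform returns the topic assigned to each document and the probabilities.
    topics, probs = topic_model.fit_transform(docs)

    # These calls correspond to the `.get_topic_info()` and `.visualize_barchart()` rows.
    topic_info = topic_model.get_topic_info()
    barchart = topic_model.visualize_barchart()
    return topic_model, topics, topic_info, barchart
```

Calling `summarize_topics(docs)` on a list of strings would return the fitted model together with its topic assignments, the topic overview table, and the barchart figure. `BERTopic.load("my_model")` keeps its class prefix in the table because it is called on the class rather than on an instance.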

bertopic/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 from bertopic._bertopic import BERTopic
 
-__version__ = "0.9.1"
+__version__ = "0.9.2"
 
 __all__ = [
     "BERTopic",
