
Commit dab14c2

Added benchmark to docs
1 parent e1f52b5 commit dab14c2

File tree

5 files changed (+3931, −41 lines)


docs/benchmark.md

Lines changed: 15 additions & 3 deletions
@@ -9,8 +9,6 @@ All models were run on an older, but still powerful Dell Precision laptop, with
 as some models ran out of memory on some of the larger datasets.
 Due to this, and the fact that the scale of the scores is different for different tasks, we present the **average percentile** scores on these metrics in the table below.
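The average-percentile aggregation described here can be sketched in a few lines. This is an illustrative toy with made-up scores and a hypothetical `percentile_rank` helper, not the benchmark's actual code or data:

```python
# Toy sketch of average-percentile aggregation (illustrative scores,
# not real benchmark results). For each task, every model's raw score
# is converted to a percentile among all models, then percentiles are
# averaged across tasks so differently-scaled metrics become comparable.

def percentile_rank(scores):
    """Percentile of each model's score among all models on one task."""
    values = list(scores.values())
    n = len(values)
    return {
        model: 100.0 * sum(v <= score for v in values) / n
        for model, score in scores.items()
    }

raw_scores = {
    "coherence": {"GMM": 0.65, "KeyNMF": 0.61, "BERTopic": 0.48},
    "diversity": {"GMM": 0.80, "KeyNMF": 0.72, "BERTopic": 0.90},
}

per_task = {task: percentile_rank(s) for task, s in raw_scores.items()}
avg_percentile = {
    model: sum(scores[model] for scores in per_task.values()) / len(per_task)
    for model in raw_scores["coherence"]
}
```

Averaging percentiles rather than raw scores is what makes metrics with different scales comparable in a single table.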
 
-For models that are able to detect the number of topics, we ran the test with this setting; this is marked as ***(Auto)*** in our tables and plots.
-For models where users can set the number of topics, we also ran the benchmark setting the correct number of topics a priori.
 
 ??? info "Click to see Benchmark code"
 ```python
@@ -257,7 +255,6 @@ For models, where users can set the number of topics, we also ran the benchmark
 print("DONE")
 ```
 
-## Model Performance
 
 <iframe
 src="https://kardosdrur-turftopic-benchmark-table.hf.space"
@@ -267,6 +264,8 @@ For models, where users can set the number of topics, we also ran the benchmark
 height="620"
 ></iframe>
 
+For models that are able to detect the number of topics, we ran the test with this setting; this is marked as ***(Auto)*** in our tables and plots.
+For models where users can set the number of topics, we also ran the benchmark setting the correct number of topics a priori.
 
 #### Topic Quality
 
@@ -277,6 +276,19 @@ Out of non-auto models, KeyNMF, GMM, ZeroShotTM, FASTopic and SensTopic did best
 
 Clear winners in cluster quality were GMM, Topeax (also GMM-based) and SensTopic. FASTopic also did reasonably well when recovering gold clusters in the data.
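The exact cluster-recovery metric is defined in the benchmark code; as a generic illustration (not necessarily the metric used here), normalized mutual information (NMI) is one standard way to score how well predicted clusters recover gold labels:

```python
from collections import Counter
from math import log

# Generic reference sketch of normalized mutual information (NMI)
# between gold labels and predicted cluster assignments. This is an
# illustration, not the benchmark's own evaluation code.

def nmi(gold, pred):
    n = len(gold)
    pg, pp = Counter(gold), Counter(pred)          # marginal counts
    joint = Counter(zip(gold, pred))               # joint counts
    # Mutual information between the two partitions
    mi = sum(
        c / n * log(n * c / (pg[g] * pp[p]))
        for (g, p), c in joint.items()
    )
    # Entropies of each partition, for normalization
    hg = -sum(c / n * log(c / n) for c in pg.values())
    hp = -sum(c / n * log(c / n) for c in pp.values())
    return mi / ((hg * hp) ** 0.5) if hg and hp else 1.0
```

NMI is 1 when the two partitions match up to relabeling and 0 when they are independent, which makes it insensitive to how cluster IDs are numbered.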
 
+<figure>
+<iframe
+src="../images/radar_chart.html"
+frameborder="0"
+style="padding: 0; margin: 0;"
+width="1000px"
+height="520px"
+></iframe>
+<figcaption>Performance profile of all models on different metrics.
+The top 5 models on average performance are highlighted; click on the legend to show the others.
+</figcaption>
+</figure>
+
 ## Computational Efficiency
 
 <figure style="text-align: center; float: right;">

docs/images/radar_chart.html

Lines changed: 3888 additions & 0 deletions
Large diffs are not rendered by default.

docs/model_overview.md

Lines changed: 27 additions & 37 deletions
@@ -4,50 +4,42 @@ Turftopic contains implementations of a number of contemporary topic models.
 Some of these models might be similar to each other in a lot of aspects, but they might be different in others.
 It is quite important that you choose the right topic model for your use case.
 
-|**Speed** | 📖 **Long Documents** | 🐘 **Scalability** | 🔩 **Flexibility** |
-|-------------|-----------------------|--------------------|---------------------|
-| [SensTopic](SensTopic.md); [SemanticSignalSeparation](s3.md) | [KeyNMF](KeyNMF.md) | [KeyNMF](KeyNMF.md) | [ClusteringTopicModel](clustering.md) |
+!!! tip "Looking for Model Performance?"
 
-_Table 1: You should tailor your model choice to your needs_
+    If you are interested in seeing how these models perform on a range of datasets, and would like to base your model choice on evaluations,
+    make sure to check out the [Model Leaderboard](benchmark.md) tab:
 
+    <br>
+    <center>
+    <img src="../images/leaderboard_screenshot.png" width="700">
+    </center>
 
-<figure style="width: 50%; text-align: center; float: right;">
-<img src="../images/docs_per_second.png">
-<figcaption> Figure 1: Speed of Different Models on 20 Newsgroups <br> (Documents per Second; Higher is better) </figcaption>
-</figure>
+<div style="text-align: center" markdown>
 
-Different models will naturally be good at different things, because they conceptualize topics differently, for instance:
-
-- `SemanticSignalSeparation` ($S^3$) conceptualizes topics as **semantic axes**, along which topics are distributed
-- `ClusteringTopicModel` finds **clusters** of documents and treats those as topics
-- `KeyNMF` conceptualizes topics as **factors**, or, looked at from a different angle, it finds **clusters of words**
-
-You can find a detailed overview of how each of these models works in their respective tabs.
+| Model | Summary | Strengths | Weaknesses |
+| - | - | - | - |
+| [Topeax](Topeax.md) | Density peak detection + Gaussian mixture approximation | Cluster quality, Topic quality, Stability, Automatic n-topics | Underestimates n-topics, Slower, No inference for new documents |
+| [KeyNMF](KeyNMF.md) | Keyword importance estimation + matrix factorization | Reliability, Topic quality, Scalability to large corpora and long documents | Automatic topic number detection, Multilingual performance, Sometimes includes stop words |
+| [SensTopic (BETA)](SensTopic.md) | Regularized semi-nonnegative matrix factorization in embedding space | Very fast, High-quality topics and clusters, Can assign multiple soft clusters to documents, GPU support | Automatic n-topics is not very good |
+| [GMM](GMM.md) | Soft clustering with Gaussian mixtures and soft-cTF-IDF | Reliability, Speed, Cluster quality | Manual n-topics, Lower-quality keywords, [Curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) |
+| [FASTopic](FASTopic.md) | Neural topic modelling with Dual Semantic-relation Reconstruction | High-quality topics and clusters, GPU support | Very slow, Memory-hungry, Manual n-topics |
+| [$S^3$](s3.md) | Semantic axis discovery in embedding space | Fastest, Human-readable topics | Axes can be very unintuitive, Manual n-topics |
+| [BERTopic and Top2Vec](clustering.md) | Embed -> Reduce -> Cluster | Flexible, Feature-rich | Slow, Unreliable and unstable, Wildly overestimates number of clusters, Low topic and cluster quality |
+| [AutoEncodingTopicModel](ctm.md) | Discovers topics by generating BoW with a variational autoencoder | GPU support | Slow, Sometimes low-quality topics |
 
-Some models are also capable of being used in a dynamic context, some can be fitted online, some can detect the number of topics for you and some can detect topic hierarchies. You can find an overview of these features in Table 2 below.
+</div>
 
-<figure style="width: 40%; text-align: center; float: left; margin-right: 8px">
-<img src="../images/performance_20ng.png">
-<figcaption> Figure 2: Models' Coherence and Diversity on 20 Newsgroups <br> (Higher is better) </figcaption>
-</figure>
-
-!!! warning
-    You should take the results presented here with a grain of salt. A more comprehensive and in-depth analysis can be found in [Kardos et al., 2024](https://arxiv.org/abs/2406.09556), though the general tendencies are similar.
-    Note that some topic models are also less stable than others, and they might require tweaking for optimal results (like BERTopic), while others perform well out-of-the-box but are not as flexible ($S^3$).
-
-The quality of the topics you can get out of your topic model can depend on a lot of things, including your choice of [vectorizer](vectorizers.md) and [encoder model](encoders.md).
-More rigorous evaluation regimes can be found in a number of studies on topic modeling.
+Different models will naturally be good at different things, because they conceptualize topics differently. For instance:
 
-Two usual metrics to evaluate models by are *coherence* and *diversity*.
-These metrics indicate how easy it is to interpret the topics provided by the topic model.
-Good models typically balance these two metrics, and should produce highly coherent and diverse topics.
-On Figure 2 you can see how good different models are on these metrics on 20 Newsgroups.
+- `BERTopic`, `Top2Vec`, `GMM` and `Topeax` find **clusters** of documents and treat those as topics.
+- `KeyNMF`, `SensTopic`, `FASTopic` and `AutoEncodingTopicModel` conceptualize topics as latent nonnegative **factors** that generate the documents.
+- `SemanticSignalSeparation` ($S^3$) conceptualizes topics as **semantic axes**, along which topics are distributed.
 
-In general, the most balanced models are $S^3$, Clustering models with `centroid` feature importance, GMM and KeyNMF, while FASTopic excels at diversity.
+You can find a detailed overview of how each of these models works in their respective tabs.
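The "latent nonnegative factors" view can be made concrete with a toy factorization: a document-term matrix `X` is approximated as `W @ H`, where each row of `H` groups co-occurring words into a topic. The sketch below is plain NMF with multiplicative updates (Lee & Seung) on synthetic counts, not Turftopic's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# 6 documents x 4 terms with two blocks of co-occurring words:
# terms 0-1 dominate the first three documents, terms 2-3 the rest.
X = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [5, 5, 0, 1],
    [0, 1, 4, 5],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

n_topics = 2
W = rng.random((X.shape[0], n_topics))  # document-topic loadings
H = rng.random((n_topics, X.shape[1]))  # topic-term weights

# Multiplicative update rules, which decrease ||X - W @ H||_F
for _ in range(300):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

# Each topic's strongest terms land in a different word block
top_terms = H.argsort(axis=1)[:, ::-1]
```

Rows of `W` give each document's topic loadings, which is how "factors that generate the documents" cashes out in practice.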
 
-<br>
+## Model Features
 
+Some models are also capable of being used in a dynamic context, some can be fitted online, some can detect the number of topics for you, and some can detect topic hierarchies. You can find an overview of these features in the table below.
 
 
 | Model | :1234: Multiple Topics per Document | :hash: Detecting Number of Topics | :chart_with_upwards_trend: Dynamic Modeling | :evergreen_tree: Hierarchical Modeling | :star: Inference over New Documents | :globe_with_meridians: Cross-Lingual | :ocean: Online Fitting |
@@ -61,9 +53,7 @@ In general, the most balanced models are $S^3$, Clustering models with `centroid
 | **[AutoEncodingTopicModel](ctm.md)** | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: |
 | **[FASTopic](fastopic.md)** | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: |
 
-_Table 2: Comparison of the models based on their capabilities_
 
-## API Reference
+## Model API Reference
 
 :::turftopic.base.ContextualModel

mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ nav:
 - Discourse Analysis on Morality and Religion: tutorials/religious.md
 - Discovering a Data-driven Political Compass: tutorials/ideologies.md
 - Customer Dissatisfaction Analysis: tutorials/reviews.md
-- Topic Models:
+- Topic Models (Overview and Performance):
 - Model Overview: model_overview.md
 - Model Leaderboard: benchmark.md
 - Semantic Signal Separation (S³): s3.md
