docs/benchmark.md: 15 additions & 3 deletions
@@ -9,8 +9,6 @@ All models were run on an older, but still powerful Dell Precision laptop, with
 as some models ran out of memory on some of the larger datasets.
 Due to this, and the fact that the scale of the scores is different for different tasks, we present the **average percentile** scores on these metrics in the table below.
 
-For models that are able to detect the number of topics, we ran the test with this setting; this is marked as ***(Auto)*** in our tables and plots.
-For models where users can set the number of topics, we also ran the benchmark setting the correct number of topics a priori.
 
 ??? info "Click to see Benchmark code"
     ```python
@@ -257,7 +255,6 @@ For models, where users can set the number of topics, we also ran the benchmark
@@ -267,6 +264,8 @@ For models, where users can set the number of topics, we also ran the benchmark
         height="620"
     ></iframe>
 
+For models that are able to detect the number of topics, we ran the test with this setting; this is marked as ***(Auto)*** in our tables and plots.
+For models where users can set the number of topics, we also ran the benchmark setting the correct number of topics a priori.
 
 #### Topic Quality
 
@@ -277,6 +276,19 @@ Out of non-auto models, KeyNMF, GMM, ZeroShotTM, FASTopic and SensTopic did best
 
 Clear winners in cluster quality were GMM, Topeax (also GMM-based) and SensTopic. FASTopic also did reasonably well when recovering gold clusters in the data.
 
+
+<figure>
+    <iframe
+        src="../images/radar_chart.html"
+        frameborder="0"
+        style="padding: 0; margin: 0;"
+        width="1000px"
+        height="520px"
+    ></iframe>
+    <figcaption>Performance profile of all models on different metrics.
+    Top 5 models on average performance are highlighted; click on the legend to show the others.</figcaption>
+</figure>
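The **average percentile** aggregation described at the top of this diff (raw scores rescaled to percentile ranks per task, then averaged per model) can be sketched in a few lines. This is an illustrative sketch with made-up numbers, not the benchmark's actual code:

```python
# Illustrative sketch of average-percentile scoring: scores on different tasks
# live on different scales, so each model's raw score is first converted to a
# percentile rank within its own task, and those ranks are averaged per model.
# The model names and scores below are made up for the example.

def percentile_rank(value, scores):
    """Percentage of scores in the task that `value` is >= to."""
    return 100.0 * sum(s <= value for s in scores) / len(scores)

def average_percentiles(results):
    """results: {task: {model: score}} -> {model: mean percentile across tasks}."""
    models = {m for task in results.values() for m in task}
    averages = {}
    for model in models:
        ranks = [
            percentile_rank(task[model], list(task.values()))
            for task in results.values()
            if model in task
        ]
        averages[model] = sum(ranks) / len(ranks)
    return averages

results = {
    "coherence": {"KeyNMF": 0.8, "GMM": 0.6, "FASTopic": 0.7},
    "diversity": {"KeyNMF": 0.5, "GMM": 0.9, "FASTopic": 0.7},
}
print(average_percentiles(results))  # all three tie at ~66.7 on this toy data
```

Because percentile ranks are bounded and scale-free, a model that dominates one metric cannot swamp the average the way a raw-score mean would allow.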
-<figcaption> Figure 1: Speed of Different Models on 20 Newsgroups <br> (Documents per Second; Higher is better) </figcaption>
-</figure>
+<div style="text-align: center" markdown>
 
-Different models will naturally be good at different things, because they conceptualize topics differently. For instance:
-
-- `SemanticSignalSeparation` ($S^3$) conceptualizes topics as **semantic axes**, along which topics are distributed
-- `ClusteringTopicModel` finds **clusters** of documents and treats those as topics
-- `KeyNMF` conceptualizes topics as **factors**, or, looked at from a different angle, it finds **clusters of words**
-
-You can find a detailed overview of how each of these models works in its respective tab.
+| Model | Summary | Strengths | Weaknesses |
+| - | - | - | - |
+| [Topeax](Topeax.md) | Density peak detection + Gaussian mixture approximation | Cluster quality, Topic quality, Stability, Automatic n-topics | Underestimates N topics, Slower, No inference for new documents |
+| [KeyNMF](KeyNMF.md) | Keyword importance estimation + matrix factorization | Reliability, Topic quality, Scalability to large corpora and long documents | Automatic topic number detection, Multilingual performance, Sometimes includes stop words |
+| [SensTopic (BETA)](SensTopic.md) | Regularized semi-nonnegative matrix factorization in embedding space | Very fast, High-quality topics and clusters, Can assign multiple soft clusters to documents, GPU support | Automatic n-topics is not very good |
+| [GMM](GMM.md) | Soft clustering with Gaussian mixtures and soft-cTF-IDF | Reliability, Speed, Cluster quality | Manual n-topics, Lower-quality keywords, [Curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) |
+| [FASTopic](FASTopic.md) | Neural topic modelling with Dual Semantic-relation Reconstruction | High-quality topics and clusters, GPU support | Very slow, Memory-hungry, Manual n-topics |
+| [$S^3$](s3.md) | Semantic axis discovery in embedding space | Fastest, Human-readable topics | Axes can be very unintuitive, Manual n-topics |
+| [BERTopic and Top2Vec](clustering.md) | Embed -> Reduce -> Cluster | Flexible, Feature-rich | Slow, Unreliable and unstable, Wildly overestimates number of clusters, Low topic and cluster quality |
+| [AutoEncodingTopicModel](ctm.md) | Discovers topics by generating BoW with a variational autoencoder | GPU support | Slow, Sometimes low-quality topics |
 
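Several of the models in the table above (KeyNMF, SensTopic) rest on matrix factorization. As a toy illustration of that idea only (not any of these models' actual implementations), a document-term matrix can be factored into nonnegative "topics" with the classic multiplicative-update NMF:

```python
import numpy as np

# Toy nonnegative matrix factorization (NMF): approximate a document-term
# matrix X as W @ H, where rows of H act as "topics" (word weightings) and
# rows of W are per-document topic weights. Illustrative sketch only.
rng = np.random.default_rng(0)

def nmf(X, n_topics, n_iter=200, eps=1e-9):
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_words)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Frobenius-norm objective;
        # they keep W and H nonnegative at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Two obvious "topics": words 0-1 co-occur, words 2-3 co-occur.
X = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])
W, H = nmf(X, n_topics=2)
print(np.round(W @ H, 1))  # close to X
```

With only two components, each row of `H` concentrates its weight on one of the two co-occurring word pairs, which is the sense in which factor-based models "discover" topics.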
-Some models are also capable of being used in a dynamic context, some can be fitted online, some can detect the number of topics for you and some can detect topic hierarchies. You can find an overview of these features in Table 2 below.
-<figcaption> Figure 2: Models' Coherence and Diversity on 20 Newsgroups <br> (Higher is better) </figcaption>
-</figure>
-
-!!! warning
-    You should take the results presented here with a grain of salt. A more comprehensive and in-depth analysis can be found in [Kardos et al., 2024](https://arxiv.org/abs/2406.09556), though the general tendencies are similar.
-    Note that some topic models are also less stable than others, and might require tweaking for optimal results (like BERTopic), while others perform well out of the box but are not as flexible ($S^3$).
-
-The quality of the topics you can get out of your topic model can depend on a lot of things, including your choice of [vectorizer](vectorizers.md) and [encoder model](encoders.md).
-More rigorous evaluation regimes can be found in a number of studies on topic modeling.
+Different models will naturally be good at different things, because they conceptualize topics differently. For instance:
 
-Two usual metrics to evaluate models by are *coherence* and *diversity*.
-These metrics indicate how easy it is to interpret the topics provided by the topic model.
-Good models typically balance these two metrics, and should produce highly coherent and diverse topics.
-In Figure 2 you can see how good different models are on these metrics on 20 Newsgroups.
+- `BERTopic`, `Top2Vec`, `GMM` and `Topeax` find **clusters** of documents and treat those as topics.
+- `KeyNMF`, `SensTopic`, `FASTopic` and `AutoEncodingTopicModel` conceptualize topics as latent nonnegative **factors** that generate the documents.
+- `SemanticSignalSeparation` ($S^3$) conceptualizes topics as **semantic axes**, along which topics are distributed.
 
-In general, the most balanced models are $S^3$, Clustering models with `centroid` feature importance, GMM and KeyNMF, while FASTopic excels at diversity.
+You can find a detailed overview of how each of these models works in its respective tab.
 
-<br>
+## Model Features
 
+Some models are also capable of being used in a dynamic context, some can be fitted online, some can detect the number of topics for you and some can detect topic hierarchies. You can find an overview of these features in the table below.
 
 | Model | :1234: Multiple Topics per Document | :hash: Detecting Number of Topics | :chart_with_upwards_trend: Dynamic Modeling | :evergreen_tree: Hierarchical Modeling | :star: Inference over New Documents | :globe_with_meridians: Cross-Lingual | :ocean: Online Fitting |
@@ -61,9 +53,7 @@ In general, the most balanced models are $S^3$, Clustering models with `centroid
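The *diversity* metric mentioned in the removed lines is commonly defined as the proportion of unique words among the top words of all topics; the text above does not pin down which variant the benchmark used, so here is a minimal sketch under that assumed definition:

```python
# Topic diversity under one common definition: the fraction of unique words
# among the top-k words of all topics. 1.0 means no topic shares a word with
# another; values near 0 mean heavily overlapping, redundant topics.
# The example topics are made up.

def topic_diversity(topics, top_k=10):
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

topics = [
    ["space", "nasa", "orbit", "launch"],
    ["game", "team", "season", "score"],
    ["space", "game", "windows", "driver"],  # shares words with the first two
]
print(topic_diversity(topics, top_k=4))  # 10 unique words out of 12 -> ~0.83
```

A model can trivially maximize diversity with incoherent word salads, which is why the text pairs it with coherence: good models must balance the two.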