docs/benchmark.md: 15 additions & 3 deletions
@@ -9,8 +9,6 @@ All models were run on an older, but still powerful Dell Precision laptop, with
 as some models ran out of memory on some of the larger datasets.
 Due to this, and the fact that the scale of the scores is different for different tasks, we present the **average percentile** scores on these metrics in the table below.
 
-For models that are able to detect the number of topics, we ran the test with this setting; this is marked as ***(Auto)*** in our tables and plots.
-For models where users can set the number of topics, we also ran the benchmark setting the correct number of topics a priori.
 
 ??? info "Click to see Benchmark code"
     ```python
@@ -257,7 +255,6 @@ For models, where users can set the number of topics, we also ran the benchmark
@@ -267,6 +264,8 @@ For models, where users can set the number of topics, we also ran the benchmark
         height="620"
     ></iframe>
 
+For models that are able to detect the number of topics, we ran the test with this setting; this is marked as ***(Auto)*** in our tables and plots.
+For models where users can set the number of topics, we also ran the benchmark setting the correct number of topics a priori.
 
 #### Topic Quality
 
@@ -277,6 +276,19 @@ Out of non-auto models, KeyNMF, GMM, ZeroShotTM, FASTopic and SensTopic did best
 
 Clear winners in cluster quality were GMM, Topeax (also GMM-based) and SensTopic. FASTopic also did reasonably well when recovering gold clusters in the data.
 
+
+<figure>
+    <iframe
+        src="../images/radar_chart.html"
+        frameborder="0"
+        style="padding: 0; margin: 0;"
+        width="1000px"
+        height="520px"
+    ></iframe>
+    <figcaption>Performance profile of all models on different metrics.
+    Top 5 models on average performance are highlighted; click on the legend to show the others.</figcaption>
+</figure>
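The **average percentile** aggregation described at the top of this diff (raw scores rescaled to percentile ranks per task, then averaged per model) can be sketched in a few lines. This is an illustrative sketch with made-up numbers, not the benchmark's actual code:

```python
# Illustrative sketch of average-percentile scoring: scores on different tasks
# live on different scales, so each model's raw score is first converted to a
# percentile rank within its own task, and those ranks are averaged per model.
# The model names and scores below are made up for the example.

def percentile_rank(value, scores):
    """Percentage of scores in the task that `value` is >= to."""
    return 100.0 * sum(s <= value for s in scores) / len(scores)

def average_percentiles(results):
    """results: {task: {model: score}} -> {model: mean percentile across tasks}."""
    models = {m for task in results.values() for m in task}
    averages = {}
    for model in models:
        ranks = [
            percentile_rank(task[model], list(task.values()))
            for task in results.values()
            if model in task
        ]
        averages[model] = sum(ranks) / len(ranks)
    return averages

results = {
    "coherence": {"KeyNMF": 0.8, "GMM": 0.6, "FASTopic": 0.7},
    "diversity": {"KeyNMF": 0.5, "GMM": 0.9, "FASTopic": 0.7},
}
print(average_percentiles(results))  # all three tie at ~66.7 on this toy data
```

Because percentile ranks are bounded and scale-free, a model that dominates one metric cannot swamp the average the way a raw-score mean would allow.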
-<figcaption> Figure 1: Speed of Different Models on 20 Newsgroups <br> (Documents per Second; Higher is better) </figcaption>
-</figure>
+<div style="text-align: center" markdown>
 
-Different models will naturally be good at different things, because they conceptualize topics differently. For instance:
-
-- `SemanticSignalSeparation` ($S^3$) conceptualizes topics as **semantic axes**, along which topics are distributed
-- `ClusteringTopicModel` finds **clusters** of documents and treats those as topics
-- `KeyNMF` conceptualizes topics as **factors**, or, looked at from a different angle, it finds **clusters of words**
-
-You can find a detailed overview of how each of these models works in its respective tab.
+| Model | Summary | Strengths | Weaknesses |
+| - | - | - | - |
+| [Topeax](Topeax.md) | Density peak detection + Gaussian mixture approximation | Cluster quality, Topic quality, Stability, Automatic n-topics | Underestimates N topics, Slower, No inference for new documents |
+| [KeyNMF](KeyNMF.md) | Keyword importance estimation + matrix factorization | Reliability, Topic quality, Scalability to large corpora and long documents | Automatic topic number detection, Multilingual performance, Sometimes includes stop words |
+| [SensTopic (BETA)](SensTopic.md) | Regularized semi-nonnegative matrix factorization in embedding space | Very fast, High-quality topics and clusters, Can assign multiple soft clusters to documents, GPU support | Automatic n-topics is not very good |
+| [GMM](GMM.md) | Soft clustering with Gaussian mixtures and soft-cTF-IDF | Reliability, Speed, Cluster quality | Manual n-topics, Lower-quality keywords, [Curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) |
+| [FASTopic](FASTopic.md) | Neural topic modelling with Dual Semantic-relation Reconstruction | High-quality topics and clusters, GPU support | Very slow, Memory-hungry, Manual n-topics |
+| [$S^3$](s3.md) | Semantic axis discovery in embedding space | Fastest, Human-readable topics | Axes can be very unintuitive, Manual n-topics |
+| [BERTopic and Top2Vec](clustering.md) | Embed -> Reduce -> Cluster | Flexible, Feature-rich | Slow, Unreliable and unstable, Wildly overestimates number of clusters, Low topic and cluster quality |
+| [AutoEncodingTopicModel](ctm.md) | Discovers topics by generating BoW with a variational autoencoder | GPU support | Slow, Sometimes low-quality topics |
 
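Several of the models in the table above (KeyNMF, SensTopic) rest on matrix factorization. As a toy illustration of that idea only (not any of these models' actual implementations), a document-term matrix can be factored into nonnegative "topics" with the classic multiplicative-update NMF:

```python
import numpy as np

# Toy nonnegative matrix factorization (NMF): approximate a document-term
# matrix X as W @ H, where rows of H act as "topics" (word weightings) and
# rows of W are per-document topic weights. Illustrative sketch only.
rng = np.random.default_rng(0)

def nmf(X, n_topics, n_iter=200, eps=1e-9):
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_words)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Frobenius-norm objective;
        # they keep W and H nonnegative at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Two obvious "topics": words 0-1 co-occur, words 2-3 co-occur.
X = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])
W, H = nmf(X, n_topics=2)
print(np.round(W @ H, 1))  # close to X
```

With only two components, each row of `H` concentrates its weight on one of the two co-occurring word pairs, which is the sense in which factor-based models "discover" topics.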
-Some models are also capable of being used in a dynamic context, some can be fitted online, some can detect the number of topics for you and some can detect topic hierarchies. You can find an overview of these features in Table 2 below.
-<figcaption> Figure 2: Models' Coherence and Diversity on 20 Newsgroups <br> (Higher is better) </figcaption>
-</figure>
-
-!!! warning
-    You should take the results presented here with a grain of salt. A more comprehensive and in-depth analysis can be found in [Kardos et al., 2024](https://arxiv.org/abs/2406.09556), though the general tendencies are similar.
-    Note that some topic models are also less stable than others, and might require tweaking for optimal results (like BERTopic), while others perform well out of the box but are not as flexible ($S^3$).
-
-The quality of the topics you can get out of your topic model can depend on a lot of things, including your choice of [vectorizer](vectorizers.md) and [encoder model](encoders.md).
-More rigorous evaluation regimes can be found in a number of studies on topic modeling.
+Different models will naturally be good at different things, because they conceptualize topics differently. For instance:
 
-Two usual metrics to evaluate models by are *coherence* and *diversity*.
-These metrics indicate how easy it is to interpret the topics provided by the topic model.
-Good models typically balance these two metrics, and should produce highly coherent and diverse topics.
-In Figure 2 you can see how good different models are on these metrics on 20 Newsgroups.
+- `BERTopic`, `Top2Vec`, `GMM` and `Topeax` find **clusters** of documents and treat those as topics.
+- `KeyNMF`, `SensTopic`, `FASTopic` and `AutoEncodingTopicModel` conceptualize topics as latent nonnegative **factors** that generate the documents.
+- `SemanticSignalSeparation` ($S^3$) conceptualizes topics as **semantic axes**, along which topics are distributed.
 
-In general, the most balanced models are $S^3$, Clustering models with `centroid` feature importance, GMM and KeyNMF, while FASTopic excels at diversity.
+You can find a detailed overview of how each of these models works in its respective tab.
 
-<br>
+## Model Features
 
+Some models are also capable of being used in a dynamic context, some can be fitted online, some can detect the number of topics for you and some can detect topic hierarchies. You can find an overview of these features in the table below.
 
 | Model | :1234: Multiple Topics per Document | :hash: Detecting Number of Topics | :chart_with_upwards_trend: Dynamic Modeling | :evergreen_tree: Hierarchical Modeling | :star: Inference over New Documents | :globe_with_meridians: Cross-Lingual | :ocean: Online Fitting |
@@ -61,9 +53,7 @@ In general, the most balanced models are $S^3$, Clustering models with `centroid
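The *diversity* metric mentioned in the removed lines is commonly defined as the proportion of unique words among the top words of all topics; the text above does not pin down which variant the benchmark used, so here is a minimal sketch under that assumed definition:

```python
# Topic diversity under one common definition: the fraction of unique words
# among the top-k words of all topics. 1.0 means no topic shares a word with
# another; values near 0 mean heavily overlapping, redundant topics.
# The example topics are made up.

def topic_diversity(topics, top_k=10):
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

topics = [
    ["space", "nasa", "orbit", "launch"],
    ["game", "team", "season", "score"],
    ["space", "game", "windows", "driver"],  # shares words with the first two
]
print(topic_diversity(topics, top_k=4))  # 10 unique words out of 12 -> ~0.83
```

A model can trivially maximize diversity with incoherent word salads, which is why the text pairs it with coherence: good models must balance the two.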