
Commit e7ef621

Merge pull request #112 from x-tabdeveloping/fix_rendering
Fix documentation rendering
2 parents: f58b219 + 7e3311a

File tree

6 files changed: +47 −48 lines changed


docs/clustering.md

Lines changed: 33 additions & 26 deletions

@@ -31,7 +31,6 @@ but users are free to specify the model that will be used for dimensionality red
 model = ClusteringTopicModel(dimensionality_reduction=TSNE(n_components=2, metric="cosine"))
 ```
 TSNE is a classic method for producing non-linear lower-dimensional representations of high-dimensional embeddings.
-TSNE has an inherent clustering property, which helps clustering models find groups of data.
 While it is widely used, it has many well-known issues, such as poor representation of global relations and artificial clusters.
 
 !!! tip "Use openTSNE for better performance!"
@@ -130,39 +129,47 @@ By and large there are two types of methods that can be used for importance esti
 
 !!! quote "Choose a term importance estimation method"
 
-    === "c-TF-IDF (BERTopic)"
+    === "soft-c-TF-IDF (Default)"
 
         ```python
         from turftopic import ClusteringTopicModel
 
         model = ClusteringTopicModel(feature_importance="soft-c-tf-idf")
-        # or
+        ```
+
+        #### Formula:
+
+        - Let $X$ be the document-term matrix, where each element $X_{ij}$ is the number of times word $j$ occurs in document $i$.
+        - Estimate the weight of term $j$ for topic $z$: <br>
+        $tf_{zj} = \frac{t_{zj}}{w_z}$, where
+        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of the word in the topic and
+        $w_{z} = \sum_{j} t_{zj}$ is the total number of words in the topic. <br>
+        - Estimate the inverse document/topic frequency of term $j$:
+        $idf_j = \log(\frac{N}{\sum_z |t_{zj}|})$, where
+        $N$ is the total number of documents.
+        - Calculate the importance of term $j$ for topic $z$:
+        $\text{Soft-c-TF-IDF}_{zj} = tf_{zj} \cdot idf_j$
+
+    === "c-TF-IDF (BERTopic)"
+
+        ```python
+        from turftopic import ClusteringTopicModel
+
         model = ClusteringTopicModel(feature_importance="c-tf-idf")
         ```
 
-    ??? info "Click to see formulas"
-        #### Soft-c-TF-IDF
-        - Let $X$ be the document term matrix where each element ($X_{ij}$) corresponds with the number of times word $j$ occurs in a document $i$.
-        - Estimate weight of term $j$ for topic $z$: <br>
-        $tf_{zj} = \frac{t_{zj}}{w_z}$, where
-        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of a word in a topic and
-        $w_{z}= \sum_{j} t_{zj}$ is all words in the topic <br>
-        - Estimate inverse document/topic frequency for term $j$:
-        $idf_j = log(\frac{N}{\sum_z |t_{zj}|})$, where
-        $N$ is the total number of documents.
-        - Calculate importance of term $j$ for topic $z$:
-        $Soft-c-TF-IDF{zj} = tf_{zj} \cdot idf_j$
-
-        #### c-TF-IDF
-        - Let $X$ be the document term matrix where each element ($X_{ij}$) corresponds with the number of times word $j$ occurs in a document $i$.
-        - $tf_{zj} = \frac{t_{zj}}{w_z}$, where
-        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of a word in a topic and
-        $w_{z}= \sum_{j} t_{zj}$ is all words in the topic <br>
-        - Estimate inverse document/topic frequency for term $j$:
-        $idf_j = log(1 + \frac{A}{\sum_z |t_{zj}|})$, where
-        $A = \frac{\sum_z \sum_j t_{zj}}{Z}$ is the average number of words per topic, and $Z$ is the number of topics.
-        - Calculate importance of term $j$ for topic $z$:
-        $c-TF-IDF{zj} = tf_{zj} \cdot idf_j$
+        #### Formula:
+
+        - Let $X$ be the document-term matrix, where each element $X_{ij}$ is the number of times word $j$ occurs in document $i$.
+        - $tf_{zj} = \frac{t_{zj}}{w_z}$, where
+        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of the word in the topic and
+        $w_{z} = \sum_{j} t_{zj}$ is the total number of words in the topic. <br>
+        - Estimate the inverse document/topic frequency of term $j$:
+        $idf_j = \log(1 + \frac{A}{\sum_z |t_{zj}|})$, where
+        $A = \frac{\sum_z \sum_j t_{zj}}{Z}$ is the average number of words per topic, and $Z$ is the number of topics.
+        - Calculate the importance of term $j$ for topic $z$:
+        $\text{c-TF-IDF}_{zj} = tf_{zj} \cdot idf_j$
+
 
     === "Centroid Proximity (Top2Vec)"
 
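Not part of the commit, but as a sanity check on the two weighting schemes above: a minimal NumPy sketch that computes both soft-c-TF-IDF and c-TF-IDF from a raw document-term count matrix and per-document cluster labels. The function name `topic_term_importance` is illustrative, not part of Turftopic's API, and the sketch assumes every term occurs at least once.

```python
import numpy as np

def topic_term_importance(X, labels, kind="soft-c-tf-idf"):
    """X: (n_documents, n_terms) count matrix; labels: topic id per document."""
    topics = np.unique(labels)
    # t[z, j] = t_zj: total occurrences of term j inside topic z
    t = np.stack([X[labels == z].sum(axis=0) for z in topics])
    tf = t / t.sum(axis=1, keepdims=True)  # tf_zj = t_zj / w_z
    df = np.abs(t).sum(axis=0)             # sum_z |t_zj|
    if kind == "soft-c-tf-idf":
        idf = np.log(X.shape[0] / df)      # idf_j = log(N / df_j), N documents
    elif kind == "c-tf-idf":
        A = t.sum() / len(topics)          # average number of words per topic
        idf = np.log(1 + A / df)           # idf_j = log(1 + A / df_j)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return tf * idf                        # importance of term j for topic z

# Toy corpus: 3 documents, 2 terms, 2 clusters
X = np.array([[2, 0], [0, 1], [1, 1]])
labels = np.array([0, 0, 1])
print(topic_term_importance(X, labels))
```

A term that appears in every document gets `idf = log(N / N) = 0` under soft-c-TF-IDF, so it is zeroed out of every topic; the c-TF-IDF variant's `log(1 + ...)` keeps such terms with a small positive weight instead.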
docs/index.md

Lines changed: 0 additions & 5 deletions

@@ -3,15 +3,12 @@
 Turftopic is a topic modeling library which intends to simplify and streamline the usage of contextually sensitive topic models.
 We provide stable, minimal and scalable implementations of several types of models along with extensive documentation.
 
-<center>
-
 | | | |
 | - | - | - |
 | :house: [Build and Train Topic Models](model_definition_and_training.md) | :art: [Explore, Interpret and Visualize your Models](model_interpretation.md) | :wrench: [Modify and Fine-tune Topic Models](finetuning.md) |
 | :pushpin: [Choose the Right Model for your Use-Case](model_overview.md) | :chart_with_upwards_trend: [Explore Topics Changing over Time](dynamic.md) | :newspaper: [Use Phrases or Lemmas for Topic Models](vectorizers.md) |
 | :ocean: [Extract Topics from a Stream of Documents](online.md) | :evergreen_tree: [Find Hierarchical Order in Topics](hierarchical.md) | :whale: [Name Topics with Large Language Models](namers.md) |
 
-</center>
 
 ## Basic Usage
 
@@ -39,15 +36,13 @@ model = KeyNMF(20).fit(corpus)
 model.print_topics()
 ```
 
-<center>
 
 | Topic ID | Top 10 Words |
 | -------- | ----------------------------------------------------------------------------------------------- |
 | 0 | armenians, armenian, armenia, turks, turkish, genocide, azerbaijan, soviet, turkey, azerbaijani |
 | 1 | sale, price, shipping, offer, sell, prices, interested, 00, games, selling |
 | | .... |
 
-</center>
 
 
 
docs/model_overview.md

Lines changed: 3 additions & 8 deletions

@@ -4,16 +4,13 @@ Turftopic contains implementations of a number of contemporary topic models.
 Some of these models might be similar to each other in a lot of aspects, but they might be different in others.
 It is quite important that you choose the right topic model for your use case.
 
-<center>
 
-| :zap: Speed | :book: Long Documents | :elephant: Scalability | :nut_and_bolt: Flexibility |
-| - | - | - | - |
-| **[SemanticSignalSeparation](s3.md)** | **[KeyNMF](KeyNMF.md)** | **[KeyNMF](KeyNMF.md)** | **[ClusteringTopicModel](clustering.md)** |
+| **Speed** | 📖 **Long Documents** | 🐘 **Scalability** | 🔩 **Flexibility** |
+|-------------|-----------------------|--------------------|---------------------|
+| [SemanticSignalSeparation](s3.md) | [KeyNMF](KeyNMF.md) | [KeyNMF](KeyNMF.md) | [ClusteringTopicModel](clustering.md) |
 
 _Table 1: You should tailor your model choice to your needs_
 
-</center>
-
 
 <figure style="width: 50%; text-align: center; float: right;">
 <img src="../images/docs_per_second.png">
@@ -52,7 +49,6 @@ In general, the most balanced models are $S^3$, Clustering models with `centroid
 
 <br>
 
-<center>
 
 
 | Model | :1234: Multiple Topics per Document | :hash: Detecting Number of Topics | :chart_with_upwards_trend: Dynamic Modeling | :evergreen_tree: Hierarchical Modeling | :star: Inference over New Documents | :globe_with_meridians: Cross-Lingual | :ocean: Online Fitting |
@@ -66,7 +62,6 @@ In general, the most balanced models are $S^3$, Clustering models with `centroid
 
 _Table 2: Comparison of the models based on their capabilities_
 
-</center>
 
 ## API Reference
 
docs/tutorials/images/s3.png

16.7 KB

mkdocs.yml

Lines changed: 8 additions & 6 deletions

@@ -2,7 +2,7 @@ site_name: Turftopic
 site_description: 'An all-in-one library for topic modeling with sentence embeddings.'
 repo_url: https://github.com/x-tabdeveloping/turftopic
 nav:
-  - Usage:
+  - User Guide:
       - Getting Started: index.md
       - Defining and Fitting Topic Models: model_definition_and_training.md
       - Interpreting and Visualizing Models: model_interpretation.md
@@ -21,20 +21,22 @@ nav:
       - Discourse Analysis on Morality and Religion: tutorials/religious.md
      - Discovering a Data-driven Political Compass: tutorials/ideologies.md
       - Customer Dissatisfaction Analysis: tutorials/reviews.md
-  - Models:
+  - Topic Models:
       - Model Overview: model_overview.md
       - Semantic Signal Separation (S³): s3.md
       - KeyNMF: KeyNMF.md
       - GMM: GMM.md
       - Clustering Models (BERTopic & Top2Vec): clustering.md
       - Autoencoding Models (ZeroShotTM & CombinedTM): ctm.md
       - FASTopic: FASTopic.md
-  - Encoders: encoders.md
-  - Vectorizers: vectorizers.md
-  - Namers: namers.md
+  - Embedding Models: encoders.md
+  - Vectorizers (Term extraction): vectorizers.md
+  - Topic Namers: namers.md
 theme:
   name: material
   logo: images/logo.svg
+  font:
+    text: Ubuntu
   navigation_depth: 3
   palette:
     primary: '#01034A'
@@ -68,8 +70,8 @@ markdown_extensions:
   - pymdownx.superfences
   - attr_list
   - md_in_html
-  - admonition
   - tables
+  - admonition
   - pymdownx.details
   - pymdownx.superfences
   - pymdownx.tabbed:

pyproject.toml

Lines changed: 3 additions & 3 deletions

@@ -9,7 +9,7 @@ profile = "black"
 
 [project]
 name = "turftopic"
-version = "0.17.4"
+version = "0.17.5"
 description = "Topic modeling with contextual representations from sentence transformers."
 authors = [
     { name = "Márton Kardos <[email protected]>", email = "[email protected]" }
@@ -42,9 +42,9 @@ topic-wizard = ["topic-wizard>1.0.0,<2.0.0"]
 umap-learn = ["umap-learn>=0.5.5,<1.0.0"]
 docs = [
     "griffe==0.40.0",
-    "mkdocs==1.5.3",
+    "mkdocs==1.6.1",
     "mkdocs-autorefs==0.5.0",
-    "mkdocs-material==9.5.6",
+    "mkdocs-material==9.6.19",
     "mkdocs-material-extensions==1.3.1",
     "mkdocstrings==0.22.0",
     "mkdocstrings-python==1.8.0",
