
Commit e7ef621

Merge pull request #112 from x-tabdeveloping/fix_rendering
Fix documentation rendering
2 parents: f58b219 + 7e3311a

File tree

6 files changed: +47 −48 lines changed


docs/clustering.md

Lines changed: 33 additions & 26 deletions

@@ -31,7 +31,6 @@ but users are free to specify the model that will be used for dimensionality red
 model = ClusteringTopicModel(dimensionality_reduction=TSNE(n_components=2, metric="cosine"))
 ```
 TSNE is a classic method for producing non-linear lower-dimensional representations of high-dimensional embeddings.
-TSNE has an inherent clustering property, which helps clustering models find groups of data.
 While it is widely used, it has many well-known issues, such as poor representation of global relations and artificial clusters.
 
 !!! tip "Use openTSNE for better performance!"
@@ -130,39 +129,47 @@ By and large there are two types of methods that can be used for importance esti
 
 !!! quote "Choose a term importance estimation method"
 
-    === "c-TF-IDF (BERTopic)"
+    === "soft-c-TF-IDF (Default)"
 
         ```python
         from turftopic import ClusteringTopicModel
 
         model = ClusteringTopicModel(feature_importance="soft-c-tf-idf")
-        # or
+        ```
+
+        #### Formula:
+
+        - Let $X$ be the document-term matrix, where each element $X_{ij}$ is the number of times word $j$ occurs in document $i$.
+        - Estimate the weight of term $j$ for topic $z$: <br>
+        $tf_{zj} = \frac{t_{zj}}{w_z}$, where
+        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of the word in the topic and
+        $w_{z} = \sum_{j} t_{zj}$ is the total number of words in the topic. <br>
+        - Estimate the inverse document/topic frequency of term $j$:
+        $idf_j = \log(\frac{N}{\sum_z |t_{zj}|})$, where
+        $N$ is the total number of documents.
+        - Calculate the importance of term $j$ for topic $z$:
+        $\text{Soft-c-TF-IDF}_{zj} = tf_{zj} \cdot idf_j$
+
+    === "c-TF-IDF (BERTopic)"
+
+        ```python
+        from turftopic import ClusteringTopicModel
+
         model = ClusteringTopicModel(feature_importance="c-tf-idf")
         ```
 
-    ??? info "Click to see formulas"
-        #### Soft-c-TF-IDF
-        - Let $X$ be the document term matrix where each element ($X_{ij}$) corresponds with the number of times word $j$ occurs in a document $i$.
-        - Estimate weight of term $j$ for topic $z$: <br>
-        $tf_{zj} = \frac{t_{zj}}{w_z}$, where
-        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of a word in a topic and
-        $w_{z}= \sum_{j} t_{zj}$ is all words in the topic <br>
-        - Estimate inverse document/topic frequency for term $j$:
-        $idf_j = log(\frac{N}{\sum_z |t_{zj}|})$, where
-        $N$ is the total number of documents.
-        - Calculate importance of term $j$ for topic $z$:
-        $Soft-c-TF-IDF{zj} = tf_{zj} \cdot idf_j$
-
-        #### c-TF-IDF
-        - Let $X$ be the document term matrix where each element ($X_{ij}$) corresponds with the number of times word $j$ occurs in a document $i$.
-        - $tf_{zj} = \frac{t_{zj}}{w_z}$, where
-        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of a word in a topic and
-        $w_{z}= \sum_{j} t_{zj}$ is all words in the topic <br>
-        - Estimate inverse document/topic frequency for term $j$:
-        $idf_j = log(1 + \frac{A}{\sum_z |t_{zj}|})$, where
-        $A = \frac{\sum_z \sum_j t_{zj}}{Z}$ is the average number of words per topic, and $Z$ is the number of topics.
-        - Calculate importance of term $j$ for topic $z$:
-        $c-TF-IDF{zj} = tf_{zj} \cdot idf_j$
+        #### Formula:
+
+        - Let $X$ be the document-term matrix, where each element $X_{ij}$ is the number of times word $j$ occurs in document $i$.
+        - $tf_{zj} = \frac{t_{zj}}{w_z}$, where
+        $t_{zj} = \sum_{i \in z} X_{ij}$ is the number of occurrences of the word in the topic and
+        $w_{z} = \sum_{j} t_{zj}$ is the total number of words in the topic. <br>
+        - Estimate the inverse document/topic frequency of term $j$:
+        $idf_j = \log(1 + \frac{A}{\sum_z |t_{zj}|})$, where
+        $A = \frac{\sum_z \sum_j t_{zj}}{Z}$ is the average number of words per topic, and $Z$ is the number of topics.
+        - Calculate the importance of term $j$ for topic $z$:
+        $\text{c-TF-IDF}_{zj} = tf_{zj} \cdot idf_j$
+
 
     === "Centroid Proximity (Top2Vec)"
 
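Not part of the commit, but as a sanity check on the two weighting schemes above: a minimal NumPy sketch that computes both soft-c-TF-IDF and c-TF-IDF from a raw document-term count matrix and per-document cluster labels. The function name `topic_term_importance` is illustrative, not part of Turftopic's API, and the sketch assumes every term occurs at least once.

```python
import numpy as np

def topic_term_importance(X, labels, kind="soft-c-tf-idf"):
    """X: (n_documents, n_terms) count matrix; labels: topic id per document."""
    topics = np.unique(labels)
    # t[z, j] = t_zj: total occurrences of term j inside topic z
    t = np.stack([X[labels == z].sum(axis=0) for z in topics])
    tf = t / t.sum(axis=1, keepdims=True)  # tf_zj = t_zj / w_z
    df = np.abs(t).sum(axis=0)             # sum_z |t_zj|
    if kind == "soft-c-tf-idf":
        idf = np.log(X.shape[0] / df)      # idf_j = log(N / df_j), N documents
    elif kind == "c-tf-idf":
        A = t.sum() / len(topics)          # average number of words per topic
        idf = np.log(1 + A / df)           # idf_j = log(1 + A / df_j)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return tf * idf                        # importance of term j for topic z

# Toy corpus: 3 documents, 2 terms, 2 clusters
X = np.array([[2, 0], [0, 1], [1, 1]])
labels = np.array([0, 0, 1])
print(topic_term_importance(X, labels))
```

A term that appears in every document gets `idf = log(N / N) = 0` under soft-c-TF-IDF, so it is zeroed out of every topic; the c-TF-IDF variant's `log(1 + ...)` keeps such terms with a small positive weight instead.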
docs/index.md

Lines changed: 0 additions & 5 deletions

@@ -3,15 +3,12 @@
 Turftopic is a topic modeling library which intends to simplify and streamline the usage of contextually sensitive topic models.
 We provide stable, minimal and scalable implementations of several types of models along with extensive documentation.
 
-<center>
-
 | | | |
 | - | - | - |
 | :house: [Build and Train Topic Models](model_definition_and_training.md) | :art: [Explore, Interpret and Visualize your Models](model_interpretation.md) | :wrench: [Modify and Fine-tune Topic Models](finetuning.md) |
 | :pushpin: [Choose the Right Model for your Use-Case](model_overview.md) | :chart_with_upwards_trend: [Explore Topics Changing over Time](dynamic.md) | :newspaper: [Use Phrases or Lemmas for Topic Models](vectorizers.md) |
 | :ocean: [Extract Topics from a Stream of Documents](online.md) | :evergreen_tree: [Find Hierarchical Order in Topics](hierarchical.md) | :whale: [Name Topics with Large Language Models](namers.md) |
 
-</center>
 
 ## Basic Usage
 
@@ -39,15 +36,13 @@ model = KeyNMF(20).fit(corpus)
 model.print_topics()
 ```
 
-<center>
 
 | Topic ID | Top 10 Words |
 | -------- | ----------------------------------------------------------------------------------------------- |
 | 0 | armenians, armenian, armenia, turks, turkish, genocide, azerbaijan, soviet, turkey, azerbaijani |
 | 1 | sale, price, shipping, offer, sell, prices, interested, 00, games, selling |
 | | .... |
 
-</center>
 
 
 
docs/model_overview.md

Lines changed: 3 additions & 8 deletions

@@ -4,16 +4,13 @@ Turftopic contains implementations of a number of contemporary topic models.
 Some of these models might be similar to each other in a lot of aspects, but they might be different in others.
 It is quite important that you choose the right topic model for your use case.
 
-<center>
 
-| :zap: Speed | :book: Long Documents | :elephant: Scalability | :nut_and_bolt: Flexibility |
-| - | - | - | - |
-| **[SemanticSignalSeparation](s3.md)** | **[KeyNMF](KeyNMF.md)** | **[KeyNMF](KeyNMF.md)** | **[ClusteringTopicModel](clustering.md)** |
+| **Speed** | 📖 **Long Documents** | 🐘 **Scalability** | 🔩 **Flexibility** |
+|-------------|-----------------------|--------------------|---------------------|
+| [SemanticSignalSeparation](s3.md) | [KeyNMF](KeyNMF.md) | [KeyNMF](KeyNMF.md) | [ClusteringTopicModel](clustering.md) |
 
 _Table 1: You should tailor your model choice to your needs_
 
-</center>
-
 
 <figure style="width: 50%; text-align: center; float: right;">
 <img src="../images/docs_per_second.png">
@@ -52,7 +49,6 @@ In general, the most balanced models are $S^3$, Clustering models with `centroid
 
 <br>
 
-<center>
 
 
 | Model | :1234: Multiple Topics per Document | :hash: Detecting Number of Topics | :chart_with_upwards_trend: Dynamic Modeling | :evergreen_tree: Hierarchical Modeling | :star: Inference over New Documents | :globe_with_meridians: Cross-Lingual | :ocean: Online Fitting |
@@ -66,7 +62,6 @@ In general, the most balanced models are $S^3$, Clustering models with `centroid
 
 _Table 2: Comparison of the models based on their capabilities_
 
-</center>
 
 ## API Reference
 
docs/tutorials/images/s3.png

16.7 KB

mkdocs.yml

Lines changed: 8 additions & 6 deletions

@@ -2,7 +2,7 @@ site_name: Turftopic
 site_description: 'An all-in-one library for topic modeling with sentence embeddings.'
 repo_url: https://github.com/x-tabdeveloping/turftopic
 nav:
-  - Usage:
+  - User Guide:
       - Getting Started: index.md
       - Defining and Fitting Topic Models: model_definition_and_training.md
       - Interpreting and Visualizing Models: model_interpretation.md
@@ -21,20 +21,22 @@ nav:
       - Discourse Analysis on Morality and Religion: tutorials/religious.md
      - Discovering a Data-driven Political Compass: tutorials/ideologies.md
       - Customer Dissatisfaction Analysis: tutorials/reviews.md
-  - Models:
+  - Topic Models:
       - Model Overview: model_overview.md
       - Semantic Signal Separation (S³): s3.md
       - KeyNMF: KeyNMF.md
       - GMM: GMM.md
       - Clustering Models (BERTopic & Top2Vec): clustering.md
       - Autoencoding Models (ZeroShotTM & CombinedTM): ctm.md
       - FASTopic: FASTopic.md
-  - Encoders: encoders.md
-  - Vectorizers: vectorizers.md
-  - Namers: namers.md
+  - Embedding Models: encoders.md
+  - Vectorizers (Term extraction): vectorizers.md
+  - Topic Namers: namers.md
 theme:
   name: material
   logo: images/logo.svg
+  font:
+    text: Ubuntu
   navigation_depth: 3
   palette:
     primary: '#01034A'
@@ -68,8 +70,8 @@ markdown_extensions:
   - pymdownx.superfences
   - attr_list
   - md_in_html
-  - admonition
   - tables
+  - admonition
   - pymdownx.details
   - pymdownx.superfences
   - pymdownx.tabbed:

pyproject.toml

Lines changed: 3 additions & 3 deletions

@@ -9,7 +9,7 @@ profile = "black"
 
 [project]
 name = "turftopic"
-version = "0.17.4"
+version = "0.17.5"
 description = "Topic modeling with contextual representations from sentence transformers."
 authors = [
     { name = "Márton Kardos <[email protected]>", email = "[email protected]" }
@@ -42,9 +42,9 @@ topic-wizard = ["topic-wizard>1.0.0,<2.0.0"]
 umap-learn = ["umap-learn>=0.5.5,<1.0.0"]
 docs = [
     "griffe==0.40.0",
-    "mkdocs==1.5.3",
+    "mkdocs==1.6.1",
     "mkdocs-autorefs==0.5.0",
-    "mkdocs-material==9.5.6",
+    "mkdocs-material==9.6.19",
     "mkdocs-material-extensions==1.3.1",
     "mkdocstrings==0.22.0",
     "mkdocstrings-python==1.8.0",
