md-docs/user_guide/modules/topic_modeling.md
Lines changed: 37 additions & 8 deletions
Original file line number
Diff line number
Diff line change
@@ -1,16 +1,44 @@
1
1
# Topic Modeling
2
2
3
-
The Topic Modeling module allow you to categorize documents based on their content. The goal is to represent each document as a set of topics, where a topic is composed by a list of words that commonly appear together. The percentage of topics in a document varies, suggesting the themes it covers and in what proportion.
3
+
The Topic Modeling module allows you to categorize documents based on their content. The goal is to represent each document as a set of topics, where a topic is made up of a list of words that commonly appear together. The percentage of topics in a document varies, suggesting the concepts it covers and in what proportion.
4
4
5
-
!!! example
6
-
Consider the case of a sport magazine. Words like "team," "game," and "score" would come up a lot, while words like "market" or "technology" would show up less frequently. This would suggest that the magazine's topics are centered around sports.
5
+
For example, an e-commerce company could use Topic Modeling to analyze customer reviews of its products. The Topic Modeling module could identify topics such as “price,” “quality,” “shipping,” and “customer service.” The company could then use this information to improve its products and services in the areas where customers have expressed concerns or dissatisfaction.
7
6
8
-
## Key Concepts
7
+
Topic Modeling in ML cube Platform is based on unsupervised machine learning algorithms that analyze a corpus of documents and identify the latent topics.
| Topic | A theme represented by a set of words that commonly appear together. |
13
-
| Document Distribution | Each document shows a spread of topics, indicating the themes it covers and in what proportion. |
10
+
| Term | Description |
11
+
|---|---|
12
+
| Topic | A subject represented by a set of words that commonly appear together. |
13
+
| Document Distribution | Each document shows a spread of topics, indicating the concepts it covers and in what proportion. |
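
As an illustration of the Document Distribution concept, the following minimal sketch turns per-token topic assignments into topic proportions for a single document. It assumes each token has already been assigned a topic label; the function name `topic_proportions` and the sample data are hypothetical and are not part of the ML cube Platform SDK, whose internal algorithm is not described here.

```python
from collections import Counter


def topic_proportions(token_topics):
    """Return the share of each topic among the tokens of one document.

    `token_topics` is a list with one (hypothetical) topic label per token.
    """
    counts = Counter(token_topics)
    total = sum(counts.values())
    if total == 0:
        return {}  # An empty document has no topic distribution.
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical topic labels for the five tokens of one customer review.
review_token_topics = ["shipping", "shipping", "price", "shipping", "quality"]
proportions = topic_proportions(review_token_topics)

for topic, share in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{topic}: {share:.0%}")
```

The proportions always sum to 1, which is what lets a document be read as a mix of topics rather than as belonging to a single one.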
14
+
15
+
## Topic Modeling Report
16
+
The Topic Modeling report provides a comprehensive overview of the topics identified in the corpus of documents. The report includes the following sections:
17
+
18
+
* **Topic Summary:** This section provides a list of the identified topics, along with their coherence and perplexity. Coherence is a measure of how related the words in a topic are to each other. Perplexity is a measure of how well the model is able to predict the documents in the corpus.
19
+
* **Topic Visualization:** This section includes various types of visualizations that help to understand the identified topics. The available visualizations include:
20
+
    * **Bar Charts:** Show the distribution of topics in the corpus of documents.
21
+
    * **Heatmaps:** Show the relationship between topics and words.
22
+
    * **Word Clouds:** Show the most frequent words in each topic.
23
+
* **Document Analysis:** This section allows you to examine the topic distribution in individual documents.
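
To make the two Topic Summary metrics more concrete, here is a minimal sketch under stated assumptions. Perplexity is computed with its standard definition (the exponential of the negative mean log-probability the model assigns to the observed tokens). The coherence function is a deliberately simplified co-occurrence score chosen for illustration; the exact formulas used by ML cube Platform are not stated in this guide, and all names and sample data below are hypothetical.

```python
import math
from itertools import combinations


def perplexity(token_probs):
    """Exponential of the negative mean log-probability of the observed tokens.

    Lower values mean the model predicts the documents better.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


def pairwise_cooccurrence(top_words, documents):
    """Simplified coherence: average Jaccard co-occurrence of topic word pairs.

    For each pair of top words, divide the number of documents containing
    both words by the number containing at least one of them.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    scores = []
    for a, b in combinations(top_words, 2):
        both = sum(1 for d in doc_sets if a in d and b in d)
        either = sum(1 for d in doc_sets if a in d or b in d)
        scores.append(both / either if either else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# A model that spreads probability evenly over four options is more
# "perplexed" than one that assigns high probability to the observed tokens.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
confident = perplexity([0.9, 0.8, 0.9, 0.7])

# Hypothetical customer reviews and a hypothetical two-word topic.
reviews = [
    "fast shipping and fair price",
    "shipping was slow",
    "price is fair",
    "great quality",
]
score = pairwise_cooccurrence(["shipping", "price"], reviews)

print(f"uniform perplexity:   {uniform:.2f}")
print(f"confident perplexity: {confident:.2f}")
print(f"co-occurrence score:  {score:.2f}")
```

In this sketch a higher co-occurrence score suggests that the words of a topic tend to appear in the same documents, while a lower perplexity suggests a better fit to the corpus.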
24
+
??? code-block "SDK Example"
25
+
The following code shows how to create a topic modeling report.
26
+
When triggered, it first sends a notification to the `ml3-platform-notifications` channel on your Slack workspace, using the
27
+
provided webhook URL, and then starts the retraining of the model.
from_timestamp=prod_data_df["timestamp"].min(), # The initial timestamp from which to start the analysis
39
+
to_timestamp=prod_data_df["timestamp"].max(), # The final timestamp to end the analysis
40
+
)
41
+
```
14
42
15
43
## Supported Tasks and Data Structures
16
44
ML cube Platform supports the following tasks and data structures for Topic Modeling:
@@ -21,6 +49,7 @@ ML cube Platform supports the following tasks and data structures for Topic Mode
21
49
| Classification ||| :material-check: ||
22
50
| RAG ||| :material-check: :material-information-outline:{title="Only for User Input"} ||
23
51
52
+
Topic Modeling is only supported for text data structures because it is based on the analysis of words in documents. Topic Modeling for RAG tasks is only supported for user input because the retrieved context is not always available.