To make this process easier, you can use [Docker images with BigARTM preinstalled](https://hub.docker.com/r/xtonev/bigartm/tags).
If for some reason Docker images do not suit you, a detailed description of the BigARTM installation can be found here: [BigARTM installation manual](https://bigartm.readthedocs.io/en/stable/installation/index.html).
-Inside the resulting BigARTM image, fork this repository or install it with the command: ```pip install topicnet```.
+Inside the resulting BigARTM image, download this repository or install it with the command: ```pip install topicnet```.
README.md: 48 additions & 97 deletions
@@ -3,20 +3,27 @@

---
### What is TopicNet?
-The ```topicnet``` library was created to assist in building topic models. It aims to automate the model training routine, freeing more time for the artistic process of constructing a target functional for the task at hand.
-### How does it work?
-The work starts with defining a ```TopicModel``` from an ARTM model at hand or with help from the ```model_constructor``` module. This model is then assigned a root position in the ```Experiment``` that will provide infrastructure for the model building process. Further, the user can define a set of training stages using the functionality provided by the ```cooking_machine.cubes``` modules and observe the results of their actions via the ```viewers``` module.
-### Who will use this repo?
-This repo is intended for people who want to explore BigARTM functionality without writing substantial overhead code for model training pipelines and information retrieval. It may also help experienced users with rapid solution prototyping.
+TopicNet is a high-level interface running on top of BigARTM.
+
+The ```TopicNet``` library was created to assist in building topic models. It aims to automate the model training routine, freeing more time for the artistic process of constructing a target functional for the task at hand.
+
+Consider using TopicNet if:
+
+* you want to explore BigARTM functionality without writing overhead code.
+* you need help with rapid solution prototyping.
+* you want to build a good topic model quickly (out of the box, with default parameters).
+* you have an ARTM model at hand and you want to explore its topics.
+
+```TopicNet``` provides an infrastructure for your prototyping (the ```Experiment``` class) and helps you observe the results of your actions via the ```viewers``` module.
+
+### How to start?
+Define a `TopicModel` from an ARTM model at hand or with help from the `model_constructor` module. Then create an `Experiment`, assigning a root position to this model. Further, you can define a set of training stages using the functionality provided by the `cooking_machine.cubes` module.

---
## How to install TopicNet
**Core library functionality is based on the BigARTM library**, which requires manual installation.
To avoid that, you can use [docker images](https://hub.docker.com/r/xtonev/bigartm/tags) with the BigARTM library preinstalled.

-Alternatively, you can follow [BigARTM installation manual](https://bigartm.readthedocs.io/en/stable/installation/index.html)
-After setting up the environment you can fork this repository or use ```pip install topicnet``` to install the library.
-
#### Using docker image
```
docker pull xtonev/bigartm:v0.10.0
@@ -29,85 +36,16 @@ import artm
artm.version()
```

+Alternatively, you can follow [BigARTM installation manual](https://bigartm.readthedocs.io/en/stable/installation/index.html).
+After setting up the environment you can fork this repository or use ```pip install topicnet``` to install the library.
+
---
## How to use TopicNet
Let's say you have a handful of raw texts mined from some source and you want to perform some topic modelling on them. Where should you start?
### Data Preparation
Every ML problem starts with a data preprocessing step. TopicNet does not perform data preprocessing itself. Instead, it requires the data to be prepared by the user and loaded via the [Dataset (no link yet)]() class.
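
As an illustration of what "prepared by the user" could mean in practice, here is a minimal sketch that uses only the Python standard library. It reads plain-text files and keeps only ASCII letters and spaces. The helper names `clean_text` and `load_entries` and the record keys `id` and `text` are hypothetical choices made for this sketch; it does not show the `Dataset` class itself, whose expected input format is not described here.

```python
import string
from pathlib import Path
from typing import Dict, List

# Characters to keep: ASCII letters and spaces. This is an assumption that
# suits English-only corpora; adjust it for other languages.
ALLOWED = set(string.ascii_letters + " ")


def clean_text(raw: str) -> str:
    """Drop every character that is not an ASCII letter or a space."""
    return "".join(ch for ch in raw if ch in ALLOWED)


def load_entries(directory: str) -> List[Dict[str, str]]:
    """Read every *.txt file in `directory` into {'id', 'text'} records."""
    entries = []
    for path in sorted(Path(directory).glob("*.txt")):
        entries.append({
            "id": path.stem,  # file name without its final extension
            "text": clean_text(path.read_text(encoding="utf-8")),
        })
    return entries
```

Files are visited in sorted order so that repeated runs produce the records in the same sequence; further steps such as tokenization or lemmatization would follow the same per-document pattern.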
-Here is a basic example of how one can achieve that:
-```
-import nltk
-import artm
-import string
-
-import pandas as pd
-from glob import glob
-
-WIKI_DATA_PATH = '/Wiki_raw_set/raw_plaintexts/'
-files = glob(WIKI_DATA_PATH+'*.txt')
-```
-Loading all texts from files and leaving only alphabetical characters and spaces:
-```
-right_symbols = string.ascii_letters + ' '
-data = []
-for path in files:
-    entry = {}
-    entry['id'] = path.split('/')[-1].split('.')[0]
-    with open(path,'r') as f:
-        text = ''.join([char for char in f.read() if char in right_symbols])