1 | | -# Semantic-Unit-for-Multi-label-Text-Classification |
2 | | -Code for the article "Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification" (EMNLP 2018). |
3 | | - |
4 | | -*********************************************************** |
5 | | - |
6 | | -## Requirements |
7 | | -* Ubuntu 16.0.4 |
8 | | -* Python 3.5 |
9 | | -* Pytorch 0.4.1 (updated) |
10 | | - |
11 | | -************************************************************** |
12 | | - |
13 | | -## Data |
14 | | -Our preprocessed RCV1-V2 dataset can be retrieved through [this link](https://drive.google.com/open?id=1oQ5_gPoRwAl7UGWTDNu4qATNtJ1l1kXd). |
15 | | - |
16 | | -*************************************************************** |
17 | | - |
18 | | -## Preprocessing |
19 | | -``` |
20 | | -python3 preprocess.py -load_data path_to_data -save_data path_to_store_data |
21 | | -``` |
22 | | -Remember to put the data into a folder and name them *train.src*, *train.tgt*, *valid.src*, *valid.tgt*, *test.src* and *test.tgt*, and make a new folder inside called *data* |
23 | | - |
24 | | -*************************************************************** |
25 | | - |
26 | | -## Training |
27 | | -``` |
28 | | -python3 train.py -log log_name -config config_yaml -gpus id |
29 | | -``` |
30 | | -Create your own yaml file for hyperparameter setting. |
31 | | - |
32 | | -**************************************************************** |
33 | | - |
34 | | -## Evaluation |
35 | | -``` |
36 | | -python3 train.py -log log_name -config config_yaml -gpus id -restore checkpoint -mode eval |
37 | | -``` |
38 | | - |
39 | | -******************************************************************* |
40 | | - |
41 | | -# Citation |
42 | | -If you use this code for your research, please kindly cite our paper: |
43 | | -``` |
44 | | -@inproceedings{DBLP:conf/emnlp/LinSYM018, |
45 | | - author = {Junyang Lin and |
46 | | - Qi Su and |
47 | | - Pengcheng Yang and |
48 | | - Shuming Ma and |
49 | | - Xu Sun}, |
50 | | - title = {Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification}, |
51 | | - booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural |
52 | | - Language Processing, Brussels, Belgium, October 31 - November 4, 2018}, |
53 | | - pages = {4554--4564}, |
54 | | - year = {2018} |
55 | | -} |
56 | | -``` |
57 | | - |
| 1 | +# Semantic-Unit-for-Multi-label-Text-Classification |
| 2 | +Code for the article "Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification" (EMNLP 2018). |
| 3 | + |
| 4 | +*********************************************************** |
| 5 | + |
| 6 | +## Requirements |
| 7 | +* Ubuntu 16.04 |
| 8 | +* Python 3.5 |
| 9 | +* PyTorch 0.4.1 (updated) |
| 10 | + |
| 11 | +************************************************************** |
| 12 | + |
| 13 | +## Data |
| 14 | +Our preprocessed RCV1-V2 dataset can be retrieved through [this link](https://drive.google.com/open?id=1oQ5_gPoRwAl7UGWTDNu4qATNtJ1l1kXd). (A JSON file containing the label set used for evaluation is included for convenience.) |
| 15 | + |
| 16 | +*************************************************************** |
| 17 | + |
| 18 | +## Preprocessing |
| 19 | +``` |
| 20 | +python3 preprocess.py -load_data path_to_data -save_data path_to_store_data (-src_filter 500) |
| 21 | +``` |
| 22 | +Put the data (plain-text files) into a folder, name the files *train.src*, *train.tgt*, *valid.src*, *valid.tgt*, *test.src* and *test.tgt*, and create a new folder called *data* inside it. |
| 23 | + |
| 24 | +*************************************************************** |
| 25 | + |
| 26 | +## Training |
| 27 | +``` |
| 28 | +python3 train.py -log log_name -config config_yaml -gpus id (-label_dict_file path_to_label_set) |
| 29 | +``` |
| 30 | +Create your own YAML file to set the hyperparameters. |
| 31 | + |
| 32 | +**************************************************************** |
| 33 | + |
| 34 | +## Evaluation |
| 35 | +``` |
| 36 | +python3 train.py -log log_name -config config_yaml -gpus id -restore checkpoint -mode eval |
| 37 | +``` |
| 38 | + |
| 39 | +******************************************************************* |
| 40 | + |
| 41 | +# Citation |
| 42 | +If you use this code for your research, please kindly cite our paper: |
| 43 | +``` |
| 44 | +@inproceedings{DBLP:conf/emnlp/LinSYM018, |
| 45 | + author = {Junyang Lin and |
| 46 | + Qi Su and |
| 47 | + Pengcheng Yang and |
| 48 | + Shuming Ma and |
| 49 | + Xu Sun}, |
| 50 | + title = {Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification}, |
| 51 | + booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural |
| 52 | + Language Processing, Brussels, Belgium, October 31 - November 4, 2018}, |
| 53 | + pages = {4554--4564}, |
| 54 | + year = {2018} |
| 55 | +} |
| 56 | +``` |
| 57 | + |
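
The Preprocessing section of the README asks for six split files in one folder plus an inner *data* folder. A minimal sketch of a pre-flight check for that layout follows. The helper names `missing_data_files` and `prepare_data_dir` are hypothetical and not part of this repository, and placing *data* directly inside the split folder is an assumption based on the README wording.

```python
from pathlib import Path

# File names listed in the README's Preprocessing section.
REQUIRED_FILES = [
    "train.src", "train.tgt",
    "valid.src", "valid.tgt",
    "test.src", "test.tgt",
]


def missing_data_files(data_dir):
    """Return the required file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def prepare_data_dir(data_dir):
    """Check the six split files, then create the inner `data` folder.

    Hypothetical helper: raises FileNotFoundError naming every missing
    file; otherwise returns the path of the inner `data` folder.
    """
    missing = missing_data_files(data_dir)
    if missing:
        raise FileNotFoundError(
            "missing in %s: %s" % (data_dir, ", ".join(missing))
        )
    inner = Path(data_dir) / "data"
    inner.mkdir(exist_ok=True)  # no error if the folder already exists
    return inner
```

Calling `prepare_data_dir("path_to_data")` before `preprocess.py` would report a misnamed or missing split file up front instead of partway through preprocessing.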