
Commit 477bcfe

update
1 parent 3f5364b


20 files changed, +2200 -2442 lines


README.md

Lines changed: 57 additions & 57 deletions
@@ -1,57 +1,57 @@
-# Semantic-Unit-for-Multi-label-Text-Classification
-Code for the article "Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification" (EMNLP 2018).
-
-***********************************************************
-
-## Requirements
-* Ubuntu 16.0.4
-* Python 3.5
-* Pytorch 0.4.1 (updated)
-
-**************************************************************
-
-## Data
-Our preprocessed RCV1-V2 dataset can be retrieved through [this link](https://drive.google.com/open?id=1oQ5_gPoRwAl7UGWTDNu4qATNtJ1l1kXd).
-
-***************************************************************
-
-## Preprocessing
-```
-python3 preprocess.py -load_data path_to_data -save_data path_to_store_data
-```
-Remember to put the data into a folder and name them *train.src*, *train.tgt*, *valid.src*, *valid.tgt*, *test.src* and *test.tgt*, and make a new folder inside called *data*
-
-***************************************************************
-
-## Training
-```
-python3 train.py -log log_name -config config_yaml -gpus id
-```
-Create your own yaml file for hyperparameter setting.
-
-****************************************************************
-
-## Evaluation
-```
-python3 train.py -log log_name -config config_yaml -gpus id -restore checkpoint -mode eval
-```
-
-*******************************************************************
-
-# Citation
-If you use this code for your research, please kindly cite our paper:
-```
-@inproceedings{DBLP:conf/emnlp/LinSYM018,
-author = {Junyang Lin and
-Qi Su and
-Pengcheng Yang and
-Shuming Ma and
-Xu Sun},
-title = {Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification},
-booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural
-Language Processing, Brussels, Belgium, October 31 - November 4, 2018},
-pages = {4554--4564},
-year = {2018}
-}
-```
-
+# Semantic-Unit-for-Multi-label-Text-Classification
+Code for the article "Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification" (EMNLP 2018).
+
+***********************************************************
+
+## Requirements
+* Ubuntu 16.0.4
+* Python 3.5
+* Pytorch 0.4.1 (updated)
+
+**************************************************************
+
+## Data
+Our preprocessed RCV1-V2 dataset can be retrieved through [this link](https://drive.google.com/open?id=1oQ5_gPoRwAl7UGWTDNu4qATNtJ1l1kXd). (The json file of label set for evaluation is added for convenience.)
+
+***************************************************************
+
+## Preprocessing
+```
+python3 preprocess.py -load_data path_to_data -save_data path_to_store_data (-src_filter 500)
+```
+Remember to put the data (plain text file) into a folder and name them *train.src*, *train.tgt*, *valid.src*, *valid.tgt*, *test.src* and *test.tgt*, and make a new folder inside called *data*.
+
+***************************************************************
+
+## Training
+```
+python3 train.py -log log_name -config config_yaml -gpus id (-label_dict_file path to your label set)
+```
+Create your own yaml file for hyperparameter setting.
+
+****************************************************************
+
+## Evaluation
+```
+python3 train.py -log log_name -config config_yaml -gpus id -restore checkpoint -mode eval
+```
+
+*******************************************************************
+
+# Citation
+If you use this code for your research, please kindly cite our paper:
+```
+@inproceedings{DBLP:conf/emnlp/LinSYM018,
+author = {Junyang Lin and
+Qi Su and
+Pengcheng Yang and
+Shuming Ma and
+Xu Sun},
+title = {Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification},
+booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural
+Language Processing, Brussels, Belgium, October 31 - November 4, 2018},
+pages = {4554--4564},
+year = {2018}
+}
+```
+
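The README's preprocessing step expects a specific input layout: six plain-text split files in one folder, plus a sub-folder named *data*. As a hedged illustration (not part of this commit or the repository), the Python sketch below checks that layout before `preprocess.py` is run. The names `check_data_folder` and `count_lines` are hypothetical, and the rule that each `.src` file has as many lines as its `.tgt` file is an assumption (one document per line, aligned with one label line), not something the README states.

```python
from pathlib import Path
from typing import List

# Split-file names as listed in the README (written without f-strings so the
# sketch also runs on the README's Python 3.5).
SPLITS = ("train", "valid", "test")
SPLIT_FILES = tuple("{}.{}".format(s, ext) for s in SPLITS for ext in ("src", "tgt"))


def count_lines(path: Path) -> int:
    """Count the lines in a plain-text file."""
    with path.open(encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_data_folder(folder: str) -> List[str]:
    """Hypothetical pre-flight check; an empty list means the layout looks right."""
    root = Path(folder)
    if not root.is_dir():
        return ["{} is not a directory".format(root)]
    problems = []
    for name in SPLIT_FILES:
        if not (root / name).is_file():
            problems.append("missing {}".format(name))
    # The README asks for a new folder called "data" inside the data folder.
    if not (root / "data").is_dir():
        problems.append("missing sub-folder 'data'")
    # Assumption: each .src line (a document) pairs with one .tgt line (its labels).
    for split in SPLITS:
        src, tgt = root / (split + ".src"), root / (split + ".tgt")
        if src.is_file() and tgt.is_file():
            n_src, n_tgt = count_lines(src), count_lines(tgt)
            if n_src != n_tgt:
                problems.append("{0}.src and {0}.tgt have {1} vs {2} lines".format(split, n_src, n_tgt))
    return problems
```

Each returned string names one missing or mismatched item, so the list can be printed as-is before `path_to_data` is passed to `-load_data`.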

semantic-unit-based/config.yaml

Lines changed: 27 additions & 27 deletions
@@ -1,28 +1,28 @@
-data: '/home/linjunyang/multilabel_rcv/data/'
-logF: 'experiments/rcv/'
-epoch: 20
-batch_size: 64
-optim: 'adam'
-cell: 'lstm'
-attention: 'luong_gate'
-learning_rate: 0.0003
-max_grad_norm: 10
-learning_rate_decay: 0.5
-start_decay_at: 2
-emb_size: 512
-hidden_size: 512
-dec_num_layers: 2
-enc_num_layers: 2
-bidirectional: True
-dropout: 0.2
-max_time_step: 30
-eval_interval: 500
-save_interval: 1000
-unk: False
-schedule: False
-schesamp: False
-length_norm: True
-metrics: ['hamming_loss', 'macro_f1', 'micro_f1']
-shared_vocab: False
-beam_size: 5
+data: '/home/linjunyang/multilabel_rcv/data/'
+logF: 'experiments/rcv/'
+epoch: 20
+batch_size: 64
+optim: 'adam'
+cell: 'lstm'
+attention: 'luong_gate'
+learning_rate: 0.0003
+max_grad_norm: 10
+learning_rate_decay: 0.5
+start_decay_at: 2
+emb_size: 512
+hidden_size: 512
+dec_num_layers: 2
+enc_num_layers: 2
+bidirectional: True
+dropout: 0.2
+max_time_step: 30
+eval_interval: 500
+save_interval: 1000
+unk: False
+schedule: False
+schesamp: False
+length_norm: True
+metrics: ['hamming_loss', 'macro_f1', 'micro_f1']
+shared_vocab: False
+beam_size: 5
 dilated: True
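`config.yaml` lists `hamming_loss`, `macro_f1` and `micro_f1` under `metrics`. For reference, here is a minimal pure-Python sketch of the common textbook definitions over 0/1 label-indicator rows (one row per document, one column per label). The function names mirror the config entries, but the code is an assumption and not the repository's evaluation code, which may differ, for example in how macro-F1 is averaged or how a label with no gold or predicted positives is scored.

```python
from typing import Sequence, Tuple

# Rows are documents, columns are labels; 1 = label assigned, 0 = not.
Matrix = Sequence[Sequence[int]]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (document, label) cells where prediction and gold disagree."""
    cells = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / cells


def _counts(y_true: Matrix, y_pred: Matrix, j: int) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for label column j."""
    tp = sum(1 for tr, pr in zip(y_true, y_pred) if tr[j] and pr[j])
    fp = sum(1 for tr, pr in zip(y_true, y_pred) if not tr[j] and pr[j])
    fn = sum(1 for tr, pr in zip(y_true, y_pred) if tr[j] and not pr[j])
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when the denominator is zero (a convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = [_f1(*_counts(y_true, y_pred, j)) for j in range(n_labels)]
    return sum(scores) / n_labels


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 computed from TP/FP/FN pooled across all labels."""
    per_label = [_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    tp, fp, fn = (sum(col) for col in zip(*per_label))
    return _f1(tp, fp, fn)
```

With `y_true = [[1, 0, 1], [0, 1, 0]]` and `y_pred = [[1, 1, 0], [0, 1, 0]]`, two of the six cells disagree (Hamming loss 1/3), micro-F1 is 2/3, and macro-F1 is 5/9 because the third label scores 0 and pulls the unweighted average down.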
