diff --git a/README.md b/README.md
index 8832637..be3be65 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,11 @@
-
+
-
-
-
-
+
---
@@ -16,30 +13,33 @@
**NLP-Models-Tensorflow** gathers machine learning and TensorFlow deep learning models for NLP problems, **100% of the code simplified inside Jupyter Notebooks**.
## Table of contents
- * [Text classification](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-classification)
- * [Chatbot](https://github.com/huseinzol05/NLP-Models-Tensorflow#chatbot)
- * [Neural Machine Translation](https://github.com/huseinzol05/NLP-Models-Tensorflow#neural-machine-translation-english-to-vietnam)
- * [Embedded](https://github.com/huseinzol05/NLP-Models-Tensorflow#embedded)
- * [Entity-Tagging](https://github.com/huseinzol05/NLP-Models-Tensorflow#entity-tagging)
- * [POS-Tagging](https://github.com/huseinzol05/NLP-Models-Tensorflow#pos-tagging)
- * [Dependency-Parser](https://github.com/huseinzol05/NLP-Models-Tensorflow#dependency-parser)
- * [SQUAD Question-Answers](https://github.com/huseinzol05/NLP-Models-Tensorflow#squad-question-answers)
- * [Question-Answers](https://github.com/huseinzol05/NLP-Models-Tensorflow#question-answers)
- * [Abstractive Summarization](https://github.com/huseinzol05/NLP-Models-Tensorflow#abstractive-summarization)
- * [Extractive Summarization](https://github.com/huseinzol05/NLP-Models-Tensorflow#extractive-summarization)
- * [Stemming](https://github.com/huseinzol05/NLP-Models-Tensorflow#stemming)
- * [Generator](https://github.com/huseinzol05/NLP-Models-Tensorflow#generator)
- * [Topic Generator](https://github.com/huseinzol05/NLP-Models-Tensorflow#topic-generator)
- * [Language detection](https://github.com/huseinzol05/NLP-Models-Tensorflow#language-detection)
- * [OCR (optical character recognition)](https://github.com/huseinzol05/NLP-Models-Tensorflow#ocr-optical-character-recognition)
- * [Sentence-Pair classification](https://github.com/huseinzol05/NLP-Models-Tensorflow#sentence-pair)
- * [Speech to Text](https://github.com/huseinzol05/NLP-Models-Tensorflow#speech-to-text)
- * [Text to Speech](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-to-speech)
- * [Old-to-Young Vocoder](https://github.com/huseinzol05/NLP-Models-Tensorflow#old-to-young-vocoder)
- * [Text Similarity](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-similarity)
- * [Text Augmentation](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-augmentation)
- * [Miscellaneous](https://github.com/huseinzol05/NLP-Models-Tensorflow#Miscellaneous)
- * [Attention](https://github.com/huseinzol05/NLP-Models-Tensorflow#attention)
+ * [Abstractive Summarization](#abstractive-summarization)
+ * [Chatbot](#chatbot)
+ * [Dependency Parser](#dependency-parser)
+ * [Entity Tagging](#entity-tagging)
+ * [Extractive Summarization](#extractive-summarization)
+ * [Generator](#generator)
+ * [Language Detection](#language-detection)
+ * [Neural Machine Translation](#neural-machine-translation)
+ * [OCR](#ocr-optical-character-recognition)
+ * [POS Tagging](#pos-tagging)
+ * [Question-Answers](#question-answers)
+ * [Sentence pairs](#sentence-pair)
+ * [Speech-to-Text](#speech-to-text)
+ * [Spelling correction](#spelling-correction)
+ * [SQUAD Question-Answers](#squad-question-answers)
+ * [Stemming](#stemming)
+ * [Text Augmentation](#text-augmentation)
+ * [Text Classification](#text-classification)
+ * [Text Similarity](#text-similarity)
+ * [Text-to-Speech](#text-to-speech)
+ * [Topic Generator](#topic-generator)
+ * [Topic Modeling](#topic-modeling)
+ * [Unsupervised Extractive Summarization](#unsupervised-extractive-summarization)
+ * [Vectorizer](#vectorizer)
+ * [Old-to-Young Vocoder](#old-to-young-vocoder)
+ * [Visualization](#visualization)
+ * [Attention](#attention)
## Objective
@@ -49,125 +49,40 @@ I will attached github repositories for models that I not implemented from scrat
## Tensorflow version
-Tensorflow version 1.10 and above only, not included 2.X version.
+Tensorflow version 1.13 and above only, excluding the 2.X versions: 1.13 <= Tensorflow < 2.0.
+
+```bash
+pip install -r requirements.txt
+```
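A quick way to confirm the installed TensorFlow build falls in the supported range (a minimal sketch; the `version_ok` helper is hypothetical, and in practice you would pass it `tf.__version__`):

```python
import re

def version_ok(version: str) -> bool:
    """Return True when `version` satisfies 1.13 <= TensorFlow < 2.0."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # Only 1.x builds from 1.13 onwards are supported; 2.x is excluded.
    return major == 1 and minor >= 13

print(version_ok("1.13.1"))  # True
print(version_ok("2.0.0"))   # False
```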
## Contents
-### [Text classification](text-classification)
+### [Abstractive Summarization](abstractive-summarization)
-Trained on [English sentiment dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/text-classification/data).
+Trained on [India news](abstractive-summarization/dataset).
-1. Basic cell RNN
-2. Bidirectional RNN
-3. LSTM cell RNN
-4. GRU cell RNN
-5. LSTM RNN + Conv2D
-6. K-max Conv1d
-7. LSTM RNN + Conv1D + Highway
-8. LSTM RNN with Attention
-9. Neural Turing Machine
-10. BERT
-11. Dynamic Memory Network
-12. XL-net
-
-Complete list (76 notebooks)
+Accuracy based on 10 epochs only, calculated using word positions.
-1. Basic cell RNN
-2. Basic cell RNN + Hinge
-3. Basic cell RNN + Huber
-4. Basic cell Bidirectional RNN
-5. Basic cell Bidirectional RNN + Hinge
-6. Basic cell Bidirectional RNN + Huber
-7. LSTM cell RNN
-8. LSTM cell RNN + Hinge
-9. LSTM cell RNN + Huber
-10. LSTM cell Bidirectional RNN
-11. LSTM cell Bidirectional RNN + Huber
-12. LSTM cell RNN + Dropout + L2
-13. GRU cell RNN
-14. GRU cell RNN + Hinge
-15. GRU cell RNN + Huber
-16. GRU cell Bidirectional RNN
-17. GRU cell Bidirectional RNN + Hinge
-18. GRU cell Bidirectional RNN + Huber
-19. LSTM RNN + Conv2D
-20. K-max Conv1d
-21. LSTM RNN + Conv1D + Highway
-22. LSTM RNN + Basic Attention
-23. LSTM Dilated RNN
-24. Layer-Norm LSTM cell RNN
-25. Only Attention Neural Network
-26. Multihead-Attention Neural Network
-27. Neural Turing Machine
-28. LSTM Seq2Seq
-29. LSTM Seq2Seq + Luong Attention
-30. LSTM Seq2Seq + Bahdanau Attention
-31. LSTM Seq2Seq + Beam Decoder
-32. LSTM Bidirectional Seq2Seq
-33. Pointer Net
-34. LSTM cell RNN + Bahdanau Attention
-35. LSTM cell RNN + Luong Attention
-36. LSTM cell RNN + Stack Bahdanau Luong Attention
-37. LSTM cell Bidirectional RNN + backward Bahdanau + forward Luong
-38. Bytenet
-39. Fast-slow LSTM
-40. Siamese Network
-41. LSTM Seq2Seq + tf.estimator
-42. Capsule layers + RNN LSTM
-43. Capsule layers + LSTM Seq2Seq
-44. Capsule layers + LSTM Bidirectional Seq2Seq
-45. Nested LSTM
-46. LSTM Seq2Seq + Highway
-47. Triplet loss + LSTM
-48. DNC (Differentiable Neural Computer)
-49. ConvLSTM
-50. Temporal Convd Net
-51. Batch-all Triplet-loss + LSTM
-52. Fast-text
-53. Gated Convolution Network
-54. Simple Recurrent Unit
-55. LSTM Hierarchical Attention Network
-56. Bidirectional Transformers
-57. Dynamic Memory Network
-58. Entity Network
-59. End-to-End Memory Network
-60. BOW-Chars Deep sparse Network
-61. Residual Network using Atrous CNN
-62. Residual Network using Atrous CNN + Bahdanau Attention
-63. Deep pyramid CNN
-64. Transformer-XL
-65. Transfer learning GPT-2 345M
-66. Quasi-RNN
-67. Tacotron
-68. Slice GRU
-69. Slice GRU + Bahdanau
-70. Wavenet
-71. Transfer learning BERT Base
-72. Transfer learning XL-net Large
-73. LSTM BiRNN global Max and average pooling
-74. Transfer learning BERT Base drop 6 layers
-75. Transfer learning BERT Large drop 12 layers
-76. Transfer learning XL-net Base
+Complete list (12 notebooks)
+
+1. LSTM Seq2Seq using topic modelling, test accuracy 13.22%
+2. LSTM Seq2Seq + Luong Attention using topic modelling, test accuracy 12.39%
+3. LSTM Seq2Seq + Beam Decoder using topic modelling, test accuracy 10.67%
+4. LSTM Bidirectional + Luong Attention + Beam Decoder using topic modelling, test accuracy 8.29%
+5. Pointer-Generator + Bahdanau, https://github.com/xueyouluo/my_seq2seq, test accuracy 15.51%
+6. Copynet, test accuracy 11.15%
+7. Pointer-Generator + Luong, https://github.com/xueyouluo/my_seq2seq, test accuracy 16.51%
+8. Dilated Seq2Seq, test accuracy 10.88%
+9. Dilated Seq2Seq + Self Attention, test accuracy 11.54%
+10. BERT + Dilated CNN Seq2seq, test accuracy 13.5%
+11. Self-Attention + Pointer-Generator, test accuracy 4.34%
+12. Dilated-CNN Seq2seq + Pointer-Generator, test accuracy 5.57%
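The README does not spell out the metric behind the scores above, but one plausible reading of "calculated using word positions" is position-wise token accuracy; `position_accuracy` below is a hypothetical sketch under that assumption, not the notebooks' actual evaluation code:

```python
def position_accuracy(predicted, target):
    # Fraction of positions where the predicted word equals the target
    # word, measured against the reference length (hypothetical metric).
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / max(len(target), 1)

print(position_accuracy("india news summary".split(), "india news report".split()))
```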
### [Chatbot](chatbot)
-Trained on [Cornell Movie Dialog corpus](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/chatbot/dataset.tar.gz).
-
-1. Seq2Seq-manual
-2. Seq2Seq-API Greedy
-3. Bidirectional Seq2Seq-manual
-4. Bidirectional Seq2Seq-API Greedy
-5. Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong
-6. Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder
-7. Bytenet
-8. Capsule layers + LSTM Seq2Seq-API + Luong Attention + Beam Decoder
-9. End-to-End Memory Network
-10. Attention is All you need
-11. Transformer-XL + LSTM
-12. GPT-2 + LSTM
-13. Tacotron + Beam decoder
+Trained on [Cornell Movie Dialog corpus](chatbot/dataset.tar.gz), accuracy table in [chatbot](chatbot).
Complete list (54 notebooks)
@@ -220,7 +135,7 @@ Trained on [Cornell Movie Dialog corpus](https://github.com/huseinzol05/NLP-Mode
47. Attention is all you need + Beam Search
48. Transformer-XL + LSTM
49. GPT-2 + LSTM
-50. Fairseq
+50. CNN Seq2seq
51. Conv-Encoder + LSTM
52. Tacotron + Greedy decoder
53. Tacotron + Beam decoder
@@ -228,103 +143,171 @@ Trained on [Cornell Movie Dialog corpus](https://github.com/huseinzol05/NLP-Mode
-### [Neural Machine Translation](neural-machine-translation)
+### [Dependency-Parser](dependency-parser)
-Trained on [500 English-Vietnam](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/neural-machine-translation/vietnam-train).
+Trained on [CONLL English Dependency](https://github.com/UniversalDependencies/UD_English-EWT). The train set is used for training; the dev and test sets are used for testing.
-1. Seq2Seq-manual
-2. Seq2Seq-API Greedy
-3. Bidirectional Seq2Seq-manual
-4. Bidirectional Seq2Seq-API Greedy
-5. Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong
-6. Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder
-7. Bytenet
-8. Capsule layers + LSTM Seq2Seq-API + Luong Attention + Beam Decoder
-9. End-to-End Memory Network
-10. Attention is All you need
-11. BERT + Dilated Fairseq
+Stackpointer and Biaffine-attention are originally from https://github.com/XuezheMax/NeuroNLP2, written in PyTorch.
-Complete list (55 notebooks)
+Arc, type, and root accuracies after 15 epochs only.
-1. Basic cell Seq2Seq-manual
-2. LSTM Seq2Seq-manual
-3. GRU Seq2Seq-manual
-4. Basic cell Seq2Seq-API Greedy
-5. LSTM Seq2Seq-API Greedy
-6. GRU Seq2Seq-API Greedy
-7. Basic cell Bidirectional Seq2Seq-manual
-8. LSTM Bidirectional Seq2Seq-manual
-9. GRU Bidirectional Seq2Seq-manual
-10. Basic cell Bidirectional Seq2Seq-API Greedy
-11. LSTM Bidirectional Seq2Seq-API Greedy
-12. GRU Bidirectional Seq2Seq-API Greedy
-13. Basic cell Seq2Seq-manual + Luong Attention
-14. LSTM Seq2Seq-manual + Luong Attention
-15. GRU Seq2Seq-manual + Luong Attention
-16. Basic cell Seq2Seq-manual + Bahdanau Attention
-17. LSTM Seq2Seq-manual + Bahdanau Attention
-18. GRU Seq2Seq-manual + Bahdanau Attention
-19. LSTM Bidirectional Seq2Seq-manual + Luong Attention
-20. GRU Bidirectional Seq2Seq-manual + Luong Attention
-21. LSTM Bidirectional Seq2Seq-manual + Bahdanau Attention
-22. GRU Bidirectional Seq2Seq-manual + Bahdanau Attention
-23. LSTM Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong
-24. GRU Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong
-25. LSTM Seq2Seq-API Greedy + Luong Attention
-26. GRU Seq2Seq-API Greedy + Luong Attention
-27. LSTM Seq2Seq-API Greedy + Bahdanau Attention
-28. GRU Seq2Seq-API Greedy + Bahdanau Attention
-29. LSTM Seq2Seq-API Beam Decoder
-30. GRU Seq2Seq-API Beam Decoder
-31. LSTM Bidirectional Seq2Seq-API + Luong Attention + Beam Decoder
-32. GRU Bidirectional Seq2Seq-API + Luong Attention + Beam Decoder
-33. LSTM Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder
-34. GRU Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder
-35. Bytenet
-36. LSTM Seq2Seq + tf.estimator
-37. Capsule layers + LSTM Seq2Seq-API Greedy
-38. Capsule layers + LSTM Seq2Seq-API + Luong Attention + Beam Decoder
-39. LSTM Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder + Dropout + L2
-40. DNC Seq2Seq
-41. LSTM Bidirectional Seq2Seq-API + Luong Monotic Attention + Beam Decoder
-42. LSTM Bidirectional Seq2Seq-API + Bahdanau Monotic Attention + Beam Decoder
-43. End-to-End Memory Network + Basic cell
-44. End-to-End Memory Network + LSTM cell
-45. Attention is all you need
-46. Transformer-XL
-47. Attention is all you need + Beam Search
-48. Fairseq
-49. Conv-Encoder + LSTM
-50. Bytenet Greedy
-51. Residual GRU Bidirectional Seq2Seq-API Greedy
-52. Google NMT
-53. Dilated Seq2Seq
-54. BERT Encoder + LSTM Luong Decoder
-55. BERT Encoder + Dilated Fairseq
+Complete list (8 notebooks)
+
+1. Bidirectional RNN + CRF + Biaffine, arc accuracy 70.48%, types accuracy 65.18%, root accuracy 66.4%
+2. Bidirectional RNN + Bahdanau + CRF + Biaffine, arc accuracy 70.82%, types accuracy 65.33%, root accuracy 66.77%
+3. Bidirectional RNN + Luong + CRF + Biaffine, arc accuracy 71.22%, types accuracy 65.73%, root accuracy 67.23%
+4. BERT Base + CRF + Biaffine, arc accuracy 64.30%, types accuracy 62.89%, root accuracy 74.19%
+5. Bidirectional RNN + Biaffine Attention + Cross Entropy, arc accuracy 72.42%, types accuracy 63.53%, root accuracy 68.51%
+6. BERT Base + Biaffine Attention + Cross Entropy, arc accuracy 72.85%, types accuracy 67.11%, root accuracy 73.93%
+7. Bidirectional RNN + Stackpointer, arc accuracy 61.88%, types accuracy 48.20%, root accuracy 89.39%
+8. XLNET Base + Biaffine Attention + Cross Entropy, arc accuracy 74.41%, types accuracy 71.37%, root accuracy 73.17%
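The arc and root numbers above can be read as head-attachment accuracy. This hedged sketch (the `parse_accuracies` helper is hypothetical, not taken from the notebooks) shows one way such scores are computed from predicted head indices, with 0 denoting ROOT as in CoNLL conventions:

```python
def parse_accuracies(pred_heads, gold_heads):
    # Arc accuracy: fraction of tokens whose predicted head index matches
    # gold (UAS-style). Root accuracy: the same, restricted to tokens
    # whose gold head is 0, i.e. the ROOT attachment.
    pairs = list(zip(pred_heads, gold_heads))
    arc = sum(p == g for p, g in pairs) / max(len(pairs), 1)
    roots = [(p, g) for p, g in pairs if g == 0]
    root = sum(p == g for p, g in roots) / max(len(roots), 1)
    return arc, root

print(parse_accuracies([2, 0, 2], [2, 0, 1]))
```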
-### [Embedded](embedded)
+### [Entity-Tagging](entity-tagging)
-Trained on [English sentiment dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/text-classification/data).
+Trained on [CONLL NER](https://cogcomp.org/page/resource_view/81).
-1. Word Vector using CBOW sample softmax
-2. Word Vector using CBOW noise contrastive estimation
-3. Word Vector using skipgram sample softmax
-4. Word Vector using skipgram noise contrastive estimation
-5. Lda2Vec Tensorflow
-6. Supervised Embedded
-7. Triplet-loss + LSTM
-8. LSTM Auto-Encoder
-9. Batch-All Triplet-loss LSTM
-10. Fast-text
-11. ELMO (biLM)
-12. Triplet-loss + BERT
+Complete list (9 notebooks)
+
+1. Bidirectional RNN + CRF, test accuracy 96%
+2. Bidirectional RNN + Luong Attention + CRF, test accuracy 93%
+3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 95%
+4. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96%
+5. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96%
+6. Char Ngrams + Residual Network + Bahdanau Attention + CRF, test accuracy 69%
+7. Char Ngrams + Attention is All you Need + CRF, test accuracy 90%
+8. BERT, test accuracy 99%
+9. XLNET-Base, test accuracy 99%
+
+
+
+### [Extractive Summarization](extractive-summarization)
+
+Trained on [CNN News dataset](https://cs.nyu.edu/~kcho/DMQA/).
+
+Accuracy based on ROUGE-2.
+
+Complete list (4 notebooks)
+
+1. LSTM RNN, test accuracy 16.13%
+2. Dilated-CNN, test accuracy 15.54%
+3. Multihead Attention, test accuracy 26.33%
+4. BERT-Base
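The ROUGE-2 scores above measure bigram overlap between generated and reference summaries. A minimal recall-oriented sketch follows (a simplified stand-in for illustration, not the official ROUGE implementation):

```python
from collections import Counter

def rouge2_recall(candidate, reference):
    # Count overlapping bigrams, then divide by the number of bigrams
    # in the reference summary (recall-oriented ROUGE-2).
    cand = Counter(zip(candidate, candidate[1:]))
    ref = Counter(zip(reference, reference[1:]))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(sum(ref.values()), 1)

print(rouge2_recall("the cat sat on the mat".split(), "the cat sat down".split()))
```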
+
+
+
+### [Generator](generator)
+
+Trained on [Shakespeare dataset](generator/shakespeare.txt).
+
+Complete list (15 notebooks)
+
+1. Character-wise RNN + LSTM
+2. Character-wise RNN + Beam search
+3. Character-wise RNN + LSTM + Embedding
+4. Word-wise RNN + LSTM
+5. Word-wise RNN + LSTM + Embedding
+6. Character-wise + Seq2Seq + GRU
+7. Word-wise + Seq2Seq + GRU
+8. Character-wise RNN + LSTM + Bahdanau Attention
+9. Character-wise RNN + LSTM + Luong Attention
+10. Word-wise + Seq2Seq + GRU + Beam
+11. Character-wise + Seq2Seq + GRU + Bahdanau Attention
+12. Word-wise + Seq2Seq + GRU + Bahdanau Attention
+13. Character-wise Dilated CNN + Beam search
+14. Transformer + Beam search
+15. Transformer XL + Beam search
+
+
+
+### [Language-detection](language-detection)
+
+Trained on [Tatoeba dataset](http://downloads.tatoeba.org/exports/sentences.tar.bz2).
+
+Complete list (1 notebook)
+
+1. Fast-text Char N-Grams
+
+
+
+### [Neural Machine Translation](neural-machine-translation)
+
+Trained on [English-French](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/translate_enfr.py), accuracy table in [neural-machine-translation](neural-machine-translation).
+
+Complete list (53 notebooks)
+
+1. basic-seq2seq
+2. lstm-seq2seq
+3. gru-seq2seq
+4. basic-seq2seq-contrib-greedy
+5. lstm-seq2seq-contrib-greedy
+6. gru-seq2seq-contrib-greedy
+7. basic-birnn-seq2seq
+8. lstm-birnn-seq2seq
+9. gru-birnn-seq2seq
+10. basic-birnn-seq2seq-contrib-greedy
+11. lstm-birnn-seq2seq-contrib-greedy
+12. gru-birnn-seq2seq-contrib-greedy
+13. basic-seq2seq-luong
+14. lstm-seq2seq-luong
+15. gru-seq2seq-luong
+16. basic-seq2seq-bahdanau
+17. lstm-seq2seq-bahdanau
+18. gru-seq2seq-bahdanau
+19. basic-birnn-seq2seq-bahdanau
+20. lstm-birnn-seq2seq-bahdanau
+21. gru-birnn-seq2seq-bahdanau
+22. basic-birnn-seq2seq-luong
+23. lstm-birnn-seq2seq-luong
+24. gru-birnn-seq2seq-luong
+25. lstm-seq2seq-contrib-greedy-luong
+26. gru-seq2seq-contrib-greedy-luong
+27. lstm-seq2seq-contrib-greedy-bahdanau
+28. gru-seq2seq-contrib-greedy-bahdanau
+29. lstm-seq2seq-contrib-beam-luong
+30. gru-seq2seq-contrib-beam-luong
+31. lstm-seq2seq-contrib-beam-bahdanau
+32. gru-seq2seq-contrib-beam-bahdanau
+33. lstm-birnn-seq2seq-contrib-beam-bahdanau
+34. lstm-birnn-seq2seq-contrib-beam-luong
+35. gru-birnn-seq2seq-contrib-beam-bahdanau
+36. gru-birnn-seq2seq-contrib-beam-luong
+37. lstm-birnn-seq2seq-contrib-beam-luongmonotonic
+38. gru-birnn-seq2seq-contrib-beam-luongmonotic
+39. lstm-birnn-seq2seq-contrib-beam-bahdanaumonotonic
+40. gru-birnn-seq2seq-contrib-beam-bahdanaumonotic
+41. residual-lstm-seq2seq-greedy-luong
+42. residual-gru-seq2seq-greedy-luong
+43. residual-lstm-seq2seq-greedy-bahdanau
+44. residual-gru-seq2seq-greedy-bahdanau
+45. memory-network-lstm-decoder-greedy
+46. google-nmt
+47. transformer-encoder-transformer-decoder
+48. transformer-encoder-lstm-decoder-greedy
+49. bertmultilanguage-encoder-bertmultilanguage-decoder
+50. bertmultilanguage-encoder-lstm-decoder
+51. bertmultilanguage-encoder-transformer-decoder
+52. bertenglish-encoder-transformer-decoder
+53. transformer-t2t-2gpu
+
+
+
+### [OCR (optical character recognition)](ocr)
+
+Complete list (2 notebooks)
+
+1. CNN + LSTM RNN, test accuracy 100%
+2. Im2Latex, test accuracy 100%
+
+
### [POS-Tagging](pos-tagging)
Trained on [CONLL POS](https://cogcomp.org/page/resource_view/81).
+Complete list (8 notebooks)
+
1. Bidirectional RNN + CRF, test accuracy 92%
2. Bidirectional RNN + Luong Attention + CRF, test accuracy 91%
3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 91%
@@ -334,50 +317,80 @@ Trained on [CONLL POS](https://cogcomp.org/page/resource_view/81).
7. Char Ngrams + Attention is All you Need + CRF, test accuracy 89%
8. BERT, test accuracy 99%
-### [Entity-Tagging](entity-tagging)
+
-Trained on [CONLL NER](https://cogcomp.org/page/resource_view/81).
+### [Question-Answers](question-answer)
-1. Bidirectional RNN + CRF, test accuracy 96%
-2. Bidirectional RNN + Luong Attention + CRF, test accuracy 93%
-3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 95%
-4. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96%
-5. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96%
-6. Char Ngrams + Residual Network + Bahdanau Attention + CRF, test accuracy 69%
-7. Char Ngrams + Attention is you all Need + CRF, test accuracy 90%
-8. BERT, test accuracy 99%
+Trained on [bAbI Dataset](https://research.fb.com/downloads/babi/).
-### [Dependency-Parser](dependency-parser)
+Complete list (4 notebooks)
+
+1. End-to-End Memory Network + Basic cell
+2. End-to-End Memory Network + GRU cell
+3. End-to-End Memory Network + LSTM cell
+4. Dynamic Memory
+
+
+
+### [Sentence-pair](sentence-pair)
+
+Trained on [Cornell Movie-Dialogs Corpus](https://people.mpi-sws.org/~cristian/Cornell_Movie-Dialogs_Corpus.html).
-Trained on [CONLL English Dependency](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/dependency-parser/dev.conll.txt).
+Complete list (1 notebook)
-1. Bidirectional RNN + Bahdanau Attention + CRF
-2. Bidirectional RNN + Luong Attention + CRF
-3. Residual Network + Bahdanau Attention + CRF
-4. Residual Network + Bahdanau Attention + Char Embedded + CRF
-5. Attention is all you need + CRF
+1. BERT
+
+
+
+### [Speech to Text](speech-to-text)
+
+Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487).
+
+Complete list (11 notebooks)
+
+1. Tacotron, https://github.com/Kyubyong/tacotron_asr, test accuracy 77.09%
+2. BiRNN LSTM, test accuracy 84.66%
+3. BiRNN Seq2Seq + Luong Attention + Cross Entropy, test accuracy 87.86%
+4. BiRNN Seq2Seq + Bahdanau Attention + Cross Entropy, test accuracy 89.28%
+5. BiRNN Seq2Seq + Bahdanau Attention + CTC, test accuracy 86.35%
+6. BiRNN Seq2Seq + Luong Attention + CTC, test accuracy 80.30%
+7. CNN RNN + Bahdanau Attention, test accuracy 80.23%
+8. Dilated CNN RNN, test accuracy 31.60%
+9. Wavenet, test accuracy 75.11%
+10. Deep Speech 2, test accuracy 81.40%
+11. Wav2Vec Transfer learning BiRNN LSTM, test accuracy 83.24%
+
+
+
+### [Spelling correction](spelling-correction)
+
+Complete list (4 notebooks)
+
+1. BERT-Base
+2. XLNET-Base
+3. BERT-Base Fast
+4. BERT-Base accurate
+
+
### [SQUAD Question-Answers](squad-qa)
Trained on [SQUAD Dataset](https://rajpurkar.github.io/SQuAD-explorer/).
+Complete list (1 notebook)
+
1. BERT,
```json
{"exact_match": 77.57805108798486, "f1": 86.18327335287402}
```
-### [Question-Answers](question-answer)
-
-Trained on [bAbI Dataset](https://research.fb.com/downloads/babi/).
-
-1. End-to-End Memory Network + Basic cell
-2. End-to-End Memory Network + GRU cell
-3. End-to-End Memory Network + LSTM cell
-4. Dynamic Memory
+
### [Stemming](stemming)
-Trained on [English Lemmatization](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/stemming/lemmatization-en.txt).
+Trained on [English Lemmatization](stemming/lemmatization-en.txt).
+
+Complete list (6 notebooks)
1. LSTM + Seq2Seq + Beam
2. GRU + Seq2Seq + Beam
@@ -386,66 +399,138 @@ Trained on [English Lemmatization](https://github.com/huseinzol05/NLP-Models-Ten
5. DNC + Seq2Seq + Greedy
6. BiRNN + Bahdanau + Copynet
-### [Abstractive Summarization](abstractive-summarization)
+
-Trained on [India news](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/abstractive-summarization/dataset).
+### [Text Augmentation](text-augmentation)
-Accuracy based on 10 epochs only, calculated using word positions.
+Complete list (8 notebooks)
-1. LSTM Seq2Seq using topic modelling, test accuracy 13.22%
-2. LSTM Seq2Seq + Luong Attention using topic modelling, test accuracy 12.39%
-3. LSTM Seq2Seq + Beam Decoder using topic modelling, test accuracy 10.67%
-4. LSTM Bidirectional + Luong Attention + Beam Decoder using topic modelling, test accuracy 8.29%
-5. Pointer-Generator + Bahdanau, https://github.com/xueyouluo/my_seq2seq, test accuracy 15.51%
-6. Copynet, test accuracy 11.15%
-7. Pointer-Generator + Luong, https://github.com/xueyouluo/my_seq2seq, test accuracy 16.51%
-8. Dilated Seq2Seq, test accuracy 10.88%
-9. Dilated Seq2Seq + Self Attention, test accuracy 11.54%
-10. BERT + Dilated Fairseq, test accuracy 13.5%
-11. self-attention + Pointer-Generator, test accuracy 4.34%
-12. Dilated-Fairseq + Pointer-Generator, test accuracy 5.57%
+1. Pretrained Glove
+2. GRU VAE-seq2seq-beam TF-probability
+3. LSTM VAE-seq2seq-beam TF-probability
+4. GRU VAE-seq2seq-beam + Bahdanau Attention TF-probability
+5. VAE + Deterministic Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention
+6. VAE + VAE Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention
+7. BERT-Base + Nucleus Sampling
+8. XLNET-Base + Nucleus Sampling
-### [Extractive Summarization](extractive-summarization)
+
-Trained on [random books](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/extractive-summarization/books).
+### [Text classification](text-classification)
-1. Skip-thought Vector
-2. Residual Network using Atrous CNN
-3. Residual Network using Atrous CNN + Bahdanau Attention
+Trained on [English sentiment dataset](text-classification/data), accuracy table in [text-classification](text-classification).
-### [OCR (optical character recognition)](ocr)
+Complete list (79 notebooks)
-1. CNN + LSTM RNN
+1. Basic cell RNN
+2. Basic cell RNN + Hinge
+3. Basic cell RNN + Huber
+4. Basic cell Bidirectional RNN
+5. Basic cell Bidirectional RNN + Hinge
+6. Basic cell Bidirectional RNN + Huber
+7. LSTM cell RNN
+8. LSTM cell RNN + Hinge
+9. LSTM cell RNN + Huber
+10. LSTM cell Bidirectional RNN
+11. LSTM cell Bidirectional RNN + Huber
+12. LSTM cell RNN + Dropout + L2
+13. GRU cell RNN
+14. GRU cell RNN + Hinge
+15. GRU cell RNN + Huber
+16. GRU cell Bidirectional RNN
+17. GRU cell Bidirectional RNN + Hinge
+18. GRU cell Bidirectional RNN + Huber
+19. LSTM RNN + Conv2D
+20. K-max Conv1d
+21. LSTM RNN + Conv1D + Highway
+22. LSTM RNN + Basic Attention
+23. LSTM Dilated RNN
+24. Layer-Norm LSTM cell RNN
+25. Only Attention Neural Network
+26. Multihead-Attention Neural Network
+27. Neural Turing Machine
+28. LSTM Seq2Seq
+29. LSTM Seq2Seq + Luong Attention
+30. LSTM Seq2Seq + Bahdanau Attention
+31. LSTM Seq2Seq + Beam Decoder
+32. LSTM Bidirectional Seq2Seq
+33. Pointer Net
+34. LSTM cell RNN + Bahdanau Attention
+35. LSTM cell RNN + Luong Attention
+36. LSTM cell RNN + Stack Bahdanau Luong Attention
+37. LSTM cell Bidirectional RNN + backward Bahdanau + forward Luong
+38. Bytenet
+39. Fast-slow LSTM
+40. Siamese Network
+41. LSTM Seq2Seq + tf.estimator
+42. Capsule layers + RNN LSTM
+43. Capsule layers + LSTM Seq2Seq
+44. Capsule layers + LSTM Bidirectional Seq2Seq
+45. Nested LSTM
+46. LSTM Seq2Seq + Highway
+47. Triplet loss + LSTM
+48. DNC (Differentiable Neural Computer)
+49. ConvLSTM
+50. Temporal Convd Net
+51. Batch-all Triplet-loss + LSTM
+52. Fast-text
+53. Gated Convolution Network
+54. Simple Recurrent Unit
+55. LSTM Hierarchical Attention Network
+56. Bidirectional Transformers
+57. Dynamic Memory Network
+58. Entity Network
+59. End-to-End Memory Network
+60. BOW-Chars Deep sparse Network
+61. Residual Network using Atrous CNN
+62. Residual Network using Atrous CNN + Bahdanau Attention
+63. Deep pyramid CNN
+64. Transformer-XL
+65. Transfer learning GPT-2 345M
+66. Quasi-RNN
+67. Tacotron
+68. Slice GRU
+69. Slice GRU + Bahdanau
+70. Wavenet
+71. Transfer learning BERT Base
+72. Transfer learning XL-net Large
+73. LSTM BiRNN global Max and average pooling
+74. Transfer learning BERT Base drop 6 layers
+75. Transfer learning BERT Large drop 12 layers
+76. Transfer learning XL-net Base
+77. Transfer learning ALBERT
+78. Transfer learning ELECTRA Base
+79. Transfer learning ELECTRA Large
-### [Sentence-pair](sentence-pair)
+
-Trained on [Cornell Movie--Dialogs Corpus](https://people.mpi-sws.org/~cristian/Cornell_Movie-Dialogs_Corpus.html)
+### [Text Similarity](text-similarity)
-1. BERT
+Trained on [MNLI](https://cims.nyu.edu/~sbowman/multinli/).
-### [Speech to Text](speech-to-text)
+Complete list (10 notebooks)
-Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487).
+1. BiRNN + Contrastive loss, test accuracy 73.032%
+2. BiRNN + Cross entropy, test accuracy 74.265%
+3. BiRNN + Circle loss, test accuracy 75.857%
+4. BiRNN + Proxy loss, test accuracy 48.37%
+5. BERT Base + Cross entropy, test accuracy 91.123%
+6. BERT Base + Circle loss, test accuracy 89.903%
+7. ELECTRA Base + Cross entropy, test accuracy 96.317%
+8. ELECTRA Base + Circle loss, test accuracy 95.603%
+9. XLNET Base + Cross entropy, test accuracy 93.998%
+10. XLNET Base + Circle loss, test accuracy 94.033%
-1. Tacotron, https://github.com/Kyubyong/tacotron_asr
-2. Bidirectional RNN + Greedy CTC
-3. Bidirectional RNN + Beam CTC
-4. Seq2Seq + Bahdanau Attention + Beam CTC
-5. Seq2Seq + Luong Attention + Beam CTC
-6. Bidirectional RNN + Attention + Beam CTC
-7. Wavenet
-8. CNN encoder + RNN decoder + Bahdanau Attention
-9. CNN encoder + RNN decoder + Luong Attention
-10. Dilation CNN + GRU Bidirectional
-11. Deep speech 2
-12. Pyramid Dilated CNN
+
### [Text to Speech](text-to-speech)
Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487).
+Complete list (8 notebooks)
+
1. Tacotron, https://github.com/Kyubyong/tacotron
-2. Fairseq + Dilated CNN vocoder
+2. CNN Seq2seq + Dilated CNN vocoder
3. Seq2Seq + Bahdanau Attention
4. Seq2Seq + Luong Attention
5. Dilated CNN + Monotonic Attention + Dilated CNN vocoder
@@ -453,69 +538,90 @@ Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/18
7. Deep CNN + Monotonic Attention + Dilated CNN vocoder
8. Deep CNN + Self Attention + Dilated CNN vocoder
-### [Old-to-Young Vocoder](vocoder)
-
-Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487).
-
-1. Dilated CNN
-
-### [Generator](generator)
-
-Trained on [Shakespeare dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/generator/shakespeare.txt).
-
-1. Character-wise RNN + LSTM
-2. Character-wise RNN + Beam search
-3. Character-wise RNN + LSTM + Embedding
-4. Word-wise RNN + LSTM
-5. Word-wise RNN + LSTM + Embedding
-6. Character-wise + Seq2Seq + GRU
-7. Word-wise + Seq2Seq + GRU
-8. Character-wise RNN + LSTM + Bahdanau Attention
-9. Character-wise RNN + LSTM + Luong Attention
-10. Word-wise + Seq2Seq + GRU + Beam
-11. Character-wise + Seq2Seq + GRU + Bahdanau Attention
-12. Word-wise + Seq2Seq + GRU + Bahdanau Attention
-13. Character-wise Dilated CNN + Beam search
-14. Transformer + Beam search
-15. Transformer XL + Beam search
+
### [Topic Generator](topic-generator)
Trained on [Malaysia news](https://github.com/huseinzol05/Malaya-Dataset/raw/master/news/news.zip).
+Complete list (4 notebooks)
+
1. TAT-LSTM
2. TAV-LSTM
3. MTA-LSTM
-4. Dilated Fairseq
+4. Dilated CNN Seq2seq
-### [Language-detection](language-detection)
+
-Trained on [Tatoeba dataset](http://downloads.tatoeba.org/exports/sentences.tar.bz2).
+### [Topic Modeling](topic-model)
-1. Fast-text Char N-Grams
+Extracted from [English sentiment dataset](text-classification/data).
-### [Text Similarity](text-similarity)
+Complete list (3 notebooks)
-Trained on [First Quora Dataset Release: Question Pairs](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs).
+1. LDA2Vec
+2. BERT Attention
+3. XLNET Attention
-1. BiRNN + Contrastive loss, test accuracy 76.50%
-2. Dilated CNN + Contrastive loss, test accuracy 72.98%
-3. Transformer + Contrastive loss, test accuracy 73.48%
-4. Dilated CNN + Cross entropy, test accuracy 72.27%
-5. Transformer + Cross entropy, test accuracy 71.1%
-6. Transfer learning BERT base + Cross entropy, test accuracy 90%
+
-### [Text Augmentation](text-augmentation)
+### [Unsupervised Extractive Summarization](unsupervised-extractive-summarization)
-1. Pretrained Glove
-2. GRU VAE-seq2seq-beam TF-probability
-3. LSTM VAE-seq2seq-beam TF-probability
-4. GRU VAE-seq2seq-beam + Bahdanau Attention TF-probability
-5. VAE + Deterministic Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention
-6. VAE + VAE Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention
+Trained on [random books](extractive-summarization/books).
+
+Complete list (3 notebooks)
+
+1. Skip-thought Vector
+2. Residual Network using Atrous CNN
+3. Residual Network using Atrous CNN + Bahdanau Attention
+
+
+
+### [Vectorizer](vectorizer)
+
+Trained on [English sentiment dataset](text-classification/data).
+
+Complete list (11 notebooks)
+
+1. Word Vector using CBOW sample softmax
+2. Word Vector using CBOW noise contrastive estimation
+3. Word Vector using skipgram sample softmax
+4. Word Vector using skipgram noise contrastive estimation
+5. Supervised Embedded
+6. Triplet-loss + LSTM
+7. LSTM Auto-Encoder
+8. Batch-All Triplet-loss LSTM
+9. Fast-text
+10. ELMO (biLM)
+11. Triplet-loss + BERT
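
The word-vector notebooks above (CBOW, skipgram, sampled softmax, NCE) all share one idea: slide a context window over the corpus and train a small embedding matrix so that words predict their neighbours. A minimal numpy sketch of skipgram-style training with a few random negatives, purely illustrative (the toy corpus, dimensions, and learning rate are not from the notebooks, which use TensorFlow):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}

dim, lr, window = 16, 0.1, 2
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # target embeddings
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    for pos, word in enumerate(corpus):
        t = w2i[word]
        for off in range(-window, window + 1):
            c_pos = pos + off
            if off == 0 or c_pos < 0 or c_pos >= len(corpus):
                continue
            # one true (target, context) pair plus a few random negatives
            pairs = [(w2i[corpus[c_pos]], 1.0)] + [
                (int(rng.integers(len(vocab))), 0.0) for _ in range(3)
            ]
            for ctx, label in pairs:
                score = sigmoid(W_in[t] @ W_out[ctx])
                grad = score - label
                g_in = grad * W_out[ctx]           # use old W_out for the W_in step
                W_out[ctx] -= lr * grad * W_in[t]
                W_in[t] -= lr * g_in

print(W_in[w2i["fox"]].shape)  # (16,)
```

After training, rows of `W_in` serve as the word vectors; the sampled-softmax and NCE notebooks replace the hand-rolled negative sampling here with TensorFlow's built-in loss ops.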
+
+
+
+### [Visualization](visualization)
+
+Complete list (4 notebooks)
+
+1. Attention heatmap on Bahdanau Attention
+2. Attention heatmap on Luong Attention
+3. BERT attention, https://github.com/hsm207/bert_attn_viz
+4. XLNET attention
+
+
+
+### [Old-to-Young Vocoder](vocoder)
+
+Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487).
+
+Complete list (1 notebook)
+
+1. Dilated CNN
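
A dilated CNN reads a waveform with convolution taps spaced increasingly far apart, so the receptive field grows exponentially with depth. The notebook uses TensorFlow; this is just a shape-level numpy sketch of a single causal dilated 1-D convolution (the signal and kernel are made up):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D convolution whose taps are `dilation` samples apart."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation
    x = np.concatenate([np.zeros(pad), x])  # left-pad: output never sees the future
    out = np.empty(len(x) - pad)
    for i in range(len(out)):
        out[i] = x[i : i + pad + 1 : dilation] @ kernel
    return out

signal = np.array([1.0, 2.0, 3.0, 4.0])
print(dilated_conv1d(signal, [1.0, 0.0], 2))  # delays the signal by 2 samples
```

Stacking such layers with dilations 1, 2, 4, … lets the vocoder condition each output sample on a long window of past audio without a long kernel.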
+
+
### [Attention](attention)
+Complete list (8 notebooks)
+
1. Bahdanau
2. Luong
3. Hierarchical
@@ -525,12 +631,7 @@ Trained on [First Quora Dataset Release: Question Pairs](https://data.quora.com/
7. Bahdanau API
8. Luong API
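
The two classic mechanisms above differ only in how they score encoder states against the decoder state: Bahdanau uses an additive feed-forward score, Luong a multiplicative one. A minimal numpy sketch of the additive (Bahdanau) score with invented shapes and random weights (trained parameters in the actual notebooks):

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, enc_dim, dec_dim, attn_dim = 5, 8, 8, 16

enc_states = rng.normal(size=(seq_len, enc_dim))  # encoder memory
dec_state = rng.normal(size=(dec_dim,))           # current decoder state

# learned projections (random here for illustration)
W_enc = rng.normal(size=(enc_dim, attn_dim))
W_dec = rng.normal(size=(dec_dim, attn_dim))
v = rng.normal(size=(attn_dim,))

# additive score: e_i = v . tanh(W_enc h_i + W_dec s)
scores = np.tanh(enc_states @ W_enc + dec_state @ W_dec) @ v
weights = np.exp(scores - scores.max())
weights /= weights.sum()            # softmax over source positions
context = weights @ enc_states      # attention-weighted sum of encoder states

print(context.shape)  # (8,)
```

The Luong variant replaces the `tanh` score with a dot or general product `s W h_i`; everything after the softmax is identical, which is why the heatmap visualizations for both look the same.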
-### [Miscellaneous](misc)
-
-1. Attention heatmap on Bahdanau Attention
-2. Attention heatmap on Luong Attention
-3. BERT attention, https://github.com/hsm207/bert_attn_viz
-4. XLNET attention
+
### [Not-deep-learning](not-deep-learning)
diff --git a/chatbot/README.md b/chatbot/README.md
index ca04b27..ee2e312 100644
--- a/chatbot/README.md
+++ b/chatbot/README.md
@@ -6,7 +6,7 @@
## Accuracy, not sorted
-Based on 20 epochs accuracy. The results will be different on different dataset. Trained on a GTX 960, 4GB VRAM.
+Based on training accuracy for 20 epochs.
| name | accuracy |
|------------------------------------------------------------|----------|
diff --git a/dependency-parser/1.birnn-bahdanau.ipynb b/dependency-parser/1.birnn-bahdanau.ipynb
deleted file mode 100644
index b4bd0f1..0000000
--- a/dependency-parser/1.birnn-bahdanau.ipynb
+++ /dev/null
@@ -1,899 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import tensorflow as tf\n",
- "from tqdm import tqdm\n",
- "import numpy as np"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "with open('test.conll.txt') as fopen:\n",
- " corpus = fopen.read().split('\\n')\n",
- " \n",
- "with open('dev.conll.txt') as fopen:\n",
- " corpus_test = fopen.read().split('\\n')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [],
- "source": [
- "word2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "tag2idx = {'PAD': 0}\n",
- "char2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "word_idx = 3\n",
- "tag_idx = 1\n",
- "char_idx = 3\n",
- "\n",
- "def process_corpus(corpus, until = None):\n",
- " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
- " sentences, words, depends, labels = [], [], [], []\n",
- " temp_sentence, temp_word, temp_depend, temp_label = [], [], [], []\n",
- " for sentence in corpus:\n",
- " if len(sentence):\n",
- " sentence = sentence.split('\\t')\n",
- " for c in sentence[1]:\n",
- " if c not in char2idx:\n",
- " char2idx[c] = char_idx\n",
- " char_idx += 1\n",
- " if sentence[7] not in tag2idx:\n",
- " tag2idx[sentence[7]] = tag_idx\n",
- " tag_idx += 1\n",
- " if sentence[1] not in word2idx:\n",
- " word2idx[sentence[1]] = word_idx\n",
- " word_idx += 1\n",
- " temp_word.append(word2idx[sentence[1]])\n",
- " temp_depend.append(int(sentence[6]))\n",
- " temp_label.append(tag2idx[sentence[7]])\n",
- " temp_sentence.append(sentence[1])\n",
- " else:\n",
- " words.append(temp_word)\n",
- " depends.append(temp_depend)\n",
- " labels.append(temp_label)\n",
- " sentences.append(temp_sentence)\n",
- " temp_word = []\n",
- " temp_depend = []\n",
- " temp_label = []\n",
- " temp_sentence = []\n",
- " return sentences[:-1], words[:-1], depends[:-1], labels[:-1]\n",
- " \n",
- "sentences, words, depends, labels = process_corpus(corpus)\n",
- "sentences_test, words_test, depends_test, labels_test = process_corpus(corpus_test)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Using TensorFlow backend.\n"
- ]
- }
- ],
- "source": [
- "from keras.preprocessing.sequence import pad_sequences"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [],
- "source": [
- "words = pad_sequences(words,padding='post')\n",
- "depends = pad_sequences(depends,padding='post')\n",
- "labels = pad_sequences(labels,padding='post')\n",
- "\n",
- "words_test = pad_sequences(words_test,padding='post')\n",
- "depends_test = pad_sequences(depends_test,padding='post')\n",
- "labels_test = pad_sequences(labels_test,padding='post')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "(1700, 118)"
- ]
- },
- "execution_count": 6,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "words_test.shape"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [],
- "source": [
- "def generate_char_seq(batch, UNK = 2):\n",
- " maxlen_c = max([len(k) for k in batch])\n",
- " x = [[len(i) for i in k] for k in batch]\n",
- " maxlen = max([j for i in x for j in i])\n",
- " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
- " for i in range(len(batch)):\n",
- " for k in range(len(batch[i])):\n",
- " for no, c in enumerate(batch[i][k]):\n",
- " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
- " return temp"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [],
- "source": [
- "idx2word = {idx: tag for tag, idx in word2idx.items()}\n",
- "idx2tag = {i: w for w, i in tag2idx.items()}\n",
- "\n",
- "train_X = words\n",
- "train_Y = labels\n",
- "train_depends = depends\n",
- "train_char = generate_char_seq(sentences)\n",
- "\n",
- "test_X = words_test\n",
- "test_Y = labels_test\n",
- "test_depends = depends_test\n",
- "test_char = generate_char_seq(sentences_test)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [],
- "source": [
- "class Model:\n",
- " def __init__(\n",
- " self,\n",
- " dim_word,\n",
- " dim_char,\n",
- " dropout,\n",
- " learning_rate,\n",
- " hidden_size_char,\n",
- " hidden_size_word,\n",
- " num_layers,\n",
- " maxlen\n",
- " ):\n",
- " def cells(size, reuse = False):\n",
- " return tf.contrib.rnn.DropoutWrapper(\n",
- " tf.nn.rnn_cell.LSTMCell(\n",
- " size,\n",
- " initializer = tf.orthogonal_initializer(),\n",
- " reuse = reuse,\n",
- " ),\n",
- " output_keep_prob = dropout,\n",
- " )\n",
- "\n",
- " def bahdanau(embedded, size):\n",
- " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n",
- " num_units = hidden_size_word, memory = embedded\n",
- " )\n",
- " return tf.contrib.seq2seq.AttentionWrapper(\n",
- " cell = cells(hidden_size_word),\n",
- " attention_mechanism = attention_mechanism,\n",
- " attention_layer_size = hidden_size_word,\n",
- " )\n",
- "\n",
- " self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n",
- " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.maxlen = tf.shape(self.word_ids)[1]\n",
- " self.lengths = tf.count_nonzero(self.word_ids, 1)\n",
- "\n",
- " self.word_embeddings = tf.Variable(\n",
- " tf.truncated_normal(\n",
- " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n",
- " )\n",
- " )\n",
- " self.char_embeddings = tf.Variable(\n",
- " tf.truncated_normal(\n",
- " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n",
- " )\n",
- " )\n",
- "\n",
- " word_embedded = tf.nn.embedding_lookup(\n",
- " self.word_embeddings, self.word_ids\n",
- " )\n",
- " char_embedded = tf.nn.embedding_lookup(\n",
- " self.char_embeddings, self.char_ids\n",
- " )\n",
- " s = tf.shape(char_embedded)\n",
- " char_embedded = tf.reshape(\n",
- " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n",
- " )\n",
- "\n",
- " for n in range(num_layers):\n",
- " (out_fw, out_bw), (\n",
- " state_fw,\n",
- " state_bw,\n",
- " ) = tf.nn.bidirectional_dynamic_rnn(\n",
- " cell_fw = cells(hidden_size_char),\n",
- " cell_bw = cells(hidden_size_char),\n",
- " inputs = char_embedded,\n",
- " dtype = tf.float32,\n",
- " scope = 'bidirectional_rnn_char_%d' % (n),\n",
- " )\n",
- " char_embedded = tf.concat((out_fw, out_bw), 2)\n",
- " output = tf.reshape(\n",
- " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n",
- " )\n",
- " word_embedded = tf.concat([word_embedded, output], axis = -1)\n",
- "\n",
- " for n in range(num_layers):\n",
- " (out_fw, out_bw), (\n",
- " state_fw,\n",
- " state_bw,\n",
- " ) = tf.nn.bidirectional_dynamic_rnn(\n",
- " cell_fw = bahdanau(word_embedded, hidden_size_word),\n",
- " cell_bw = bahdanau(word_embedded, hidden_size_word),\n",
- " inputs = word_embedded,\n",
- " dtype = tf.float32,\n",
- " scope = 'bidirectional_rnn_word_%d' % (n),\n",
- " )\n",
- " word_embedded = tf.concat((out_fw, out_bw), 2)\n",
- "\n",
- " logits = tf.layers.dense(word_embedded, len(idx2tag))\n",
- " logits_depends = tf.layers.dense(word_embedded, maxlen)\n",
- " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n",
- " logits, self.labels, self.lengths\n",
- " )\n",
- " with tf.variable_scope(\"depends\"):\n",
- " log_likelihood_depends, transition_params_depends = tf.contrib.crf.crf_log_likelihood(\n",
- " logits_depends, self.depends, self.lengths\n",
- " )\n",
- " self.cost = tf.reduce_mean(-log_likelihood) + tf.reduce_mean(-log_likelihood_depends)\n",
- " self.optimizer = tf.train.AdamOptimizer(\n",
- " learning_rate = learning_rate\n",
- " ).minimize(self.cost)\n",
- " \n",
- " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
- " \n",
- " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n",
- " logits, transition_params, self.lengths\n",
- " )\n",
- " self.tags_seq_depends, _ = tf.contrib.crf.crf_decode(\n",
- " logits_depends, transition_params_depends, self.lengths\n",
- " )\n",
- "\n",
- " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
- " mask_label = tf.boolean_mask(self.labels, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
- " \n",
- " self.prediction = tf.boolean_mask(self.tags_seq_depends, mask)\n",
- " mask_label = tf.boolean_mask(self.depends, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
- " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
- ]
- }
- ],
- "source": [
- "tf.reset_default_graph()\n",
- "sess = tf.InteractiveSession()\n",
- "\n",
- "dim_word = 128\n",
- "dim_char = 256\n",
- "dropout = 1\n",
- "learning_rate = 1e-3\n",
- "hidden_size_char = 64\n",
- "hidden_size_word = 64\n",
- "num_layers = 2\n",
- "batch_size = 32\n",
- "\n",
- "model = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers,\n",
- " words.shape[1])\n",
- "sess.run(tf.global_variables_initializer())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:43<00:00, 1.90it/s, accuracy=0.123, accuracy_depends=0.116, cost=104] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.61it/s, accuracy=0.136, accuracy_depends=0.0273, cost=168]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 60.34970307350159\n",
- "epoch: 0, training loss: 149.379215, training acc: 0.132985, training depends: 0.079643, valid loss: 144.880309, valid acc: 0.144134, valid depends: 0.090478\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:43<00:00, 1.92it/s, accuracy=0.233, accuracy_depends=0.137, cost=95.3]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.67it/s, accuracy=0.255, accuracy_depends=0.0909, cost=152]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.71914076805115\n",
- "epoch: 1, training loss: 132.388958, training acc: 0.186336, training depends: 0.126305, valid loss: 132.800756, valid acc: 0.259966, valid depends: 0.107971\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.93it/s, accuracy=0.483, accuracy_depends=0.219, cost=74.4]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.65it/s, accuracy=0.527, accuracy_depends=0.155, cost=123] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.5471887588501\n",
- "epoch: 2, training loss: 111.379293, training acc: 0.398462, training depends: 0.154883, valid loss: 106.319645, valid acc: 0.507343, valid depends: 0.159132\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:43<00:00, 1.89it/s, accuracy=0.637, accuracy_depends=0.226, cost=64.3]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.58it/s, accuracy=0.6, accuracy_depends=0.118, cost=111] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.70253324508667\n",
- "epoch: 3, training loss: 93.394260, training acc: 0.580188, training depends: 0.196789, valid loss: 98.123219, valid acc: 0.622438, valid depends: 0.158794\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.91it/s, accuracy=0.661, accuracy_depends=0.295, cost=55] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.63it/s, accuracy=0.645, accuracy_depends=0.218, cost=103] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.64599561691284\n",
- "epoch: 4, training loss: 81.789298, training acc: 0.674215, training depends: 0.241653, valid loss: 89.319501, valid acc: 0.659905, valid depends: 0.208392\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.94it/s, accuracy=0.76, accuracy_depends=0.397, cost=46.4] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.63it/s, accuracy=0.718, accuracy_depends=0.2, cost=92] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.38800311088562\n",
- "epoch: 5, training loss: 72.712886, training acc: 0.737575, training depends: 0.290594, valid loss: 82.248631, valid acc: 0.704317, valid depends: 0.227678\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.94it/s, accuracy=0.798, accuracy_depends=0.445, cost=42.7]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.60it/s, accuracy=0.727, accuracy_depends=0.155, cost=93.7]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.42586350440979\n",
- "epoch: 6, training loss: 64.976823, training acc: 0.780481, training depends: 0.352966, valid loss: 81.418071, valid acc: 0.729733, valid depends: 0.216033\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.96it/s, accuracy=0.87, accuracy_depends=0.558, cost=34.8] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.65it/s, accuracy=0.782, accuracy_depends=0.145, cost=86.3]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.26801133155823\n",
- "epoch: 7, training loss: 57.875818, training acc: 0.807750, training depends: 0.420946, valid loss: 81.698430, valid acc: 0.744215, valid depends: 0.215680\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.95it/s, accuracy=0.877, accuracy_depends=0.555, cost=30.3]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.67it/s, accuracy=0.782, accuracy_depends=0.227, cost=82.4]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.17950391769409\n",
- "epoch: 8, training loss: 51.506652, training acc: 0.834453, training depends: 0.481950, valid loss: 81.583055, valid acc: 0.754682, valid depends: 0.230223\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.97it/s, accuracy=0.89, accuracy_depends=0.599, cost=29.8] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.59it/s, accuracy=0.773, accuracy_depends=0.273, cost=87.6]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.80722999572754\n",
- "epoch: 9, training loss: 47.002005, training acc: 0.853911, training depends: 0.516468, valid loss: 83.256975, valid acc: 0.755200, valid depends: 0.228200\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.97it/s, accuracy=0.925, accuracy_depends=0.682, cost=23.8]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.69it/s, accuracy=0.764, accuracy_depends=0.236, cost=84.4]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.78147864341736\n",
- "epoch: 10, training loss: 42.517333, training acc: 0.874978, training depends: 0.566414, valid loss: 80.450278, valid acc: 0.765323, valid depends: 0.249620\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.94it/s, accuracy=0.945, accuracy_depends=0.678, cost=21.3]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.62it/s, accuracy=0.791, accuracy_depends=0.191, cost=88] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.979615449905396\n",
- "epoch: 11, training loss: 38.750908, training acc: 0.887077, training depends: 0.602463, valid loss: 82.313864, valid acc: 0.777407, valid depends: 0.247966\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.97it/s, accuracy=0.945, accuracy_depends=0.702, cost=19] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.68it/s, accuracy=0.755, accuracy_depends=0.227, cost=99.1]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.794678926467896\n",
- "epoch: 12, training loss: 35.110486, training acc: 0.897572, training depends: 0.640705, valid loss: 92.571763, valid acc: 0.765969, valid depends: 0.220016\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.97it/s, accuracy=0.952, accuracy_depends=0.771, cost=17.5]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.67it/s, accuracy=0.773, accuracy_depends=0.227, cost=93.7]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.75972008705139\n",
- "epoch: 13, training loss: 33.335737, training acc: 0.906613, training depends: 0.654030, valid loss: 92.200707, valid acc: 0.771328, valid depends: 0.246652\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.97it/s, accuracy=0.945, accuracy_depends=0.812, cost=15.1]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.70it/s, accuracy=0.782, accuracy_depends=0.245, cost=92.2]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.76455545425415\n",
- "epoch: 14, training loss: 29.911946, training acc: 0.915421, training depends: 0.693428, valid loss: 87.522663, valid acc: 0.782953, valid depends: 0.262049\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.96it/s, accuracy=0.966, accuracy_depends=0.812, cost=13] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.71it/s, accuracy=0.791, accuracy_depends=0.291, cost=89.7]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.7827091217041\n",
- "epoch: 15, training loss: 27.855397, training acc: 0.924138, training depends: 0.715602, valid loss: 89.768037, valid acc: 0.789311, valid depends: 0.263630\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.97it/s, accuracy=0.935, accuracy_depends=0.788, cost=13.2]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.69it/s, accuracy=0.791, accuracy_depends=0.291, cost=86.2]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.92595195770264\n",
- "epoch: 16, training loss: 26.030449, training acc: 0.934033, training depends: 0.725757, valid loss: 92.311703, valid acc: 0.784540, valid depends: 0.263717\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.95it/s, accuracy=0.966, accuracy_depends=0.818, cost=12.7]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.68it/s, accuracy=0.791, accuracy_depends=0.291, cost=86.6]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 58.923388719558716\n",
- "epoch: 17, training loss: 24.009113, training acc: 0.943932, training depends: 0.744273, valid loss: 99.844222, valid acc: 0.776442, valid depends: 0.237053\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.95it/s, accuracy=0.969, accuracy_depends=0.795, cost=12] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.65it/s, accuracy=0.8, accuracy_depends=0.218, cost=93.2] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.0040009021759\n",
- "epoch: 18, training loss: 21.788654, training acc: 0.950376, training depends: 0.768208, valid loss: 101.069921, valid acc: 0.784996, valid depends: 0.249907\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:42<00:00, 1.94it/s, accuracy=0.976, accuracy_depends=0.829, cost=11.7]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.67it/s, accuracy=0.809, accuracy_depends=0.245, cost=101] "
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 59.10275650024414\n",
- "epoch: 19, training loss: 20.282661, training acc: 0.956831, training depends: 0.780196, valid loss: 104.031246, valid acc: 0.784366, valid depends: 0.267584\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "import time\n",
- "\n",
- "for e in range(20):\n",
- " lasttime = time.time()\n",
- " train_acc, train_loss, test_acc, test_loss, train_acc_depends, test_acc_depends = 0, 0, 0, 0, 0, 0\n",
- " pbar = tqdm(\n",
- " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = train_X[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_char = train_char[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_y = train_Y[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_depends = train_depends[i : min(i + batch_size, train_X.shape[0])]\n",
- " acc_depends, acc, cost, _ = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " train_loss += cost\n",
- " train_acc += acc\n",
- " train_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " pbar = tqdm(\n",
- " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = test_X[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_char = test_char[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_y = test_Y[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_depends = test_depends[i : min(i + batch_size, test_X.shape[0])]\n",
- " acc_depends, acc, cost = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " test_loss += cost\n",
- " test_acc += acc\n",
- " test_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " train_loss /= len(train_X) / batch_size\n",
- " train_acc /= len(train_X) / batch_size\n",
- " train_acc_depends /= len(train_X) / batch_size\n",
- " test_loss /= len(test_X) / batch_size\n",
- " test_acc /= len(test_X) / batch_size\n",
- " test_acc_depends /= len(test_X) / batch_size\n",
- "\n",
- " print('time taken:', time.time() - lasttime)\n",
- " print(\n",
- " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
- " % (e, train_loss, train_acc, train_acc_depends, test_loss, test_acc, test_acc_depends)\n",
- " )"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq, deps = sess.run([model.tags_seq, model.tags_seq_depends],\n",
- " feed_dict={model.word_ids:batch_x[:1],\n",
- " model.char_ids:batch_char[:1]})"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq = seq[0]\n",
- "deps = deps[0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)"
- ]
- },
- "execution_count": 14,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "seq[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)"
- ]
- },
- "execution_count": 15,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "batch_y[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([ 2, 3, 3, 5, 5, 0, 5, 11, 11, 11, 8, 3], dtype=int32)"
- ]
- },
- "execution_count": 16,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "deps[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([ 2, 6, 6, 5, 6, 0, 6, 11, 11, 11, 6, 6], dtype=int32)"
- ]
- },
- "execution_count": 17,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "batch_depends[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/dependency-parser/1.lstm-birnn-crf-biaffine.ipynb b/dependency-parser/1.lstm-birnn-crf-biaffine.ipynb
new file mode 100644
index 0000000..fc63018
--- /dev/null
+++ b/dependency-parser/1.lstm-birnn-crf-biaffine.ipynb
@@ -0,0 +1 @@
+{"cells":[{"metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true},"cell_type":"code","source":"!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n!pip install malaya -U","execution_count":1,"outputs":[{"output_type":"stream","text":"--2019-09-30 05:05:04-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 1668174 (1.6M) [text/plain]\nSaving to: ‘en_ewt-ud-dev.conllu’\n\nen_ewt-ud-dev.conll 100%[===================>] 1.59M --.-KB/s in 0.05s \n\n2019-09-30 05:05:05 (30.7 MB/s) - ‘en_ewt-ud-dev.conllu’ saved [1668174/1668174]\n\n--2019-09-30 05:05:05-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 13303045 (13M) [text/plain]\nSaving to: ‘en_ewt-ud-train.conllu’\n\n2019-09-30 05:05:06 (102 MB/s) - ‘en_ewt-ud-train.conllu’ saved [13303045/13303045]\n\n--2019-09-30 05:05:06-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\nHTTP request sent, awaiting response... 200 OK\nLength: 1661985 (1.6M) [text/plain]\nSaving to: ‘en_ewt-ud-test.conllu’\n\n2019-09-30 05:05:07 (31.0 MB/s) - ‘en_ewt-ud-test.conllu’ saved [1661985/1661985]\n\nCollecting malaya\n","name":"stdout"},{"output_type":"stream","text":"Installing collected packages: dateparser, bert-tensorflow, PySastrawi, malaya\nSuccessfully installed PySastrawi-1.2.0 bert-tensorflow-1.0.1 dateparser-0.7.2 
malaya-2.7.7.0\n","name":"stdout"}]},{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"import malaya\nimport re\nfrom malaya.texts._text_functions import split_into_sentences\nfrom malaya.texts import _regex\nimport numpy as np\nimport itertools\nimport tensorflow as tf\nfrom tensorflow.keras.preprocessing.sequence import pad_sequences\n\ntokenizer = malaya.preprocessing._tokenizer\nsplitter = split_into_sentences","execution_count":2,"outputs":[{"output_type":"stream","text":"not found any version, deleting previous version models..\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def is_number_regex(s):\n if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n return s.isdigit()\n return True\n\ndef preprocessing(w):\n if is_number_regex(w):\n return ''\n elif re.match(_regex._money, w):\n return ''\n elif re.match(_regex._date, w):\n return ''\n elif re.match(_regex._expressions['email'], w):\n return ''\n elif re.match(_regex._expressions['url'], w):\n return ''\n else:\n w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n return w","execution_count":3,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\ntag2idx = {'PAD': 0, '_': 1}\nchar2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\nword_idx = 3\ntag_idx = 2\nchar_idx = 3\n\nspecial_tokens = ['', '', '', '', '']\n\nfor t in special_tokens:\n word2idx[t] = word_idx\n word_idx += 1\n char2idx[t] = char_idx\n char_idx += 1\n \nword2idx, char2idx","execution_count":4,"outputs":[{"output_type":"execute_result","execution_count":4,"data":{"text/plain":"({'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '': 3,\n '': 4,\n '': 5,\n '': 6,\n '': 7},\n {'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '': 3,\n '': 4,\n '': 5,\n '': 6,\n '': 7})"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"PAD = \"_PAD\"\nPAD_POS = 
\"_PAD_POS\"\nPAD_TYPE = \"_\"\nPAD_CHAR = \"_PAD_CHAR\"\nROOT = \"_ROOT\"\nROOT_POS = \"_ROOT_POS\"\nROOT_TYPE = \"_\"\nROOT_CHAR = \"_ROOT_CHAR\"\nEND = \"_END\"\nEND_POS = \"_END_POS\"\nEND_TYPE = \"_\"\nEND_CHAR = \"_END_CHAR\"\n\ndef process_corpus(corpus, until = None):\n global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n first_time = True\n for sentence in corpus:\n try:\n if len(sentence):\n if sentence[0] == '#':\n continue\n if first_time:\n print(sentence)\n first_time = False\n sentence = sentence.split('\\t')\n for c in sentence[1]:\n if c not in char2idx:\n char2idx[c] = char_idx\n char_idx += 1\n if sentence[7] not in tag2idx:\n tag2idx[sentence[7]] = tag_idx\n tag_idx += 1\n sentence[1] = preprocessing(sentence[1])\n if sentence[1] not in word2idx:\n word2idx[sentence[1]] = word_idx\n word_idx += 1\n temp_word.append(word2idx[sentence[1]])\n temp_depend.append(int(sentence[6]))\n temp_label.append(tag2idx[sentence[7]])\n temp_sentence.append(sentence[1])\n temp_pos.append(sentence[3])\n else:\n if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n continue\n words.append(temp_word)\n depends.append(temp_depend)\n labels.append(temp_label)\n sentences.append( temp_sentence)\n pos.append(temp_pos)\n char_ = [[char2idx['_ROOT']]]\n for w in temp_sentence:\n if w in char2idx:\n char_.append([char2idx[w]])\n else:\n char_.append([char2idx[c] for c in w])\n chars.append(char_)\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n except Exception as e:\n print(e, sentence)\n return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], 
chars[:-1]","execution_count":5,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-dev.conllu') as fopen:\n dev = fopen.read().split('\\n')\n\nsentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)","execution_count":6,"outputs":[{"output_type":"stream","text":"1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\ninvalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\ninvalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-test.conllu') as fopen:\n test = fopen.read().split('\\n')\n\nsentences_test, words_test, depends_test, labels_test, _, _ = process_corpus(test)\nsentences_test.extend(sentences_dev)\nwords_test.extend(words_dev)\ndepends_test.extend(depends_dev)\nlabels_test.extend(labels_dev)","execution_count":7,"outputs":[{"output_type":"stream","text":"1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\ninvalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-train.conllu') as fopen:\n train = fopen.read().split('\\n')\n\nsentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)","execution_count":8,"outputs":[{"output_type":"stream","text":"1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\ninvalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 
'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\ninvalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\ninvalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\ninvalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\ninvalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', 
'_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\ninvalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\ninvalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\ninvalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"len(sentences_train), len(sentences_test)","execution_count":9,"outputs":[{"output_type":"execute_result","execution_count":9,"data":{"text/plain":"(12000, 3824)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"idx2word = {v:k for k, v in word2idx.items()}\nidx2tag = {v:k for k, v in tag2idx.items()}\nlen(idx2word)","execution_count":10,"outputs":[{"output_type":"execute_result","execution_count":10,"data":{"text/plain":"21974"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def generate_char_seq(batch, UNK = 2):\n maxlen_c = max([len(k) for k in batch])\n x = [[len(i) for i in k] for k in batch]\n maxlen = max([j for i in x for j in i])\n temp = 
np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n for i in range(len(batch)):\n for k in range(len(batch[i])):\n for no, c in enumerate(batch[i][k]):\n temp[i,k,-1-no] = char2idx.get(c, UNK)\n return temp","execution_count":11,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"generate_char_seq(sentences_train[:5]).shape","execution_count":12,"outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"(5, 36, 11)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"pad_sequences(words_train[:5],padding='post').shape","execution_count":13,"outputs":[{"output_type":"execute_result","execution_count":13,"data":{"text/plain":"(5, 36)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"train_X = words_train\ntrain_Y = labels_train\ntrain_depends = depends_train\ntrain_char = sentences_train\n\ntest_X = words_test\ntest_Y = labels_test\ntest_depends = depends_test\ntest_char = sentences_test","execution_count":14,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"class BiAAttention:\n def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n self.input_size_encoder = input_size_encoder\n self.input_size_decoder = input_size_decoder\n self.num_labels = num_labels\n \n self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n \n def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n batch = tf.shape(input_d)[0]\n length_decoder = tf.shape(input_d)[1]\n length_encoder = tf.shape(input_e)[1]\n out_d = tf.expand_dims(tf.matmul(self.W_d, 
tf.transpose(input_d, [0, 2, 1])), 3)\n out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n \n output = output + out_d + out_e\n \n if mask_d is not None:\n d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n output = output * d * e\n \n return output\n\nclass Model:\n def __init__(\n self,\n dim_word,\n dim_char,\n dropout,\n learning_rate,\n hidden_size_char,\n hidden_size_word,\n num_layers\n ):\n def cells(size, reuse = False):\n return tf.contrib.rnn.DropoutWrapper(\n tf.nn.rnn_cell.LSTMCell(\n size,\n initializer = tf.orthogonal_initializer(),\n reuse = reuse,\n ),\n output_keep_prob = dropout,\n )\n \n self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n self.labels = tf.placeholder(tf.int32, shape = [None, None])\n self.depends = tf.placeholder(tf.int32, shape = [None, None])\n self.maxlen = tf.shape(self.word_ids)[1]\n self.lengths = tf.count_nonzero(self.word_ids, 1)\n self.mask = tf.math.not_equal(self.word_ids, 0)\n float_mask = tf.cast(self.mask, tf.float32)\n \n self.arc_h = tf.layers.Dense(hidden_size_word)\n self.arc_c = tf.layers.Dense(hidden_size_word)\n self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n\n self.word_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n )\n )\n self.char_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n )\n )\n\n word_embedded = tf.nn.embedding_lookup(\n self.word_embeddings, self.word_ids\n )\n char_embedded = tf.nn.embedding_lookup(\n self.char_embeddings, self.char_ids\n )\n s = tf.shape(char_embedded)\n char_embedded = tf.reshape(\n char_embedded, shape = [s[0] * s[1], 
s[-2], dim_char]\n )\n\n for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = cells(hidden_size_char),\n cell_bw = cells(hidden_size_char),\n inputs = char_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_char_%d' % (n),\n )\n char_embedded = tf.concat((out_fw, out_bw), 2)\n output = tf.reshape(\n char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n )\n word_embedded = tf.concat([word_embedded, output], axis = -1)\n\n for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = cells(hidden_size_word),\n cell_bw = cells(hidden_size_word),\n inputs = word_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_word_%d' % (n),\n )\n word_embedded = tf.concat((out_fw, out_bw), 2)\n\n logits = tf.layers.dense(word_embedded, len(idx2tag))\n log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n logits, self.labels, self.lengths\n )\n arc_h = tf.nn.elu(self.arc_h(word_embedded))\n arc_c = tf.nn.elu(self.arc_c(word_embedded))\n out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n \n batch = tf.shape(out_arc)[0]\n batch_index = tf.range(0, batch)\n max_len = tf.shape(out_arc)[1]\n sec_max_len = tf.shape(out_arc)[2]\n \n minus_inf = -1e8\n minus_mask = (1 - float_mask) * minus_inf\n out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * tf.expand_dims(float_mask, axis = 1)\n num = tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n \n child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n t = tf.transpose(self.depends)\n broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n 
tf.expand_dims(t, axis = 0),\n tf.expand_dims(child_index, axis = 0)], axis = 0))\n loss_arc = tf.gather_nd(loss_arc, concatenated)\n loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n \n loss_arc = tf.reduce_sum(-loss_arc) / num\n \n self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n \n self.optimizer = tf.train.AdamOptimizer(\n learning_rate = learning_rate\n ).minimize(self.cost)\n \n mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n \n self.tags_seq, _ = tf.contrib.crf.crf_decode(\n logits, transition_params, self.lengths\n )\n \n out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n self.heads = tf.argmax(out_arc, axis = 1)\n \n self.prediction = tf.boolean_mask(self.tags_seq, mask)\n mask_label = tf.boolean_mask(self.labels, mask)\n correct_pred = tf.equal(self.prediction, mask_label)\n correct_index = tf.cast(correct_pred, tf.float32)\n self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n \n self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n mask_label = tf.boolean_mask(self.depends, mask)\n correct_pred = tf.equal(self.prediction, mask_label)\n correct_index = tf.cast(correct_pred, tf.float32)\n self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))","execution_count":15,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"tf.reset_default_graph()\nsess = tf.InteractiveSession()\n\ndim_word = 128\ndim_char = 256\ndropout = 1.0\nlearning_rate = 1e-3\nhidden_size_char = 128\nhidden_size_word = 128\nnum_layers = 2\n\nmodel = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\nsess.run(tf.global_variables_initializer())","execution_count":16,"outputs":[{"output_type":"stream","text":"WARNING: Entity > could not be 
transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"batch_x = train_X[:5]\nbatch_x = pad_sequences(batch_x,padding='post')\nbatch_char = train_char[:5]\nbatch_char = generate_char_seq(batch_char)\nbatch_y = train_Y[:5]\nbatch_y = pad_sequences(batch_y,padding='post')\nbatch_depends = train_depends[:5]\nbatch_depends = pad_sequences(batch_depends,padding='post')","execution_count":17,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"sess.run([model.accuracy, model.accuracy_depends, model.cost],\n feed_dict = {model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends})","execution_count":18,"outputs":[{"output_type":"execute_result","execution_count":18,"data":{"text/plain":"[0.0, 0.00862069, 94.88574]"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"from tqdm import tqdm\n\nbatch_size = 32\nepoch = 15\n\nfor e in range(epoch):\n train_acc, train_loss = [], []\n test_acc, test_loss = [], []\n train_acc_depends, test_acc_depends = [], []\n \n pbar = tqdm(\n range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(train_X))\n batch_x = train_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = train_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = train_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = train_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost, _ = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n train_loss.append(cost)\n train_acc.append(acc)\n 
train_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n pbar = tqdm(\n range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(test_X))\n batch_x = test_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = test_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = test_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = test_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n test_loss.append(cost)\n test_acc.append(acc)\n test_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n \n print(\n 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n % (e, np.mean(train_loss), \n np.mean(train_acc), \n np.mean(train_acc_depends), \n np.mean(test_loss), \n np.mean(test_acc), \n np.mean(test_acc_depends)\n ))\n ","execution_count":19,"outputs":[{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:06<00:00, 2.96it/s, accuracy=0.79, accuracy_depends=0.607, cost=17.6] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.36it/s, accuracy=0.822, accuracy_depends=0.599, cost=10.2]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 0, training loss: 30.011385, training acc: 0.547456, training depends: 0.384560, valid loss: 12.160900, valid acc: 0.763054, valid depends: 0.570995\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:06<00:00, 
2.97it/s, accuracy=0.871, accuracy_depends=0.731, cost=11.2]\ntest minibatch loop: 100%|██████████| 120/120 [00:17<00:00, 6.70it/s, accuracy=0.883, accuracy_depends=0.721, cost=7.19]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 1, training loss: 11.472513, training acc: 0.819827, training depends: 0.622879, valid loss: 8.814706, valid acc: 0.820314, valid depends: 0.651465\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:04<00:00, 3.02it/s, accuracy=0.891, accuracy_depends=0.772, cost=8.89]\ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.66it/s, accuracy=0.899, accuracy_depends=0.773, cost=5.81]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 2, training loss: 7.686915, training acc: 0.879738, training depends: 0.695450, valid loss: 7.770745, valid acc: 0.841929, valid depends: 0.677420\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:04<00:00, 3.02it/s, accuracy=0.928, accuracy_depends=0.792, cost=7.04]\ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.53it/s, accuracy=0.927, accuracy_depends=0.794, cost=5.42]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 3, training loss: 5.874665, training acc: 0.909947, training depends: 0.731291, valid loss: 7.536985, valid acc: 0.848511, valid depends: 0.688594\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:05<00:00, 3.00it/s, accuracy=0.94, accuracy_depends=0.807, cost=5.93] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.54it/s, accuracy=0.907, accuracy_depends=0.794, cost=5.43]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 4, training loss: 4.713706, training acc: 0.930022, 
training depends: 0.755921, valid loss: 7.551268, valid acc: 0.853732, valid depends: 0.697372\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:05<00:00, 2.99it/s, accuracy=0.948, accuracy_depends=0.823, cost=4.92]\ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.60it/s, accuracy=0.903, accuracy_depends=0.785, cost=5.59]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 5, training loss: 3.865628, training acc: 0.944412, training depends: 0.772768, valid loss: 7.787340, valid acc: 0.854716, valid depends: 0.699638\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:04<00:00, 3.02it/s, accuracy=0.959, accuracy_depends=0.838, cost=3.99]\ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.34it/s, accuracy=0.903, accuracy_depends=0.789, cost=5.47]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 6, training loss: 3.208910, training acc: 0.955319, training depends: 0.786367, valid loss: 8.006297, valid acc: 0.856437, valid depends: 0.703963\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:04<00:00, 3.01it/s, accuracy=0.968, accuracy_depends=0.832, cost=3.38] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.55it/s, accuracy=0.911, accuracy_depends=0.806, cost=5.37]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 7, training loss: 2.692029, training acc: 0.963750, training depends: 0.797338, valid loss: 8.316906, valid acc: 0.855605, valid depends: 0.705780\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:05<00:00, 2.99it/s, accuracy=0.975, accuracy_depends=0.847, cost=2.66] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.64it/s, 
accuracy=0.915, accuracy_depends=0.806, cost=5.85]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 8, training loss: 2.280745, training acc: 0.969272, training depends: 0.806022, valid loss: 8.649180, valid acc: 0.855627, valid depends: 0.704731\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:06<00:00, 2.97it/s, accuracy=0.979, accuracy_depends=0.844, cost=2.18] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.64it/s, accuracy=0.907, accuracy_depends=0.814, cost=6.05]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 9, training loss: 1.914802, training acc: 0.974550, training depends: 0.812909, valid loss: 8.974008, valid acc: 0.855271, valid depends: 0.705615\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:04<00:00, 3.00it/s, accuracy=0.984, accuracy_depends=0.847, cost=1.87] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.57it/s, accuracy=0.911, accuracy_depends=0.81, cost=6.58] \ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 10, training loss: 1.551014, training acc: 0.980733, training depends: 0.820803, valid loss: 9.371287, valid acc: 0.854632, valid depends: 0.704477\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:05<00:00, 2.98it/s, accuracy=0.983, accuracy_depends=0.854, cost=1.65] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.45it/s, accuracy=0.903, accuracy_depends=0.818, cost=6.59]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 11, training loss: 1.224350, training acc: 0.986435, training depends: 0.829505, valid loss: 9.807080, valid acc: 0.853969, valid depends: 
0.706970\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:06<00:00, 2.97it/s, accuracy=0.988, accuracy_depends=0.86, cost=1.34] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.52it/s, accuracy=0.899, accuracy_depends=0.818, cost=7.63]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 12, training loss: 0.981612, training acc: 0.989736, training depends: 0.837044, valid loss: 10.443762, valid acc: 0.851969, valid depends: 0.703867\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:06<00:00, 2.97it/s, accuracy=0.993, accuracy_depends=0.854, cost=1.17] \ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.52it/s, accuracy=0.911, accuracy_depends=0.826, cost=6.97]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 13, training loss: 0.779776, training acc: 0.992719, training depends: 0.843190, valid loss: 10.895637, valid acc: 0.851806, valid depends: 0.703673\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:05<00:00, 2.99it/s, accuracy=0.995, accuracy_depends=0.856, cost=0.899]\ntest minibatch loop: 100%|██████████| 120/120 [00:18<00:00, 6.52it/s, accuracy=0.907, accuracy_depends=0.81, cost=7.13] ","name":"stderr"},{"output_type":"stream","text":"epoch: 14, training loss: 0.616768, training acc: 0.994896, training depends: 0.848271, valid loss: 11.158718, valid acc: 0.853316, valid depends: 0.704753\n\n","name":"stdout"},{"output_type":"stream","text":"\n","name":"stderr"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def evaluate(heads_pred, types_pred, heads, types, lengths,\n symbolic_root=False, symbolic_end=False):\n batch_size, _ = heads_pred.shape\n ucorr = 0.\n lcorr = 0.\n total = 0.\n ucomplete_match = 0.\n lcomplete_match = 0.\n\n corr_root = 0.\n 
total_root = 0.\n start = 1 if symbolic_root else 0\n end = 1 if symbolic_end else 0\n for i in range(batch_size):\n ucm = 1.\n lcm = 1.\n for j in range(start, lengths[i] - end):\n\n total += 1\n if heads[i, j] == heads_pred[i, j]:\n ucorr += 1\n if types[i, j] == types_pred[i, j]:\n lcorr += 1\n else:\n lcm = 0\n else:\n ucm = 0\n lcm = 0\n\n if heads[i, j] == 0:\n total_root += 1\n corr_root += 1 if heads_pred[i, j] == 0 else 0\n\n ucomplete_match += ucm\n lcomplete_match += lcm\n\n return (ucorr, lcorr, total, ucomplete_match, lcomplete_match), \\\n (corr_root, total_root), batch_size","execution_count":20,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"tags_seq, heads = sess.run(\n [model.tags_seq, model.heads],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char\n },\n)\ntags_seq[0], heads[0], batch_depends[0]","execution_count":21,"outputs":[{"output_type":"execute_result","execution_count":21,"data":{"text/plain":"(array([40, 6, 22, 26, 23, 18, 16, 5, 3, 13, 10, 11, 6, 12, 13, 10, 16,\n 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0], dtype=int32),\n array([ 2, 8, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0]),\n array([ 2, 8, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0], dtype=int32))"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def evaluate(heads_pred, types_pred, heads, types, lengths,\n symbolic_root=False, symbolic_end=False):\n batch_size, _ = heads_pred.shape\n ucorr = 0.\n lcorr = 0.\n total = 0.\n ucomplete_match = 0.\n lcomplete_match = 0.\n\n corr_root = 0.\n total_root = 0.\n start = 1 if symbolic_root else 0\n end = 1 if symbolic_end else 0\n for i in range(batch_size):\n ucm = 1.\n lcm = 1.\n for j in range(start, lengths[i] - end):\n\n total += 1\n if heads[i, j] == heads_pred[i, j]:\n ucorr += 1\n if types[i, j] == types_pred[i, 
j]:\n lcorr += 1\n else:\n lcm = 0\n else:\n ucm = 0\n lcm = 0\n\n if heads[i, j] == 0:\n total_root += 1\n corr_root += 1 if heads_pred[i, j] == 0 else 0\n\n ucomplete_match += ucm\n lcomplete_match += lcm\n \n return ucorr / total, lcorr / total, corr_root / total_root","execution_count":22,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n np.count_nonzero(batch_x, axis = 1))\narc_accuracy, type_accuracy, root_accuracy","execution_count":23,"outputs":[{"output_type":"execute_result","execution_count":23,"data":{"text/plain":"(0.8097165991902834, 0.7692307692307693, 0.8125)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"arcs, types, roots = [], [], []\n\npbar = tqdm(\n range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n)\nfor i in pbar:\n index = min(i + batch_size, len(test_X))\n batch_x = test_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = test_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = test_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = test_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n tags_seq, heads = sess.run(\n [model.tags_seq, model.heads],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char\n },\n )\n \n arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n np.count_nonzero(batch_x, axis = 1))\n pbar.set_postfix(arc_accuracy = arc_accuracy, type_accuracy = type_accuracy, \n root_accuracy = root_accuracy)\n arcs.append(arc_accuracy)\n types.append(type_accuracy)\n roots.append(root_accuracy)","execution_count":24,"outputs":[{"output_type":"stream","text":"test minibatch loop: 100%|██████████| 120/120 [00:16<00:00, 7.09it/s, arc_accuracy=0.81, root_accuracy=0.812, type_accuracy=0.769] 
\n","name":"stderr"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"print('arc accuracy:', np.mean(arcs))\nprint('types accuracy:', np.mean(types))\nprint('root accuracy:', np.mean(roots))","execution_count":25,"outputs":[{"output_type":"stream","text":"arc accuracy: 0.7047525228808802\ntypes accuracy: 0.6518708550802323\nroot accuracy: 0.6640625\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"","execution_count":null,"outputs":[]}],"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat":4,"nbformat_minor":1}
\ No newline at end of file
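The `evaluate` cells in the notebook above score the parser with per-token head and label accuracy, i.e. the standard unlabeled/labeled attachment scores (UAS/LAS) for dependency parsing. A minimal NumPy sketch of that computation, stripped of the notebook's padding and root bookkeeping (the `uas_las` name and the toy arrays are ours, not from the notebook):

```python
import numpy as np

def uas_las(heads_pred, types_pred, heads, types, lengths):
    """UAS/LAS over a padded batch.

    heads_pred / heads: (batch, maxlen) predicted vs gold head indices.
    types_pred / types: (batch, maxlen) predicted vs gold dependency labels.
    lengths: true (unpadded) length of each sentence, so padding is ignored.
    """
    ucorr = lcorr = total = 0
    for i in range(len(lengths)):
        for j in range(lengths[i]):
            total += 1
            # unlabeled: the predicted head matches the gold head
            if heads[i, j] == heads_pred[i, j]:
                ucorr += 1
                # labeled: head AND dependency label both match
                if types[i, j] == types_pred[i, j]:
                    lcorr += 1
    return ucorr / total, lcorr / total

# one 3-token sentence padded to length 4
heads      = np.array([[2, 0, 2, 0]])
heads_pred = np.array([[2, 0, 1, 0]])
types      = np.array([[1, 2, 3, 0]])
types_pred = np.array([[1, 5, 3, 0]])
uas, las = uas_las(heads_pred, types_pred, heads, types, [3])
# heads correct on 2/3 tokens -> UAS = 2/3; of those, labels correct on 1 -> LAS = 1/3
```

LAS is always bounded above by UAS, since a label can only count as correct when its head is correct, which matches the notebook's `accuracy` consistently exceeding `accuracy_depends`.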
diff --git a/dependency-parser/2.birnn-luong.ipynb b/dependency-parser/2.birnn-luong.ipynb
deleted file mode 100644
index a2da18f..0000000
--- a/dependency-parser/2.birnn-luong.ipynb
+++ /dev/null
@@ -1,899 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import tensorflow as tf\n",
- "from tqdm import tqdm\n",
- "import numpy as np"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "with open('test.conll.txt') as fopen:\n",
- " corpus = fopen.read().split('\\n')\n",
- " \n",
- "with open('dev.conll.txt') as fopen:\n",
- " corpus_test = fopen.read().split('\\n')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [],
- "source": [
- "word2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "tag2idx = {'PAD': 0}\n",
- "char2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "word_idx = 3\n",
- "tag_idx = 1\n",
- "char_idx = 3\n",
- "\n",
- "def process_corpus(corpus, until = None):\n",
- " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
- " sentences, words, depends, labels = [], [], [], []\n",
- " temp_sentence, temp_word, temp_depend, temp_label = [], [], [], []\n",
- " for sentence in corpus:\n",
- " if len(sentence):\n",
- " sentence = sentence.split('\\t')\n",
- " for c in sentence[1]:\n",
- " if c not in char2idx:\n",
- " char2idx[c] = char_idx\n",
- " char_idx += 1\n",
- " if sentence[7] not in tag2idx:\n",
- " tag2idx[sentence[7]] = tag_idx\n",
- " tag_idx += 1\n",
- " if sentence[1] not in word2idx:\n",
- " word2idx[sentence[1]] = word_idx\n",
- " word_idx += 1\n",
- " temp_word.append(word2idx[sentence[1]])\n",
- " temp_depend.append(int(sentence[6]))\n",
- " temp_label.append(tag2idx[sentence[7]])\n",
- " temp_sentence.append(sentence[1])\n",
- " else:\n",
- " words.append(temp_word)\n",
- " depends.append(temp_depend)\n",
- " labels.append(temp_label)\n",
- " sentences.append(temp_sentence)\n",
- " temp_word = []\n",
- " temp_depend = []\n",
- " temp_label = []\n",
- " temp_sentence = []\n",
- " return sentences[:-1], words[:-1], depends[:-1], labels[:-1]\n",
- " \n",
- "sentences, words, depends, labels = process_corpus(corpus)\n",
- "sentences_test, words_test, depends_test, labels_test = process_corpus(corpus_test)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Using TensorFlow backend.\n"
- ]
- }
- ],
- "source": [
- "from keras.preprocessing.sequence import pad_sequences"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [],
- "source": [
- "words = pad_sequences(words,padding='post')\n",
- "depends = pad_sequences(depends,padding='post')\n",
- "labels = pad_sequences(labels,padding='post')\n",
- "\n",
- "words_test = pad_sequences(words_test,padding='post')\n",
- "depends_test = pad_sequences(depends_test,padding='post')\n",
- "labels_test = pad_sequences(labels_test,padding='post')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "(1700, 118)"
- ]
- },
- "execution_count": 6,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "words_test.shape"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [],
- "source": [
- "def generate_char_seq(batch, UNK = 2):\n",
- " maxlen_c = max([len(k) for k in batch])\n",
- " x = [[len(i) for i in k] for k in batch]\n",
- " maxlen = max([j for i in x for j in i])\n",
- " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
- " for i in range(len(batch)):\n",
- " for k in range(len(batch[i])):\n",
- " for no, c in enumerate(batch[i][k]):\n",
- " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
- " return temp"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [],
- "source": [
- "idx2word = {idx: tag for tag, idx in word2idx.items()}\n",
- "idx2tag = {i: w for w, i in tag2idx.items()}\n",
- "\n",
- "train_X = words\n",
- "train_Y = labels\n",
- "train_depends = depends\n",
- "train_char = generate_char_seq(sentences)\n",
- "\n",
- "test_X = words_test\n",
- "test_Y = labels_test\n",
- "test_depends = depends_test\n",
- "test_char = generate_char_seq(sentences_test)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [],
- "source": [
- "class Model:\n",
- " def __init__(\n",
- " self,\n",
- " dim_word,\n",
- " dim_char,\n",
- " dropout,\n",
- " learning_rate,\n",
- " hidden_size_char,\n",
- " hidden_size_word,\n",
- " num_layers,\n",
- " maxlen\n",
- " ):\n",
- " def cells(size, reuse = False):\n",
- " return tf.contrib.rnn.DropoutWrapper(\n",
- " tf.nn.rnn_cell.LSTMCell(\n",
- " size,\n",
- " initializer = tf.orthogonal_initializer(),\n",
- " reuse = reuse,\n",
- " ),\n",
- " output_keep_prob = dropout,\n",
- " )\n",
- "\n",
- " def bahdanau(embedded, size):\n",
- " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n",
- " num_units = hidden_size_word, memory = embedded\n",
- " )\n",
- " return tf.contrib.seq2seq.AttentionWrapper(\n",
- " cell = cells(hidden_size_word),\n",
- " attention_mechanism = attention_mechanism,\n",
- " attention_layer_size = hidden_size_word,\n",
- " )\n",
- "\n",
- " self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n",
- " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.maxlen = tf.shape(self.word_ids)[1]\n",
- " self.lengths = tf.count_nonzero(self.word_ids, 1)\n",
- "\n",
- " self.word_embeddings = tf.Variable(\n",
- " tf.truncated_normal(\n",
- " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n",
- " )\n",
- " )\n",
- " self.char_embeddings = tf.Variable(\n",
- " tf.truncated_normal(\n",
- " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n",
- " )\n",
- " )\n",
- "\n",
- " word_embedded = tf.nn.embedding_lookup(\n",
- " self.word_embeddings, self.word_ids\n",
- " )\n",
- " char_embedded = tf.nn.embedding_lookup(\n",
- " self.char_embeddings, self.char_ids\n",
- " )\n",
- " s = tf.shape(char_embedded)\n",
- " char_embedded = tf.reshape(\n",
- " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n",
- " )\n",
- "\n",
- " for n in range(num_layers):\n",
- " (out_fw, out_bw), (\n",
- " state_fw,\n",
- " state_bw,\n",
- " ) = tf.nn.bidirectional_dynamic_rnn(\n",
- " cell_fw = cells(hidden_size_char),\n",
- " cell_bw = cells(hidden_size_char),\n",
- " inputs = char_embedded,\n",
- " dtype = tf.float32,\n",
- " scope = 'bidirectional_rnn_char_%d' % (n),\n",
- " )\n",
- " char_embedded = tf.concat((out_fw, out_bw), 2)\n",
- " output = tf.reshape(\n",
- " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n",
- " )\n",
- " word_embedded = tf.concat([word_embedded, output], axis = -1)\n",
- "\n",
- " for n in range(num_layers):\n",
- " (out_fw, out_bw), (\n",
- " state_fw,\n",
- " state_bw,\n",
- " ) = tf.nn.bidirectional_dynamic_rnn(\n",
- " cell_fw = bahdanau(word_embedded, hidden_size_word),\n",
- " cell_bw = bahdanau(word_embedded, hidden_size_word),\n",
- " inputs = word_embedded,\n",
- " dtype = tf.float32,\n",
- " scope = 'bidirectional_rnn_word_%d' % (n),\n",
- " )\n",
- " word_embedded = tf.concat((out_fw, out_bw), 2)\n",
- "\n",
- " logits = tf.layers.dense(word_embedded, len(idx2tag))\n",
- " logits_depends = tf.layers.dense(word_embedded, maxlen)\n",
- " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n",
- " logits, self.labels, self.lengths\n",
- " )\n",
- " with tf.variable_scope(\"depends\"):\n",
- " log_likelihood_depends, transition_params_depends = tf.contrib.crf.crf_log_likelihood(\n",
- " logits_depends, self.depends, self.lengths\n",
- " )\n",
- " self.cost = tf.reduce_mean(-log_likelihood) + tf.reduce_mean(-log_likelihood_depends)\n",
- " self.optimizer = tf.train.AdamOptimizer(\n",
- " learning_rate = learning_rate\n",
- " ).minimize(self.cost)\n",
- " \n",
- " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
- " \n",
- " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n",
- " logits, transition_params, self.lengths\n",
- " )\n",
- " self.tags_seq_depends, _ = tf.contrib.crf.crf_decode(\n",
- " logits_depends, transition_params_depends, self.lengths\n",
- " )\n",
- "\n",
- " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
- " mask_label = tf.boolean_mask(self.labels, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
- " \n",
- " self.prediction = tf.boolean_mask(self.tags_seq_depends, mask)\n",
- " mask_label = tf.boolean_mask(self.depends, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
- " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
- ]
- }
- ],
- "source": [
- "tf.reset_default_graph()\n",
- "sess = tf.InteractiveSession()\n",
- "\n",
- "dim_word = 128\n",
- "dim_char = 256\n",
- "dropout = 1\n",
- "learning_rate = 1e-3\n",
- "hidden_size_char = 64\n",
- "hidden_size_word = 64\n",
- "num_layers = 2\n",
- "batch_size = 32\n",
- "\n",
- "model = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers,\n",
- " words.shape[1])\n",
- "sess.run(tf.global_variables_initializer())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:40<00:00, 2.05it/s, accuracy=0.106, accuracy_depends=0.14, cost=105] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.87it/s, accuracy=0.136, accuracy_depends=0.0455, cost=164]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 56.09716725349426\n",
- "epoch: 0, training loss: 150.300666, training acc: 0.128670, training depends: 0.081010, valid loss: 141.028491, valid acc: 0.147059, valid depends: 0.120955\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.06it/s, accuracy=0.298, accuracy_depends=0.161, cost=90.5]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.83it/s, accuracy=0.309, accuracy_depends=0.118, cost=143] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.58958339691162\n",
- "epoch: 1, training loss: 130.642711, training acc: 0.227523, training depends: 0.128279, valid loss: 124.700015, valid acc: 0.309620, valid depends: 0.129380\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.08it/s, accuracy=0.517, accuracy_depends=0.195, cost=74.7]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.83it/s, accuracy=0.518, accuracy_depends=0.145, cost=121] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.429301023483276\n",
- "epoch: 2, training loss: 109.336289, training acc: 0.440639, training depends: 0.154167, valid loss: 105.495260, valid acc: 0.495497, valid depends: 0.153521\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.08it/s, accuracy=0.562, accuracy_depends=0.236, cost=67.6]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.83it/s, accuracy=0.573, accuracy_depends=0.2, cost=115] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.509848833084106\n",
- "epoch: 3, training loss: 96.279404, training acc: 0.549312, training depends: 0.185349, valid loss: 99.562834, valid acc: 0.568718, valid depends: 0.161316\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.08it/s, accuracy=0.682, accuracy_depends=0.257, cost=59.7]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.84it/s, accuracy=0.627, accuracy_depends=0.2, cost=105] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.52307176589966\n",
- "epoch: 4, training loss: 86.934430, training acc: 0.639515, training depends: 0.214143, valid loss: 90.451283, valid acc: 0.642919, valid depends: 0.181113\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.09it/s, accuracy=0.747, accuracy_depends=0.274, cost=53] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.88it/s, accuracy=0.673, accuracy_depends=0.209, cost=99.6]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.530587673187256\n",
- "epoch: 5, training loss: 79.217252, training acc: 0.707435, training depends: 0.240276, valid loss: 85.398946, valid acc: 0.691384, valid depends: 0.198120\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.08it/s, accuracy=0.801, accuracy_depends=0.353, cost=48.9]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.91it/s, accuracy=0.709, accuracy_depends=0.182, cost=94.8]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.583584785461426\n",
- "epoch: 6, training loss: 72.303042, training acc: 0.762662, training depends: 0.274533, valid loss: 82.404467, valid acc: 0.727524, valid depends: 0.189771\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.09it/s, accuracy=0.805, accuracy_depends=0.377, cost=43.4]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.87it/s, accuracy=0.727, accuracy_depends=0.191, cost=90.9]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.58053755760193\n",
- "epoch: 7, training loss: 65.055744, training acc: 0.798943, training depends: 0.321437, valid loss: 80.717043, valid acc: 0.746362, valid depends: 0.196639\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.09it/s, accuracy=0.853, accuracy_depends=0.455, cost=38.9]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.86it/s, accuracy=0.745, accuracy_depends=0.209, cost=93.3]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.67471981048584\n",
- "epoch: 8, training loss: 58.739642, training acc: 0.827087, training depends: 0.377910, valid loss: 81.661547, valid acc: 0.749696, valid depends: 0.195816\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.08it/s, accuracy=0.866, accuracy_depends=0.527, cost=35.2]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.85it/s, accuracy=0.727, accuracy_depends=0.145, cost=101] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.61992311477661\n",
- "epoch: 9, training loss: 54.076346, training acc: 0.848619, training depends: 0.417288, valid loss: 80.947128, valid acc: 0.767324, valid depends: 0.209349\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.09it/s, accuracy=0.904, accuracy_depends=0.507, cost=33] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.90it/s, accuracy=0.782, accuracy_depends=0.209, cost=91.2]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.548739194869995\n",
- "epoch: 10, training loss: 50.326555, training acc: 0.863248, training depends: 0.458952, valid loss: 79.820367, valid acc: 0.774822, valid depends: 0.222942\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.06it/s, accuracy=0.911, accuracy_depends=0.558, cost=30.4]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.83it/s, accuracy=0.791, accuracy_depends=0.227, cost=89.9]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.604267597198486\n",
- "epoch: 11, training loss: 45.569131, training acc: 0.877704, training depends: 0.509152, valid loss: 80.193576, valid acc: 0.779312, valid depends: 0.218611\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.08it/s, accuracy=0.928, accuracy_depends=0.688, cost=23.6]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.85it/s, accuracy=0.791, accuracy_depends=0.145, cost=91.4]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.518343448638916\n",
- "epoch: 12, training loss: 41.137693, training acc: 0.893106, training depends: 0.548518, valid loss: 82.710994, valid acc: 0.784206, valid depends: 0.220646\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:40<00:00, 2.08it/s, accuracy=0.935, accuracy_depends=0.678, cost=24.9]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.84it/s, accuracy=0.809, accuracy_depends=0.164, cost=97.1]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.76531481742859\n",
- "epoch: 13, training loss: 37.963494, training acc: 0.906679, training depends: 0.583725, valid loss: 82.073511, valid acc: 0.782869, valid depends: 0.243221\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.10it/s, accuracy=0.942, accuracy_depends=0.733, cost=18.8]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.86it/s, accuracy=0.836, accuracy_depends=0.145, cost=101] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.35486125946045\n",
- "epoch: 14, training loss: 34.554710, training acc: 0.917006, training depends: 0.620208, valid loss: 85.657195, valid acc: 0.784520, valid depends: 0.241463\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.10it/s, accuracy=0.966, accuracy_depends=0.781, cost=15.8]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.89it/s, accuracy=0.864, accuracy_depends=0.173, cost=103] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.17195224761963\n",
- "epoch: 15, training loss: 31.398126, training acc: 0.924450, training depends: 0.656583, valid loss: 85.506386, valid acc: 0.793804, valid depends: 0.255798\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.06it/s, accuracy=0.935, accuracy_depends=0.774, cost=15.4]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.83it/s, accuracy=0.818, accuracy_depends=0.164, cost=98.7]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.265546560287476\n",
- "epoch: 16, training loss: 28.956760, training acc: 0.932248, training depends: 0.678523, valid loss: 84.795803, valid acc: 0.796438, valid depends: 0.264498\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.10it/s, accuracy=0.959, accuracy_depends=0.75, cost=16.2] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.90it/s, accuracy=0.8, accuracy_depends=0.164, cost=111] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.26459813117981\n",
- "epoch: 17, training loss: 27.902658, training acc: 0.938587, training depends: 0.685745, valid loss: 90.332116, valid acc: 0.796167, valid depends: 0.239845\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.08it/s, accuracy=0.959, accuracy_depends=0.856, cost=12.1]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.89it/s, accuracy=0.827, accuracy_depends=0.191, cost=102] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.02435898780823\n",
- "epoch: 18, training loss: 24.752691, training acc: 0.943680, training depends: 0.727282, valid loss: 88.909203, valid acc: 0.802451, valid depends: 0.263541\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:39<00:00, 2.12it/s, accuracy=0.976, accuracy_depends=0.877, cost=9.05]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:15<00:00, 3.92it/s, accuracy=0.836, accuracy_depends=0.155, cost=110] "
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 55.076401472091675\n",
- "epoch: 19, training loss: 21.722709, training acc: 0.951147, training depends: 0.767359, valid loss: 92.559914, valid acc: 0.800110, valid depends: 0.274829\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "import time\n",
- "\n",
- "for e in range(20):\n",
- " lasttime = time.time()\n",
- " train_acc, train_loss, test_acc, test_loss, train_acc_depends, test_acc_depends = 0, 0, 0, 0, 0, 0\n",
- " pbar = tqdm(\n",
- " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = train_X[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_char = train_char[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_y = train_Y[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_depends = train_depends[i : min(i + batch_size, train_X.shape[0])]\n",
- " acc_depends, acc, cost, _ = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " train_loss += cost\n",
- " train_acc += acc\n",
- " train_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " pbar = tqdm(\n",
- " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = test_X[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_char = test_char[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_y = test_Y[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_depends = test_depends[i : min(i + batch_size, test_X.shape[0])]\n",
- " acc_depends, acc, cost = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " test_loss += cost\n",
- " test_acc += acc\n",
- " test_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " train_loss /= len(train_X) / batch_size\n",
- " train_acc /= len(train_X) / batch_size\n",
- " train_acc_depends /= len(train_X) / batch_size\n",
- " test_loss /= len(test_X) / batch_size\n",
- " test_acc /= len(test_X) / batch_size\n",
- " test_acc_depends /= len(test_X) / batch_size\n",
- "\n",
- " print('time taken:', time.time() - lasttime)\n",
- " print(\n",
- " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
- " % (e, train_loss, train_acc, train_acc_depends, test_loss, test_acc, test_acc_depends)\n",
- " )"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq, deps = sess.run([model.tags_seq, model.tags_seq_depends],\n",
- " feed_dict={model.word_ids:batch_x[:1],\n",
- " model.char_ids:batch_char[:1]})"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq = seq[0]\n",
- "deps = deps[0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([18, 19, 2, 6, 3, 4, 16, 18, 23, 20, 19, 2], dtype=int32)"
- ]
- },
- "execution_count": 14,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "seq[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)"
- ]
- },
- "execution_count": 15,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "batch_y[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([ 2, 4, 4, 4, 8, 8, 4, 10, 12, 12, 8, 4], dtype=int32)"
- ]
- },
- "execution_count": 16,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "deps[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([ 2, 6, 6, 5, 6, 0, 6, 11, 11, 11, 6, 6], dtype=int32)"
- ]
- },
- "execution_count": 17,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "batch_depends[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/dependency-parser/2.lstm-birnn-bahdanau-crf-biaffine.ipynb b/dependency-parser/2.lstm-birnn-bahdanau-crf-biaffine.ipynb
new file mode 100644
index 0000000..9912dfc
--- /dev/null
+++ b/dependency-parser/2.lstm-birnn-bahdanau-crf-biaffine.ipynb
@@ -0,0 +1,1622 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "pygments_lexer": "ipython3",
+ "nbconvert_exporter": "python",
+ "version": "3.6.4",
+ "file_extension": ".py",
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "name": "python",
+ "mimetype": "text/x-python"
+ },
+ "colab": {
+ "name": "lstm-birnn-bahdanau-crf-biaffine.ipynb",
+ "provenance": [],
+ "collapsed_sections": []
+ },
+ "accelerator": "GPU"
+ },
+ "cells": [
+ {
+ "cell_type": "code",
+ "metadata": {
+ "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5",
+ "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
+ "trusted": true,
+ "id": "Ljz2IbsWluHv",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "outputId": "a2204647-1d43-4934-cec4-28a412730e57"
+ },
+ "source": [
+ "!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n",
+ "!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n",
+ "!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n",
+ "!pip install malaya -U"
+ ],
+ "execution_count": 1,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "--2019-09-30 05:12:41-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n",
+ "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n",
+ "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 1668174 (1.6M) [text/plain]\n",
+ "Saving to: ‘en_ewt-ud-dev.conllu’\n",
+ "\n",
+ "en_ewt-ud-dev.conll 100%[===================>] 1.59M --.-KB/s in 0.01s \n",
+ "\n",
+ "2019-09-30 05:12:47 (108 MB/s) - ‘en_ewt-ud-dev.conllu’ saved [1668174/1668174]\n",
+ "\n",
+ "--2019-09-30 05:12:49-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n",
+ "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n",
+ "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 13303045 (13M) [text/plain]\n",
+ "Saving to: ‘en_ewt-ud-train.conllu’\n",
+ "\n",
+ "en_ewt-ud-train.con 100%[===================>] 12.69M --.-KB/s in 0.07s \n",
+ "\n",
+ "2019-09-30 05:12:51 (178 MB/s) - ‘en_ewt-ud-train.conllu’ saved [13303045/13303045]\n",
+ "\n",
+ "--2019-09-30 05:12:53-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n",
+ "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n",
+ "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 1661985 (1.6M) [text/plain]\n",
+ "Saving to: ‘en_ewt-ud-test.conllu’\n",
+ "\n",
+ "en_ewt-ud-test.conl 100%[===================>] 1.58M --.-KB/s in 0.03s \n",
+ "\n",
+ "2019-09-30 05:12:54 (58.9 MB/s) - ‘en_ewt-ud-test.conllu’ saved [1661985/1661985]\n",
+ "\n",
+ "Collecting malaya\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/b1/11/5f8ea8da94136d1fb4db39931d4ed55ae51655a3212b33e5bf607271646e/malaya-2.7.7.0-py3-none-any.whl (2.1MB)\n",
+ "\u001b[K |████████████████████████████████| 2.1MB 34.6MB/s \n",
+ "\u001b[?25hRequirement already satisfied, skipping upgrade: tensorflow in /usr/local/lib/python3.6/dist-packages (from malaya) (1.14.0)\n",
+ "Collecting sentencepiece (from malaya)\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/14/3d/efb655a670b98f62ec32d66954e1109f403db4d937c50d779a75b9763a29/sentencepiece-0.1.83-cp36-cp36m-manylinux1_x86_64.whl (1.0MB)\n",
+ "\u001b[K |████████████████████████████████| 1.0MB 42.9MB/s \n",
+ "\u001b[?25hCollecting PySastrawi (from malaya)\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/61/84/b0a5454a040f81e81e6a95a5d5635f20ad43cc0c288f8b4966b339084962/PySastrawi-1.2.0-py2.py3-none-any.whl (210kB)\n",
+ "\u001b[K |████████████████████████████████| 215kB 54.5MB/s \n",
+ "\u001b[?25hRequirement already satisfied, skipping upgrade: sklearn in /usr/local/lib/python3.6/dist-packages (from malaya) (0.0)\n",
+ "Collecting unidecode (from malaya)\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/d0/42/d9edfed04228bacea2d824904cae367ee9efd05e6cce7ceaaedd0b0ad964/Unidecode-1.1.1-py2.py3-none-any.whl (238kB)\n",
+ "\u001b[K |████████████████████████████████| 245kB 55.3MB/s \n",
+ "\u001b[?25hRequirement already satisfied, skipping upgrade: xgboost in /usr/local/lib/python3.6/dist-packages (from malaya) (0.90)\n",
+ "Requirement already satisfied, skipping upgrade: numpy in /usr/local/lib/python3.6/dist-packages (from malaya) (1.16.5)\n",
+ "Collecting bert-tensorflow (from malaya)\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)\n",
+ "\u001b[K |████████████████████████████████| 71kB 37.1MB/s \n",
+ "\u001b[?25hRequirement already satisfied, skipping upgrade: requests in /usr/local/lib/python3.6/dist-packages (from malaya) (2.21.0)\n",
+ "Requirement already satisfied, skipping upgrade: networkx in /usr/local/lib/python3.6/dist-packages (from malaya) (2.3)\n",
+ "Collecting dateparser (from malaya)\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/82/9d/51126ac615bbc4418478d725a5fa1a0f112059f6f111e4b48cfbe17ef9d0/dateparser-0.7.2-py2.py3-none-any.whl (352kB)\n",
+ "\u001b[K |████████████████████████████████| 358kB 59.1MB/s \n",
+ "\u001b[?25hRequirement already satisfied, skipping upgrade: scipy in /usr/local/lib/python3.6/dist-packages (from malaya) (1.3.1)\n",
+ "Requirement already satisfied, skipping upgrade: scikit-learn in /usr/local/lib/python3.6/dist-packages (from malaya) (0.21.3)\n",
+ "Collecting ftfy (from malaya)\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/75/ca/2d9a5030eaf1bcd925dab392762b9709a7ad4bd486a90599d93cd79cb188/ftfy-5.6.tar.gz (58kB)\n",
+ "\u001b[K |████████████████████████████████| 61kB 27.2MB/s \n",
+ "\u001b[?25hRequirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.0.8)\n",
+ "Requirement already satisfied, skipping upgrade: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.11.2)\n",
+ "Requirement already satisfied, skipping upgrade: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.33.6)\n",
+ "Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.2.2)\n",
+ "Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.15.0)\n",
+ "Requirement already satisfied, skipping upgrade: tensorboard<1.15.0,>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.14.0)\n",
+ "Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.8.0)\n",
+ "Requirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (3.7.1)\n",
+ "Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.12.0)\n",
+ "Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.1.0)\n",
+ "Requirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.1.0)\n",
+ "Requirement already satisfied, skipping upgrade: tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.14.0)\n",
+ "Requirement already satisfied, skipping upgrade: google-pasta>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.1.7)\n",
+ "Requirement already satisfied, skipping upgrade: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.8.0)\n",
+ "Requirement already satisfied, skipping upgrade: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) (1.24.3)\n",
+ "Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) (2019.6.16)\n",
+ "Requirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) (2.8)\n",
+ "Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) (3.0.4)\n",
+ "Requirement already satisfied, skipping upgrade: decorator>=4.3.0 in /usr/local/lib/python3.6/dist-packages (from networkx->malaya) (4.4.0)\n",
+ "Requirement already satisfied, skipping upgrade: python-dateutil in /usr/local/lib/python3.6/dist-packages (from dateparser->malaya) (2.5.3)\n",
+ "Collecting regex (from dateparser->malaya)\n",
+ "\u001b[?25l Downloading https://files.pythonhosted.org/packages/6f/a6/99eeb5904ab763db87af4bd71d9b1dfdd9792681240657a4c0a599c10a81/regex-2019.08.19.tar.gz (654kB)\n",
+ "\u001b[K |████████████████████████████████| 655kB 45.7MB/s \n",
+ "\u001b[?25hRequirement already satisfied, skipping upgrade: pytz in /usr/local/lib/python3.6/dist-packages (from dateparser->malaya) (2018.9)\n",
+ "Requirement already satisfied, skipping upgrade: tzlocal in /usr/local/lib/python3.6/dist-packages (from dateparser->malaya) (1.5.1)\n",
+ "Requirement already satisfied, skipping upgrade: joblib>=0.11 in /usr/local/lib/python3.6/dist-packages (from scikit-learn->malaya) (0.13.2)\n",
+ "Requirement already satisfied, skipping upgrade: wcwidth in /usr/local/lib/python3.6/dist-packages (from ftfy->malaya) (0.1.7)\n",
+ "Requirement already satisfied, skipping upgrade: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.6->tensorflow->malaya) (2.8.0)\n",
+ "Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (3.1.1)\n",
+ "Requirement already satisfied, skipping upgrade: setuptools>=41.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (41.2.0)\n",
+ "Requirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (0.15.6)\n",
+ "Building wheels for collected packages: ftfy, regex\n",
+ " Building wheel for ftfy (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for ftfy: filename=ftfy-5.6-cp36-none-any.whl size=44553 sha256=a67cd3a8dec5d9ab36f166a19c9d55a545fcf8376d2dd4be7f822f0a7bd433ec\n",
+ " Stored in directory: /root/.cache/pip/wheels/43/34/ce/cbb38d71543c408de56f3c5e26ce8ba495a0fa5a28eaaf1046\n",
+ " Building wheel for regex (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for regex: filename=regex-2019.8.19-cp36-cp36m-linux_x86_64.whl size=609237 sha256=98055d96bc0b1d7a1f7761963e06648e4aee0db39cb10bb5ccdb3a451bb447ce\n",
+ " Stored in directory: /root/.cache/pip/wheels/90/04/07/b5010fb816721eb3d6dd64ed5cc8111ca23f97fdab8619b5be\n",
+ "Successfully built ftfy regex\n",
+ "Installing collected packages: sentencepiece, PySastrawi, unidecode, bert-tensorflow, regex, dateparser, ftfy, malaya\n",
+ "Successfully installed PySastrawi-1.2.0 bert-tensorflow-1.0.1 dateparser-0.7.2 ftfy-5.6 malaya-2.7.7.0 regex-2019.8.19 sentencepiece-0.1.83 unidecode-1.1.1\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a",
+ "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0",
+ "trusted": true,
+ "id": "r3Uhw481luH7",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "91eb9012-8b77-4c5b-8a3e-b3e11576028d"
+ },
+ "source": [
+ "import malaya\n",
+ "import re\n",
+ "from malaya.texts._text_functions import split_into_sentences\n",
+ "from malaya.texts import _regex\n",
+ "import numpy as np\n",
+ "import itertools\n",
+ "import tensorflow as tf\n",
+ "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
+ "\n",
+ "tokenizer = malaya.preprocessing._tokenizer\n",
+ "splitter = split_into_sentences"
+ ],
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "not found any version, deleting previous version models..\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "nrGvtUSEluIC",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "def is_number_regex(s):\n",
+        "    if re.match(r\"^\\d+?\\.\\d+?$\", s) is None:\n",
+ " return s.isdigit()\n",
+ " return True\n",
+ "\n",
+ "def preprocessing(w):\n",
+ " if is_number_regex(w):\n",
+        "        return '<NUM>'\n",
+        "    elif re.match(_regex._money, w):\n",
+        "        return '<MONEY>'\n",
+        "    elif re.match(_regex._date, w):\n",
+        "        return '<DATE>'\n",
+        "    elif re.match(_regex._expressions['email'], w):\n",
+        "        return '<EMAIL>'\n",
+        "    elif re.match(_regex._expressions['url'], w):\n",
+        "        return '<URL>'\n",
+ " else:\n",
+ " w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n",
+ " return w"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "4uJQAfRNluIH",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 289
+ },
+ "outputId": "c9a41167-3117-49dd-c3f7-95c580e783ec"
+ },
+ "source": [
+ "word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n",
+ "tag2idx = {'PAD': 0, '_': 1}\n",
+ "char2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n",
+ "word_idx = 3\n",
+ "tag_idx = 2\n",
+ "char_idx = 3\n",
+ "\n",
+        "special_tokens = ['<NUM>', '<MONEY>', '<DATE>', '<URL>', '<EMAIL>']\n",
+ "\n",
+ "for t in special_tokens:\n",
+ " word2idx[t] = word_idx\n",
+ " word_idx += 1\n",
+ " char2idx[t] = char_idx\n",
+ " char_idx += 1\n",
+ " \n",
+ "word2idx, char2idx"
+ ],
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+              "({'<DATE>': 5,\n",
+              " '<EMAIL>': 7,\n",
+              " '<MONEY>': 4,\n",
+              " '<NUM>': 3,\n",
+              " '<URL>': 6,\n",
+              " 'PAD': 0,\n",
+              " 'UNK': 1,\n",
+              " '_ROOT': 2},\n",
+              " {'<DATE>': 5,\n",
+              " '<EMAIL>': 7,\n",
+              " '<MONEY>': 4,\n",
+              " '<NUM>': 3,\n",
+              " '<URL>': 6,\n",
+              " 'PAD': 0,\n",
+              " 'UNK': 1,\n",
+              " '_ROOT': 2})"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "HfJNrFwZluIO",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "PAD = \"_PAD\"\n",
+ "PAD_POS = \"_PAD_POS\"\n",
+ "PAD_TYPE = \"_\"\n",
+ "PAD_CHAR = \"_PAD_CHAR\"\n",
+ "ROOT = \"_ROOT\"\n",
+ "ROOT_POS = \"_ROOT_POS\"\n",
+ "ROOT_TYPE = \"_\"\n",
+ "ROOT_CHAR = \"_ROOT_CHAR\"\n",
+ "END = \"_END\"\n",
+ "END_POS = \"_END_POS\"\n",
+ "END_TYPE = \"_\"\n",
+ "END_CHAR = \"_END_CHAR\"\n",
+ "\n",
+ "def process_corpus(corpus, until = None):\n",
+ " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
+ " sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n",
+ " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n",
+ " first_time = True\n",
+ " for sentence in corpus:\n",
+ " try:\n",
+ " if len(sentence):\n",
+ " if sentence[0] == '#':\n",
+ " continue\n",
+ " if first_time:\n",
+ " print(sentence)\n",
+ " first_time = False\n",
+ " sentence = sentence.split('\\t')\n",
+ " for c in sentence[1]:\n",
+ " if c not in char2idx:\n",
+ " char2idx[c] = char_idx\n",
+ " char_idx += 1\n",
+ " if sentence[7] not in tag2idx:\n",
+ " tag2idx[sentence[7]] = tag_idx\n",
+ " tag_idx += 1\n",
+ " sentence[1] = preprocessing(sentence[1])\n",
+ " if sentence[1] not in word2idx:\n",
+ " word2idx[sentence[1]] = word_idx\n",
+ " word_idx += 1\n",
+ " temp_word.append(word2idx[sentence[1]])\n",
+ " temp_depend.append(int(sentence[6]))\n",
+ " temp_label.append(tag2idx[sentence[7]])\n",
+ " temp_sentence.append(sentence[1])\n",
+ " temp_pos.append(sentence[3])\n",
+ " else:\n",
+ " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " continue\n",
+ " words.append(temp_word)\n",
+ " depends.append(temp_depend)\n",
+ " labels.append(temp_label)\n",
+        "                    sentences.append(temp_sentence)\n",
+ " pos.append(temp_pos)\n",
+ " char_ = [[char2idx['_ROOT']]]\n",
+ " for w in temp_sentence:\n",
+ " if w in char2idx:\n",
+ " char_.append([char2idx[w]])\n",
+ " else:\n",
+ " char_.append([char2idx[c] for c in w])\n",
+ " chars.append(char_)\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " except Exception as e:\n",
+ " print(e, sentence)\n",
+ " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "aLFEmcKPluIV",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ },
+ "outputId": "8e31626b-91e1-4951-96c2-01341e18ddea"
+ },
+ "source": [
+ "with open('en_ewt-ud-dev.conllu') as fopen:\n",
+ " dev = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)"
+ ],
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "AHD5Kgh_luIZ",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 71
+ },
+ "outputId": "dc86ed43-5ce2-4747-dc6a-76e30e9ab2c4"
+ },
+ "source": [
+ "with open('en_ewt-ud-test.conllu') as fopen:\n",
+ " test = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_test, words_test, depends_test, labels_test, _, _ = process_corpus(test)\n",
+ "sentences_test.extend(sentences_dev)\n",
+ "words_test.extend(words_dev)\n",
+ "depends_test.extend(depends_dev)\n",
+ "labels_test.extend(labels_dev)"
+ ],
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "n39ztGEXluIe",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 445
+ },
+ "outputId": "fd316d0f-b840-4ce3-fefe-edb15017dc93"
+ },
+ "source": [
+ "with open('en_ewt-ud-train.conllu') as fopen:\n",
+ " train = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)"
+ ],
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n",
+ "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n",
+ "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n",
+ "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n",
+ "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n",
+ "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "RZ8MwuF9luIo",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "28460c3d-86bf-4e15-ef4b-2217734596a2"
+ },
+ "source": [
+ "len(sentences_train), len(sentences_test)"
+ ],
+ "execution_count": 9,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(12000, 3824)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 9
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "Z7oKPBiMluIx",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "09b7af1e-8ad9-4ede-eb27-720e891dfa5a"
+ },
+ "source": [
+ "idx2word = {v:k for k, v in word2idx.items()}\n",
+ "idx2tag = {v:k for k, v in tag2idx.items()}\n",
+ "len(idx2word)"
+ ],
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "21974"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "EikVfMyQluI2",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+        "# Build a (batch, max_words, max_chars) int32 array of character ids.\n",
+        "# Characters are right-aligned within each word slot; unknown chars map to UNK.\n",
+        "def generate_char_seq(batch, UNK = 2):\n",
+ " maxlen_c = max([len(k) for k in batch])\n",
+ " x = [[len(i) for i in k] for k in batch]\n",
+ " maxlen = max([j for i in x for j in i])\n",
+ " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
+ " for i in range(len(batch)):\n",
+ " for k in range(len(batch[i])):\n",
+ " for no, c in enumerate(batch[i][k]):\n",
+ " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
+ " return temp"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "izRVCDaNluI5",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "bd93b03a-0a4b-4eb7-ed14-44876e33ca0d"
+ },
+ "source": [
+ "generate_char_seq(sentences_train[:5]).shape"
+ ],
+ "execution_count": 12,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(5, 36, 11)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 12
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "gS8Wlel5luJD",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "fb6df063-31c4-45e4-e590-8220685d4911"
+ },
+ "source": [
+ "pad_sequences(words_train[:5],padding='post').shape"
+ ],
+ "execution_count": 13,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(5, 36)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "2EKNPE4mluJH",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "train_X = words_train\n",
+ "train_Y = labels_train\n",
+ "train_depends = depends_train\n",
+ "train_char = sentences_train\n",
+ "\n",
+ "test_X = words_test\n",
+ "test_Y = labels_test\n",
+ "test_depends = depends_test\n",
+ "test_char = sentences_test"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "IechxNL3luJW",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+        "# Biaffine attention: scores every (head, child) token pair with a bilinear\n",
+        "# term U plus linear bias terms W_d and W_e, masked to real tokens.\n",
+        "class BiAAttention:\n",
+ " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n",
+ " self.input_size_encoder = input_size_encoder\n",
+ " self.input_size_decoder = input_size_decoder\n",
+ " self.num_labels = num_labels\n",
+ " \n",
+ " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n",
+ " batch = tf.shape(input_d)[0]\n",
+ " length_decoder = tf.shape(input_d)[1]\n",
+ " length_encoder = tf.shape(input_e)[1]\n",
+ " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n",
+ " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n",
+ " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n",
+ " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n",
+ " \n",
+ " output = output + out_d + out_e\n",
+ " \n",
+ " if mask_d is not None:\n",
+ " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n",
+ " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n",
+ " output = output * d * e\n",
+ " \n",
+ " return output\n",
+ "\n",
+ "class Model:\n",
+ " def __init__(\n",
+ " self,\n",
+ " dim_word,\n",
+ " dim_char,\n",
+ " dropout,\n",
+ " learning_rate,\n",
+ " hidden_size_char,\n",
+ " hidden_size_word,\n",
+ " num_layers\n",
+ " ):\n",
+ " def cells(size, reuse = False):\n",
+ " return tf.contrib.rnn.DropoutWrapper(\n",
+ " tf.nn.rnn_cell.LSTMCell(\n",
+ " size,\n",
+ " initializer = tf.orthogonal_initializer(),\n",
+ " reuse = reuse,\n",
+ " ),\n",
+ " output_keep_prob = dropout,\n",
+ " )\n",
+ " \n",
+ " def bahdanau(embedded, size):\n",
+ " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n",
+ " num_units = hidden_size_word, memory = embedded\n",
+ " )\n",
+ " return tf.contrib.seq2seq.AttentionWrapper(\n",
+ " cell = cells(hidden_size_word),\n",
+ " attention_mechanism = attention_mechanism,\n",
+ " attention_layer_size = hidden_size_word,\n",
+ " )\n",
+ " \n",
+ " self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n",
+ " self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n",
+ " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n",
+ " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n",
+ " self.maxlen = tf.shape(self.word_ids)[1]\n",
+ " self.lengths = tf.count_nonzero(self.word_ids, 1)\n",
+ " self.mask = tf.math.not_equal(self.word_ids, 0)\n",
+ " float_mask = tf.cast(self.mask, tf.float32)\n",
+ " \n",
+ " self.arc_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.arc_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n",
+ "\n",
+ " self.word_embeddings = tf.Variable(\n",
+ " tf.truncated_normal(\n",
+ " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n",
+ " )\n",
+ " )\n",
+ " self.char_embeddings = tf.Variable(\n",
+ " tf.truncated_normal(\n",
+ " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n",
+ " )\n",
+ " )\n",
+ "\n",
+ " word_embedded = tf.nn.embedding_lookup(\n",
+ " self.word_embeddings, self.word_ids\n",
+ " )\n",
+ " char_embedded = tf.nn.embedding_lookup(\n",
+ " self.char_embeddings, self.char_ids\n",
+ " )\n",
+ " s = tf.shape(char_embedded)\n",
+ " char_embedded = tf.reshape(\n",
+ " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n",
+ " )\n",
+ "\n",
+ " for n in range(num_layers):\n",
+ " (out_fw, out_bw), (\n",
+ " state_fw,\n",
+ " state_bw,\n",
+ " ) = tf.nn.bidirectional_dynamic_rnn(\n",
+ " cell_fw = cells(hidden_size_char),\n",
+ " cell_bw = cells(hidden_size_char),\n",
+ " inputs = char_embedded,\n",
+ " dtype = tf.float32,\n",
+ " scope = 'bidirectional_rnn_char_%d' % (n),\n",
+ " )\n",
+ " char_embedded = tf.concat((out_fw, out_bw), 2)\n",
+ " output = tf.reshape(\n",
+ " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n",
+ " )\n",
+ " word_embedded = tf.concat([word_embedded, output], axis = -1)\n",
+ "\n",
+ " for n in range(num_layers):\n",
+ " (out_fw, out_bw), (\n",
+ " state_fw,\n",
+ " state_bw,\n",
+ " ) = tf.nn.bidirectional_dynamic_rnn(\n",
+ " cell_fw = bahdanau(word_embedded, hidden_size_word),\n",
+ " cell_bw = bahdanau(word_embedded, hidden_size_word),\n",
+ " inputs = word_embedded,\n",
+ " dtype = tf.float32,\n",
+ " scope = 'bidirectional_rnn_word_%d' % (n),\n",
+ " )\n",
+ " word_embedded = tf.concat((out_fw, out_bw), 2)\n",
+ "\n",
+ " logits = tf.layers.dense(word_embedded, len(idx2tag))\n",
+ " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n",
+ " logits, self.labels, self.lengths\n",
+ " )\n",
+ " arc_h = tf.nn.elu(self.arc_h(word_embedded))\n",
+ " arc_c = tf.nn.elu(self.arc_c(word_embedded))\n",
+ " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " \n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - float_mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n",
+ " loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * tf.expand_dims(float_mask, axis = 1)\n",
+ " num = tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n",
+ " \n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(self.depends)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n",
+ " \n",
+ " loss_arc = tf.reduce_sum(-loss_arc) / num\n",
+ " \n",
+ " self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n",
+ " \n",
+ " self.optimizer = tf.train.AdamOptimizer(\n",
+ " learning_rate = learning_rate\n",
+ " ).minimize(self.cost)\n",
+ " \n",
+ " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
+ " \n",
+ " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n",
+ " logits, transition_params, self.lengths\n",
+ " )\n",
+ " \n",
+ " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n",
+ " self.heads = tf.argmax(out_arc, axis = 1)\n",
+ " \n",
+ " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
+ " mask_label = tf.boolean_mask(self.labels, mask)\n",
+ " correct_pred = tf.equal(self.prediction, mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
+ " \n",
+ " self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n",
+ " mask_label = tf.boolean_mask(self.depends, mask)\n",
+ " correct_pred = tf.equal(self.prediction, mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "ORr-2ouXluJl",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 496
+ },
+ "outputId": "5c0c29e2-2502-49ce-f641-8973f436b29e"
+ },
+ "source": [
+ "tf.reset_default_graph()\n",
+ "sess = tf.InteractiveSession()\n",
+ "\n",
+ "dim_word = 128\n",
+ "dim_char = 256\n",
+ "dropout = 1.0\n",
+ "learning_rate = 1e-3\n",
+ "hidden_size_char = 128\n",
+ "hidden_size_word = 128\n",
+ "num_layers = 2\n",
+ "\n",
+ "model = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\n",
+ "sess.run(tf.global_variables_initializer())"
+ ],
+ "execution_count": 16,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "reduction_indices is deprecated, use axis instead\n",
+ "WARNING:tensorflow:From :48: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n",
+ "WARNING:tensorflow:From :107: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n",
+ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n",
+ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "WARNING:tensorflow:From :128: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use keras.layers.dense instead.\n",
+ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/crf/python/ops/crf.py:99: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use tf.where in 2.0, which has the same broadcast rule as np.where\n",
+ "WARNING:tensorflow:From :144: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "dim is deprecated, use axis instead\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "4zkpDRaDluJq",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "batch_x = train_X[:5]\n",
+ "batch_x = pad_sequences(batch_x,padding='post')\n",
+ "batch_char = train_char[:5]\n",
+ "batch_char = generate_char_seq(batch_char)\n",
+ "batch_y = train_Y[:5]\n",
+ "batch_y = pad_sequences(batch_y,padding='post')\n",
+ "batch_depends = train_depends[:5]\n",
+ "batch_depends = pad_sequences(batch_depends,padding='post')"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "wL67WIkMluJz",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "5a4c71b2-49a3-439c-ed41-be31555eefb3"
+ },
+ "source": [
+ "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n",
+ " feed_dict = {model.word_ids: batch_x,\n",
+ " model.char_ids: batch_char,\n",
+ " model.labels: batch_y,\n",
+ " model.depends: batch_depends})"
+ ],
+ "execution_count": 18,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[0.0, 0.094827585, 95.5533]"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "I0lyT0z-luJ3",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "outputId": "4bbc5a14-c8ac-4123-a27b-e8a4e4c852a1"
+ },
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "batch_size = 32\n",
+ "epoch = 15\n",
+ "\n",
+ "for e in range(epoch):\n",
+ " train_acc, train_loss = [], []\n",
+ " test_acc, test_loss = [], []\n",
+ " train_acc_depends, test_acc_depends = [], []\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(train_X))\n",
+ " batch_x = train_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = train_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = train_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = train_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost, _ = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
+ " feed_dict = {\n",
+ " model.word_ids: batch_x,\n",
+ " model.char_ids: batch_char,\n",
+ " model.labels: batch_y,\n",
+ " model.depends: batch_depends\n",
+ " },\n",
+ " )\n",
+ " train_loss.append(cost)\n",
+ " train_acc.append(acc)\n",
+ " train_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = test_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost],\n",
+ " feed_dict = {\n",
+ " model.word_ids: batch_x,\n",
+ " model.char_ids: batch_char,\n",
+ " model.labels: batch_y,\n",
+ " model.depends: batch_depends\n",
+ " },\n",
+ " )\n",
+ " test_loss.append(cost)\n",
+ " test_acc.append(acc)\n",
+ " test_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " \n",
+ " print(\n",
+ " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
+ " % (e, np.mean(train_loss), \n",
+ " np.mean(train_acc), \n",
+ " np.mean(train_acc_depends), \n",
+ " np.mean(test_loss), \n",
+ " np.mean(test_acc), \n",
+ " np.mean(test_acc_depends)\n",
+ " ))\n",
+ " "
+ ],
+ "execution_count": 19,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:50<00:00, 1.73it/s, accuracy=0.803, accuracy_depends=0.559, cost=16.9]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.03it/s, accuracy=0.862, accuracy_depends=0.636, cost=10.2]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 0, training loss: 32.401157, training acc: 0.499893, training depends: 0.333395, valid loss: 12.616477, valid acc: 0.752286, valid depends: 0.550964\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:46<00:00, 1.74it/s, accuracy=0.863, accuracy_depends=0.719, cost=10.5]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.09it/s, accuracy=0.891, accuracy_depends=0.704, cost=7.06]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 1, training loss: 11.654610, training acc: 0.817275, training depends: 0.613238, valid loss: 8.984379, valid acc: 0.822647, valid depends: 0.640479\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:49<00:00, 1.71it/s, accuracy=0.903, accuracy_depends=0.752, cost=8.08]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 5.75it/s, accuracy=0.899, accuracy_depends=0.733, cost=6.17]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 2, training loss: 7.874226, training acc: 0.877587, training depends: 0.682966, valid loss: 8.313724, valid acc: 0.838643, valid depends: 0.667915\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:46<00:00, 1.71it/s, accuracy=0.92, accuracy_depends=0.768, cost=6.84]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.06it/s, accuracy=0.903, accuracy_depends=0.777, cost=5.96]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 3, training loss: 6.043023, training acc: 0.908013, training depends: 0.717450, valid loss: 8.258953, valid acc: 0.842917, valid depends: 0.681678\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:44<00:00, 1.75it/s, accuracy=0.939, accuracy_depends=0.799, cost=5.56]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.17it/s, accuracy=0.919, accuracy_depends=0.798, cost=5.71]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 4, training loss: 4.851332, training acc: 0.926946, training depends: 0.738619, valid loss: 8.510642, valid acc: 0.848766, valid depends: 0.691324\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:44<00:00, 1.75it/s, accuracy=0.947, accuracy_depends=0.815, cost=4.41]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.02it/s, accuracy=0.919, accuracy_depends=0.81, cost=5.32]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 5, training loss: 3.947296, training acc: 0.941130, training depends: 0.751846, valid loss: 8.953300, valid acc: 0.848426, valid depends: 0.688125\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:44<00:00, 1.77it/s, accuracy=0.955, accuracy_depends=0.822, cost=3.45]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.22it/s, accuracy=0.911, accuracy_depends=0.798, cost=6.26]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 6, training loss: 3.262760, training acc: 0.952513, training depends: 0.763379, valid loss: 9.313190, valid acc: 0.850958, valid depends: 0.693280\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 1.76it/s, accuracy=0.967, accuracy_depends=0.827, cost=3.08]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 5.85it/s, accuracy=0.907, accuracy_depends=0.789, cost=6.98]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 7, training loss: 2.569048, training acc: 0.963230, training depends: 0.775451, valid loss: 9.958177, valid acc: 0.849756, valid depends: 0.694797\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:44<00:00, 1.76it/s, accuracy=0.973, accuracy_depends=0.834, cost=2.14]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.07it/s, accuracy=0.919, accuracy_depends=0.789, cost=6.42]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 8, training loss: 1.959417, training acc: 0.972763, training depends: 0.788287, valid loss: 10.350948, valid acc: 0.852834, valid depends: 0.695817\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 1.79it/s, accuracy=0.971, accuracy_depends=0.836, cost=2.1]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.14it/s, accuracy=0.879, accuracy_depends=0.773, cost=8.35]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 9, training loss: 1.574149, training acc: 0.979302, training depends: 0.795839, valid loss: 11.177638, valid acc: 0.852465, valid depends: 0.702183\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 1.75it/s, accuracy=0.987, accuracy_depends=0.834, cost=1.23]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:20<00:00, 6.26it/s, accuracy=0.915, accuracy_depends=0.794, cost=8.7]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 10, training loss: 1.237997, training acc: 0.984891, training depends: 0.804241, valid loss: 11.869824, valid acc: 0.848389, valid depends: 0.701258\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 1.78it/s, accuracy=0.988, accuracy_depends=0.85, cost=1.12]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.25it/s, accuracy=0.874, accuracy_depends=0.798, cost=8.92]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 11, training loss: 1.054887, training acc: 0.987574, training depends: 0.808066, valid loss: 11.984483, valid acc: 0.853984, valid depends: 0.705631\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 1.76it/s, accuracy=0.992, accuracy_depends=0.846, cost=1.08]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 5.93it/s, accuracy=0.915, accuracy_depends=0.789, cost=7.69]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 12, training loss: 0.890403, training acc: 0.990091, training depends: 0.816699, valid loss: 12.758488, valid acc: 0.852551, valid depends: 0.706672\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 1.72it/s, accuracy=0.992, accuracy_depends=0.847, cost=0.864]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:20<00:00, 6.14it/s, accuracy=0.907, accuracy_depends=0.781, cost=8.05]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 13, training loss: 0.821644, training acc: 0.990760, training depends: 0.821848, valid loss: 12.964406, valid acc: 0.851949, valid depends: 0.706212\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 1.72it/s, accuracy=0.995, accuracy_depends=0.852, cost=0.71]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:20<00:00, 6.39it/s, accuracy=0.895, accuracy_depends=0.789, cost=10]"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "epoch: 14, training loss: 0.734888, training acc: 0.992246, training depends: 0.824531, valid loss: 13.220814, valid acc: 0.851365, valid depends: 0.708206\n",
+ "\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ],
+ "name": "stderr"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "OA5fV4xxluJ6",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "def evaluate(heads_pred, types_pred, heads, types, lengths,\n",
+ " symbolic_root=False, symbolic_end=False):\n",
+        "    batch_size, _ = heads_pred.shape\n",
+ " ucorr = 0.\n",
+ " lcorr = 0.\n",
+ " total = 0.\n",
+ " ucomplete_match = 0.\n",
+ " lcomplete_match = 0.\n",
+ "\n",
+ " corr_root = 0.\n",
+ " total_root = 0.\n",
+ " start = 1 if symbolic_root else 0\n",
+ " end = 1 if symbolic_end else 0\n",
+ " for i in range(batch_size):\n",
+ " ucm = 1.\n",
+ " lcm = 1.\n",
+ " for j in range(start, lengths[i] - end):\n",
+ "\n",
+ " total += 1\n",
+ " if heads[i, j] == heads_pred[i, j]:\n",
+ " ucorr += 1\n",
+ " if types[i, j] == types_pred[i, j]:\n",
+ " lcorr += 1\n",
+ " else:\n",
+ " lcm = 0\n",
+ " else:\n",
+ " ucm = 0\n",
+ " lcm = 0\n",
+ "\n",
+ " if heads[i, j] == 0:\n",
+ " total_root += 1\n",
+ " corr_root += 1 if heads_pred[i, j] == 0 else 0\n",
+ "\n",
+ " ucomplete_match += ucm\n",
+ " lcomplete_match += lcm\n",
+ "\n",
+ " return (ucorr, lcorr, total, ucomplete_match, lcomplete_match), \\\n",
+ " (corr_root, total_root), batch_size"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "9fBypTzSluKC",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 170
+ },
+ "outputId": "2c26e01b-a6a4-4871-9f38-812df371c65f"
+ },
+ "source": [
+ "tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads],\n",
+ " feed_dict = {\n",
+ " model.word_ids: batch_x,\n",
+ " model.char_ids: batch_char\n",
+ " },\n",
+ ")\n",
+ "tags_seq[0], heads[0], batch_depends[0]"
+ ],
+ "execution_count": 21,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(array([16, 6, 22, 26, 6, 18, 16, 5, 3, 13, 10, 11, 6, 12, 13, 10, 16,\n",
+ " 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0], dtype=int32),\n",
+ " array([ 2, 8, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n",
+ " 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0]),\n",
+ " array([ 2, 8, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n",
+ " 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0], dtype=int32))"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 21
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "Afwz-4bvluKM",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+        "# Compute unlabeled attachment accuracy (arcs), labeled accuracy (types)\n",
+        "# and root accuracy over a padded batch, ignoring positions past each length.\n",
+        "def evaluate(heads_pred, types_pred, heads, types, lengths,\n",
+ " symbolic_root=False, symbolic_end=False):\n",
+ " batch_size, _ = heads_pred.shape\n",
+ " ucorr = 0.\n",
+ " lcorr = 0.\n",
+ " total = 0.\n",
+ " ucomplete_match = 0.\n",
+ " lcomplete_match = 0.\n",
+ "\n",
+ " corr_root = 0.\n",
+ " total_root = 0.\n",
+ " start = 1 if symbolic_root else 0\n",
+ " end = 1 if symbolic_end else 0\n",
+ " for i in range(batch_size):\n",
+ " ucm = 1.\n",
+ " lcm = 1.\n",
+ " for j in range(start, lengths[i] - end):\n",
+ "\n",
+ " total += 1\n",
+ " if heads[i, j] == heads_pred[i, j]:\n",
+ " ucorr += 1\n",
+ " if types[i, j] == types_pred[i, j]:\n",
+ " lcorr += 1\n",
+ " else:\n",
+ " lcm = 0\n",
+ " else:\n",
+ " ucm = 0\n",
+ " lcm = 0\n",
+ "\n",
+ " if heads[i, j] == 0:\n",
+ " total_root += 1\n",
+ " corr_root += 1 if heads_pred[i, j] == 0 else 0\n",
+ "\n",
+ " ucomplete_match += ucm\n",
+ " lcomplete_match += lcm\n",
+ " \n",
+ " return ucorr / total, lcorr / total, corr_root / total_root"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "L_YC7rYLluKU",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "cea2aa49-fec2-4adb-fdea-91f01f0fa237"
+ },
+ "source": [
+ "arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ "arc_accuracy, type_accuracy, root_accuracy"
+ ],
+ "execution_count": 23,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(0.7894736842105263, 0.7611336032388664, 0.875)"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 23
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "OM8uPNqFluKY",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "dbc9b008-9109-45c1-db67-9e980743b303"
+ },
+ "source": [
+ "arcs, types, roots = [], [], []\n",
+ "\n",
+ "pbar = tqdm(\n",
+ " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
+ ")\n",
+ "for i in pbar:\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = test_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads],\n",
+ " feed_dict = {\n",
+ " model.word_ids: batch_x,\n",
+ " model.char_ids: batch_char\n",
+ " },\n",
+ " )\n",
+ " \n",
+ " arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ " pbar.set_postfix(arc_accuracy = arc_accuracy, type_accuracy = type_accuracy, \n",
+ " root_accuracy = root_accuracy)\n",
+ " arcs.append(arc_accuracy)\n",
+ " types.append(type_accuracy)\n",
+ " roots.append(root_accuracy)"
+ ],
+ "execution_count": 24,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "test minibatch loop: 100%|██████████| 120/120 [00:19<00:00, 6.69it/s, arc_accuracy=0.789, root_accuracy=0.875, type_accuracy=0.761]\n"
+ ],
+ "name": "stderr"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "ZKzFKsJXluKb",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ },
+ "outputId": "045e56a7-2516-4275-c4e5-2f356791b1c8"
+ },
+ "source": [
+ "print('arc accuracy:', np.mean(arcs))\n",
+ "print('types accuracy:', np.mean(types))\n",
+ "print('root accuracy:', np.mean(roots))"
+ ],
+ "execution_count": 25,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "arc accuracy: 0.7082063481789892\n",
+ "types accuracy: 0.6533524914247569\n",
+ "root accuracy: 0.6677083333333333\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": true,
+ "id": "Ay_U7CirluKf",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": 0,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/dependency-parser/3.lstm-birnn-luong-crf-biaffine.ipynb b/dependency-parser/3.lstm-birnn-luong-crf-biaffine.ipynb
new file mode 100644
index 0000000..507bdc9
--- /dev/null
+++ b/dependency-parser/3.lstm-birnn-luong-crf-biaffine.ipynb
@@ -0,0 +1 @@
+{"cells":[{"metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true},"cell_type":"code","source":"!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n!pip install malaya -U","execution_count":1,"outputs":[{"output_type":"stream","text":"--2019-09-30 05:48:17-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 1668174 (1.6M) [text/plain]\nSaving to: ‘en_ewt-ud-dev.conllu’\n\nen_ewt-ud-dev.conll 100%[===================>] 1.59M --.-KB/s in 0.05s \n\n2019-09-30 05:48:17 (30.8 MB/s) - ‘en_ewt-ud-dev.conllu’ saved [1668174/1668174]\n\n--2019-09-30 05:48:18-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 13303045 (13M) [text/plain]\nSaving to: ‘en_ewt-ud-train.conllu’\n\nen_ewt-ud-train.con 100%[===================>] 12.69M --.-KB/s in 0.1s \n\n2019-09-30 05:48:18 (120 MB/s) - ‘en_ewt-ud-train.conllu’ saved [13303045/13303045]\n\n--2019-09-30 05:48:19-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 1661985 (1.6M) [text/plain]\nSaving to: ‘en_ewt-ud-test.conllu’\n\nen_ewt-ud-test.conl 100%[===================>] 1.58M --.-KB/s in 0.05s \n\n2019-09-30 05:48:19 (32.0 MB/s) - ‘en_ewt-ud-test.conllu’ saved [1661985/1661985]\n\nCollecting malaya\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/b1/11/5f8ea8da94136d1fb4db39931d4ed55ae51655a3212b33e5bf607271646e/malaya-2.7.7.0-py3-none-any.whl (2.1MB)\n\u001b[K |████████████████████████████████| 2.1MB 4.9MB/s eta 0:00:01\n\u001b[?25hCollecting dateparser (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/82/9d/51126ac615bbc4418478d725a5fa1a0f112059f6f111e4b48cfbe17ef9d0/dateparser-0.7.2-py2.py3-none-any.whl (352kB)\n\u001b[K |████████████████████████████████| 358kB 34.2MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: scikit-learn in /opt/conda/lib/python3.6/site-packages (from malaya) (0.21.3)\nCollecting PySastrawi (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/61/84/b0a5454a040f81e81e6a95a5d5635f20ad43cc0c288f8b4966b339084962/PySastrawi-1.2.0-py2.py3-none-any.whl (210kB)\n\u001b[K |████████████████████████████████| 215kB 42.5MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: unidecode in /opt/conda/lib/python3.6/site-packages (from malaya) 
(1.1.1)\nRequirement already satisfied, skipping upgrade: scipy in /opt/conda/lib/python3.6/site-packages (from malaya) (1.2.1)\nRequirement already satisfied, skipping upgrade: ftfy in /opt/conda/lib/python3.6/site-packages (from malaya) (5.6)\nRequirement already satisfied, skipping upgrade: sentencepiece in /opt/conda/lib/python3.6/site-packages (from malaya) (0.1.83)\nCollecting bert-tensorflow (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)\n\u001b[K |████████████████████████████████| 71kB 27.1MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: sklearn in /opt/conda/lib/python3.6/site-packages (from malaya) (0.0)\nRequirement already satisfied, skipping upgrade: requests in /opt/conda/lib/python3.6/site-packages (from malaya) (2.22.0)\nRequirement already satisfied, skipping upgrade: numpy in /opt/conda/lib/python3.6/site-packages (from malaya) (1.16.4)\nRequirement already satisfied, skipping upgrade: tensorflow in /opt/conda/lib/python3.6/site-packages (from malaya) (1.14.0)\nRequirement already satisfied, skipping upgrade: networkx in /opt/conda/lib/python3.6/site-packages (from malaya) (2.3)\nRequirement already satisfied, skipping upgrade: xgboost in /opt/conda/lib/python3.6/site-packages (from malaya) (0.90)\nRequirement already satisfied, skipping upgrade: tzlocal in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2.0.0)\nRequirement already satisfied, skipping upgrade: regex in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2019.8.19)\nRequirement already satisfied, skipping upgrade: pytz in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2019.2)\nRequirement already satisfied, skipping upgrade: python-dateutil in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2.8.0)\nRequirement already satisfied, skipping 
upgrade: joblib>=0.11 in /opt/conda/lib/python3.6/site-packages (from scikit-learn->malaya) (0.13.2)\nRequirement already satisfied, skipping upgrade: wcwidth in /opt/conda/lib/python3.6/site-packages (from ftfy->malaya) (0.1.7)\nRequirement already satisfied, skipping upgrade: six in /opt/conda/lib/python3.6/site-packages (from bert-tensorflow->malaya) (1.12.0)\nRequirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (2019.9.11)\nRequirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (2.8)\nRequirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (1.24.2)\nRequirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (3.0.4)\nRequirement already satisfied, skipping upgrade: tensorboard<1.15.0,>=1.14.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.14.0)\nRequirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.24.0)\nRequirement already satisfied, skipping upgrade: google-pasta>=0.1.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.1.7)\nRequirement already satisfied, skipping upgrade: absl-py>=0.7.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.8.0)\nRequirement already satisfied, skipping upgrade: wheel>=0.26 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.33.6)\nRequirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.1.0)\nRequirement already satisfied, skipping upgrade: gast>=0.2.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.3.2)\nRequirement already 
satisfied, skipping upgrade: protobuf>=3.6.1 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (3.7.1)\nRequirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.0.8)\nRequirement already satisfied, skipping upgrade: astor>=0.6.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.8.0)\nRequirement already satisfied, skipping upgrade: tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.14.0)\nRequirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.1.0)\nRequirement already satisfied, skipping upgrade: wrapt>=1.11.1 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.11.2)\nRequirement already satisfied, skipping upgrade: decorator>=4.3.0 in /opt/conda/lib/python3.6/site-packages (from networkx->malaya) (4.4.0)\nRequirement already satisfied, skipping upgrade: markdown>=2.6.8 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (3.1.1)\nRequirement already satisfied, skipping upgrade: setuptools>=41.0.0 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (41.2.0)\nRequirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (0.16.0)\nRequirement already satisfied, skipping upgrade: h5py in /opt/conda/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow->malaya) (2.9.0)\n","name":"stdout"},{"output_type":"stream","text":"Installing collected packages: dateparser, PySastrawi, bert-tensorflow, malaya\nSuccessfully installed PySastrawi-1.2.0 bert-tensorflow-1.0.1 dateparser-0.7.2 
malaya-2.7.7.0\n","name":"stdout"}]},{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"import malaya\nimport re\nfrom malaya.texts._text_functions import split_into_sentences\nfrom malaya.texts import _regex\nimport numpy as np\nimport itertools\nimport tensorflow as tf\nfrom tensorflow.keras.preprocessing.sequence import pad_sequences\n\ntokenizer = malaya.preprocessing._tokenizer\nsplitter = split_into_sentences","execution_count":2,"outputs":[{"output_type":"stream","text":"not found any version, deleting previous version models..\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def is_number_regex(s):\n if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n return s.isdigit()\n return True\n\ndef preprocessing(w):\n if is_number_regex(w):\n return ''\n elif re.match(_regex._money, w):\n return ''\n elif re.match(_regex._date, w):\n return ''\n elif re.match(_regex._expressions['email'], w):\n return ''\n elif re.match(_regex._expressions['url'], w):\n return ''\n else:\n w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n return w","execution_count":3,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\ntag2idx = {'PAD': 0, '_': 1}\nchar2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\nword_idx = 3\ntag_idx = 2\nchar_idx = 3\n\nspecial_tokens = ['', '', '', '', '']\n\nfor t in special_tokens:\n word2idx[t] = word_idx\n word_idx += 1\n char2idx[t] = char_idx\n char_idx += 1\n \nword2idx, char2idx","execution_count":4,"outputs":[{"output_type":"execute_result","execution_count":4,"data":{"text/plain":"({'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '': 3,\n '': 4,\n '': 5,\n '': 6,\n '': 7},\n {'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '': 3,\n '': 4,\n '': 5,\n '': 6,\n '': 7})"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"PAD = \"_PAD\"\nPAD_POS = 
\"_PAD_POS\"\nPAD_TYPE = \"_\"\nPAD_CHAR = \"_PAD_CHAR\"\nROOT = \"_ROOT\"\nROOT_POS = \"_ROOT_POS\"\nROOT_TYPE = \"_\"\nROOT_CHAR = \"_ROOT_CHAR\"\nEND = \"_END\"\nEND_POS = \"_END_POS\"\nEND_TYPE = \"_\"\nEND_CHAR = \"_END_CHAR\"\n\ndef process_corpus(corpus, until = None):\n global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n first_time = True\n for sentence in corpus:\n try:\n if len(sentence):\n if sentence[0] == '#':\n continue\n if first_time:\n print(sentence)\n first_time = False\n sentence = sentence.split('\\t')\n for c in sentence[1]:\n if c not in char2idx:\n char2idx[c] = char_idx\n char_idx += 1\n if sentence[7] not in tag2idx:\n tag2idx[sentence[7]] = tag_idx\n tag_idx += 1\n sentence[1] = preprocessing(sentence[1])\n if sentence[1] not in word2idx:\n word2idx[sentence[1]] = word_idx\n word_idx += 1\n temp_word.append(word2idx[sentence[1]])\n temp_depend.append(int(sentence[6]))\n temp_label.append(tag2idx[sentence[7]])\n temp_sentence.append(sentence[1])\n temp_pos.append(sentence[3])\n else:\n if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n continue\n words.append(temp_word)\n depends.append(temp_depend)\n labels.append(temp_label)\n sentences.append( temp_sentence)\n pos.append(temp_pos)\n char_ = [[char2idx['_ROOT']]]\n for w in temp_sentence:\n if w in char2idx:\n char_.append([char2idx[w]])\n else:\n char_.append([char2idx[c] for c in w])\n chars.append(char_)\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n except Exception as e:\n print(e, sentence)\n return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], 
chars[:-1]","execution_count":5,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-dev.conllu') as fopen:\n dev = fopen.read().split('\\n')\n\nsentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)","execution_count":6,"outputs":[{"output_type":"stream","text":"1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\ninvalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\ninvalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-test.conllu') as fopen:\n test = fopen.read().split('\\n')\n\nsentences_test, words_test, depends_test, labels_test, _, _ = process_corpus(test)\nsentences_test.extend(sentences_dev)\nwords_test.extend(words_dev)\ndepends_test.extend(depends_dev)\nlabels_test.extend(labels_dev)","execution_count":7,"outputs":[{"output_type":"stream","text":"1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\ninvalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-train.conllu') as fopen:\n train = fopen.read().split('\\n')\n\nsentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)","execution_count":8,"outputs":[{"output_type":"stream","text":"1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\ninvalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 
'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\ninvalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\ninvalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\ninvalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\ninvalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', 
'_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\ninvalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\ninvalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\ninvalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"len(sentences_train), len(sentences_test)","execution_count":9,"outputs":[{"output_type":"execute_result","execution_count":9,"data":{"text/plain":"(12000, 3824)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"idx2word = {v:k for k, v in word2idx.items()}\nidx2tag = {v:k for k, v in tag2idx.items()}\nlen(idx2word)","execution_count":10,"outputs":[{"output_type":"execute_result","execution_count":10,"data":{"text/plain":"21974"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def generate_char_seq(batch, UNK = 2):\n maxlen_c = max([len(k) for k in batch])\n x = [[len(i) for i in k] for k in batch]\n maxlen = max([j for i in x for j in i])\n temp = 
np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n for i in range(len(batch)):\n for k in range(len(batch[i])):\n for no, c in enumerate(batch[i][k]):\n temp[i,k,-1-no] = char2idx.get(c, UNK)\n return temp","execution_count":11,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"generate_char_seq(sentences_train[:5]).shape","execution_count":12,"outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"(5, 36, 11)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"pad_sequences(words_train[:5],padding='post').shape","execution_count":13,"outputs":[{"output_type":"execute_result","execution_count":13,"data":{"text/plain":"(5, 36)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"train_X = words_train\ntrain_Y = labels_train\ntrain_depends = depends_train\ntrain_char = sentences_train\n\ntest_X = words_test\ntest_Y = labels_test\ntest_depends = depends_test\ntest_char = sentences_test","execution_count":14,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"class BiAAttention:\n def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n self.input_size_encoder = input_size_encoder\n self.input_size_decoder = input_size_decoder\n self.num_labels = num_labels\n \n self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n \n def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n batch = tf.shape(input_d)[0]\n length_decoder = tf.shape(input_d)[1]\n length_encoder = tf.shape(input_e)[1]\n out_d = tf.expand_dims(tf.matmul(self.W_d, 
tf.transpose(input_d, [0, 2, 1])), 3)\n out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n \n output = output + out_d + out_e\n \n if mask_d is not None:\n d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n output = output * d * e\n \n return output\n\nclass Model:\n def __init__(\n self,\n dim_word,\n dim_char,\n dropout,\n learning_rate,\n hidden_size_char,\n hidden_size_word,\n num_layers\n ):\n def cells(size, reuse = False):\n return tf.contrib.rnn.DropoutWrapper(\n tf.nn.rnn_cell.LSTMCell(\n size,\n initializer = tf.orthogonal_initializer(),\n reuse = reuse,\n ),\n output_keep_prob = dropout,\n )\n \n def luong(embedded, size):\n attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n num_units = hidden_size_word, memory = embedded\n )\n return tf.contrib.seq2seq.AttentionWrapper(\n cell = cells(hidden_size_word),\n attention_mechanism = attention_mechanism,\n attention_layer_size = hidden_size_word,\n )\n \n self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n self.labels = tf.placeholder(tf.int32, shape = [None, None])\n self.depends = tf.placeholder(tf.int32, shape = [None, None])\n self.maxlen = tf.shape(self.word_ids)[1]\n self.lengths = tf.count_nonzero(self.word_ids, 1)\n self.mask = tf.math.not_equal(self.word_ids, 0)\n float_mask = tf.cast(self.mask, tf.float32)\n \n self.arc_h = tf.layers.Dense(hidden_size_word)\n self.arc_c = tf.layers.Dense(hidden_size_word)\n self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n\n self.word_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n )\n )\n self.char_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(char2idx), 
dim_char], stddev = 1.0 / np.sqrt(dim_char)\n )\n )\n\n word_embedded = tf.nn.embedding_lookup(\n self.word_embeddings, self.word_ids\n )\n char_embedded = tf.nn.embedding_lookup(\n self.char_embeddings, self.char_ids\n )\n s = tf.shape(char_embedded)\n char_embedded = tf.reshape(\n char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n )\n\n for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = cells(hidden_size_char),\n cell_bw = cells(hidden_size_char),\n inputs = char_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_char_%d' % (n),\n )\n char_embedded = tf.concat((out_fw, out_bw), 2)\n output = tf.reshape(\n char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n )\n word_embedded = tf.concat([word_embedded, output], axis = -1)\n\n for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = luong(word_embedded, hidden_size_word),\n cell_bw = luong(word_embedded, hidden_size_word),\n inputs = word_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_word_%d' % (n),\n )\n word_embedded = tf.concat((out_fw, out_bw), 2)\n\n logits = tf.layers.dense(word_embedded, len(idx2tag))\n log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n logits, self.labels, self.lengths\n )\n arc_h = tf.nn.elu(self.arc_h(word_embedded))\n arc_c = tf.nn.elu(self.arc_c(word_embedded))\n out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n \n batch = tf.shape(out_arc)[0]\n batch_index = tf.range(0, batch)\n max_len = tf.shape(out_arc)[1]\n sec_max_len = tf.shape(out_arc)[2]\n \n minus_inf = -1e8\n minus_mask = (1 - float_mask) * minus_inf\n out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * 
tf.expand_dims(float_mask, axis = 1)\n num = tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n \n child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n t = tf.transpose(self.depends)\n broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n tf.expand_dims(t, axis = 0),\n tf.expand_dims(child_index, axis = 0)], axis = 0))\n loss_arc = tf.gather_nd(loss_arc, concatenated)\n loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n \n loss_arc = tf.reduce_sum(-loss_arc) / num\n \n self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n \n self.optimizer = tf.train.AdamOptimizer(\n learning_rate = learning_rate\n ).minimize(self.cost)\n \n mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n \n self.tags_seq, _ = tf.contrib.crf.crf_decode(\n logits, transition_params, self.lengths\n )\n \n out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n self.heads = tf.argmax(out_arc, axis = 1)\n \n self.prediction = tf.boolean_mask(self.tags_seq, mask)\n mask_label = tf.boolean_mask(self.labels, mask)\n correct_pred = tf.equal(self.prediction, mask_label)\n correct_index = tf.cast(correct_pred, tf.float32)\n self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n \n self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n mask_label = tf.boolean_mask(self.depends, mask)\n correct_pred = tf.equal(self.prediction, mask_label)\n correct_index = tf.cast(correct_pred, tf.float32)\n self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))","execution_count":15,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"tf.reset_default_graph()\nsess = tf.InteractiveSession()\n\ndim_word = 
128\ndim_char = 256\ndropout = 1.0\nlearning_rate = 1e-3\nhidden_size_char = 128\nhidden_size_word = 128\nnum_layers = 2\n\nmodel = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\nsess.run(tf.global_variables_initializer())","execution_count":16,"outputs":[{"output_type":"stream","text":"WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n","name":"stdout"},{"output_type":"stream","text":"WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"batch_x = train_X[:5]\nbatch_x = pad_sequences(batch_x,padding='post')\nbatch_char = train_char[:5]\nbatch_char = generate_char_seq(batch_char)\nbatch_y = train_Y[:5]\nbatch_y = pad_sequences(batch_y,padding='post')\nbatch_depends = train_depends[:5]\nbatch_depends = pad_sequences(batch_depends,padding='post')","execution_count":17,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"sess.run([model.accuracy, model.accuracy_depends, model.cost],\n feed_dict = {model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends})","execution_count":18,"outputs":[{"output_type":"execute_result","execution_count":18,"data":{"text/plain":"[0.01724138, 0.03448276, 94.80077]"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"from tqdm import tqdm\n\nbatch_size = 32\nepoch = 15\n\nfor e in range(epoch):\n train_acc, train_loss = [], []\n test_acc, test_loss = [], []\n train_acc_depends, test_acc_depends = [], []\n \n pbar = tqdm(\n range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(train_X))\n batch_x = train_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = train_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = train_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = train_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost, _ = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n train_loss.append(cost)\n train_acc.append(acc)\n 
train_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n pbar = tqdm(\n range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(test_X))\n batch_x = test_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = test_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = test_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = test_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n test_loss.append(cost)\n test_acc.append(acc)\n test_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n \n print(\n 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n % (e, np.mean(train_loss), \n np.mean(train_acc), \n np.mean(train_acc_depends), \n np.mean(test_loss), \n np.mean(test_acc), \n np.mean(test_acc_depends)\n ))\n ","execution_count":19,"outputs":[{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:45<00:00, 2.27it/s, accuracy=0.76, accuracy_depends=0.563, cost=19.7] \ntest minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.26it/s, accuracy=0.789, accuracy_depends=0.636, cost=12.2]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 0, training loss: 33.055885, training acc: 0.489183, training depends: 0.330494, valid loss: 14.063558, valid acc: 0.722389, valid depends: 0.542770\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:44<00:00, 
2.28it/s, accuracy=0.868, accuracy_depends=0.722, cost=11.4]\ntest minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.27it/s, accuracy=0.866, accuracy_depends=0.733, cost=7.85]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 1, training loss: 12.850532, training acc: 0.797367, training depends: 0.598399, valid loss: 9.277430, valid acc: 0.815580, valid depends: 0.636476\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:44<00:00, 2.28it/s, accuracy=0.889, accuracy_depends=0.756, cost=8.97]\ntest minibatch loop: 100%|██████████| 120/120 [00:23<00:00, 5.20it/s, accuracy=0.899, accuracy_depends=0.789, cost=6.5] \ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 2, training loss: 8.498978, training acc: 0.866366, training depends: 0.675371, valid loss: 8.156148, valid acc: 0.840378, valid depends: 0.668963\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:40<00:00, 2.34it/s, accuracy=0.913, accuracy_depends=0.796, cost=6.91]\ntest minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 5.47it/s, accuracy=0.903, accuracy_depends=0.834, cost=5.73]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 3, training loss: 6.492755, training acc: 0.898508, training depends: 0.715001, valid loss: 8.048695, valid acc: 0.847213, valid depends: 0.685596\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:38<00:00, 2.37it/s, accuracy=0.936, accuracy_depends=0.798, cost=5.86]\ntest minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 5.51it/s, accuracy=0.895, accuracy_depends=0.798, cost=6.22]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 4, training loss: 5.167568, training acc: 0.920600, 
training depends: 0.738663, valid loss: 8.305276, valid acc: 0.847596, valid depends: 0.689086\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:38<00:00, 2.36it/s, accuracy=0.943, accuracy_depends=0.815, cost=4.73]\ntest minibatch loop: 100%|██████████| 120/120 [00:23<00:00, 5.12it/s, accuracy=0.891, accuracy_depends=0.798, cost=6.56]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 5, training loss: 4.165185, training acc: 0.937471, training depends: 0.753970, valid loss: 8.721710, valid acc: 0.846347, valid depends: 0.695697\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:41<00:00, 2.33it/s, accuracy=0.941, accuracy_depends=0.818, cost=4.75]\ntest minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.37it/s, accuracy=0.911, accuracy_depends=0.806, cost=6.27]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 6, training loss: 3.425873, training acc: 0.949323, training depends: 0.764339, valid loss: 9.168028, valid acc: 0.848342, valid depends: 0.696816\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:40<00:00, 2.33it/s, accuracy=0.961, accuracy_depends=0.818, cost=3.71] \ntest minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.28it/s, accuracy=0.895, accuracy_depends=0.83, cost=6.55] \ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 7, training loss: 2.844981, training acc: 0.958249, training depends: 0.774813, valid loss: 9.671643, valid acc: 0.846337, valid depends: 0.694770\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 2.30it/s, accuracy=0.965, accuracy_depends=0.824, cost=2.8] \ntest minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.39it/s, 
accuracy=0.899, accuracy_depends=0.834, cost=6.87]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 8, training loss: 2.431637, training acc: 0.964540, training depends: 0.779873, valid loss: 9.727604, valid acc: 0.851299, valid depends: 0.697885\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:43<00:00, 2.29it/s, accuracy=0.981, accuracy_depends=0.815, cost=1.87] \ntest minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.25it/s, accuracy=0.891, accuracy_depends=0.802, cost=8.29]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 9, training loss: 1.873249, training acc: 0.973659, training depends: 0.792058, valid loss: 10.517039, valid acc: 0.848481, valid depends: 0.697743\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:46<00:00, 2.25it/s, accuracy=0.98, accuracy_depends=0.818, cost=1.59] \ntest minibatch loop: 100%|██████████| 120/120 [00:24<00:00, 4.88it/s, accuracy=0.915, accuracy_depends=0.773, cost=9.24]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 10, training loss: 1.480060, training acc: 0.980323, training depends: 0.800714, valid loss: 11.094497, valid acc: 0.848848, valid depends: 0.700038\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:49<00:00, 2.22it/s, accuracy=0.975, accuracy_depends=0.852, cost=1.57] \ntest minibatch loop: 100%|██████████| 120/120 [00:23<00:00, 5.11it/s, accuracy=0.903, accuracy_depends=0.789, cost=8.94]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 11, training loss: 1.274817, training acc: 0.983628, training depends: 0.805342, valid loss: 11.517880, valid acc: 0.849974, valid depends: 
0.704320\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:48<00:00, 2.22it/s, accuracy=0.983, accuracy_depends=0.836, cost=1.42] \ntest minibatch loop: 100%|██████████| 120/120 [00:23<00:00, 5.07it/s, accuracy=0.907, accuracy_depends=0.814, cost=8.96]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 12, training loss: 1.018653, training acc: 0.987874, training depends: 0.813133, valid loss: 12.010469, valid acc: 0.853442, valid depends: 0.709487\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:47<00:00, 2.24it/s, accuracy=0.993, accuracy_depends=0.848, cost=0.988]\ntest minibatch loop: 100%|██████████| 120/120 [00:23<00:00, 5.10it/s, accuracy=0.907, accuracy_depends=0.834, cost=9.45]\ntrain minibatch loop: 0%| | 0/375 [00:00, ?it/s]","name":"stderr"},{"output_type":"stream","text":"epoch: 13, training loss: 0.844937, training acc: 0.990424, training depends: 0.820007, valid loss: 12.344518, valid acc: 0.853835, valid depends: 0.708826\n\n","name":"stdout"},{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:48<00:00, 2.22it/s, accuracy=0.997, accuracy_depends=0.855, cost=0.673]\ntest minibatch loop: 100%|██████████| 120/120 [00:23<00:00, 5.09it/s, accuracy=0.907, accuracy_depends=0.834, cost=8.72]","name":"stderr"},{"output_type":"stream","text":"epoch: 14, training loss: 0.746680, training acc: 0.991965, training depends: 0.824823, valid loss: 12.723352, valid acc: 0.852817, valid depends: 0.712279\n\n","name":"stdout"},{"output_type":"stream","text":"\n","name":"stderr"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def evaluate(heads_pred, types_pred, heads, types, lengths,\n symbolic_root=False, symbolic_end=False):\n batch_size, _ = words.shape\n ucorr = 0.\n lcorr = 0.\n total = 0.\n ucomplete_match = 0.\n lcomplete_match = 0.\n\n corr_root = 0.\n 
total_root = 0.\n start = 1 if symbolic_root else 0\n end = 1 if symbolic_end else 0\n for i in range(batch_size):\n ucm = 1.\n lcm = 1.\n for j in range(start, lengths[i] - end):\n\n total += 1\n if heads[i, j] == heads_pred[i, j]:\n ucorr += 1\n if types[i, j] == types_pred[i, j]:\n lcorr += 1\n else:\n lcm = 0\n else:\n ucm = 0\n lcm = 0\n\n if heads[i, j] == 0:\n total_root += 1\n corr_root += 1 if heads_pred[i, j] == 0 else 0\n\n ucomplete_match += ucm\n lcomplete_match += lcm\n\n return (ucorr, lcorr, total, ucomplete_match, lcomplete_match), \\\n (corr_root, total_root), batch_size","execution_count":20,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"tags_seq, heads = sess.run(\n [model.tags_seq, model.heads],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char\n },\n)\ntags_seq[0], heads[0], batch_depends[0]","execution_count":21,"outputs":[{"output_type":"execute_result","execution_count":21,"data":{"text/plain":"(array([15, 6, 22, 26, 23, 18, 16, 5, 3, 13, 10, 11, 6, 12, 31, 10, 16,\n 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0], dtype=int32),\n array([ 2, 8, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0]),\n array([ 2, 8, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0], dtype=int32))"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def evaluate(heads_pred, types_pred, heads, types, lengths,\n symbolic_root=False, symbolic_end=False):\n batch_size, _ = heads_pred.shape\n ucorr = 0.\n lcorr = 0.\n total = 0.\n ucomplete_match = 0.\n lcomplete_match = 0.\n\n corr_root = 0.\n total_root = 0.\n start = 1 if symbolic_root else 0\n end = 1 if symbolic_end else 0\n for i in range(batch_size):\n ucm = 1.\n lcm = 1.\n for j in range(start, lengths[i] - end):\n\n total += 1\n if heads[i, j] == heads_pred[i, j]:\n ucorr += 1\n if types[i, j] == types_pred[i, 
j]:\n lcorr += 1\n else:\n lcm = 0\n else:\n ucm = 0\n lcm = 0\n\n if heads[i, j] == 0:\n total_root += 1\n corr_root += 1 if heads_pred[i, j] == 0 else 0\n\n ucomplete_match += ucm\n lcomplete_match += lcm\n \n return ucorr / total, lcorr / total, corr_root / total_root","execution_count":22,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n np.count_nonzero(batch_x, axis = 1))\narc_accuracy, type_accuracy, root_accuracy","execution_count":23,"outputs":[{"output_type":"execute_result","execution_count":23,"data":{"text/plain":"(0.8340080971659919, 0.7813765182186235, 0.9375)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"arcs, types, roots = [], [], []\n\npbar = tqdm(\n range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n)\nfor i in pbar:\n index = min(i + batch_size, len(test_X))\n batch_x = test_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = test_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = test_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = test_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n tags_seq, heads = sess.run(\n [model.tags_seq, model.heads],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char\n },\n )\n \n arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n np.count_nonzero(batch_x, axis = 1))\n pbar.set_postfix(arc_accuracy = arc_accuracy, type_accuracy = type_accuracy, \n root_accuracy = root_accuracy)\n arcs.append(arc_accuracy)\n types.append(type_accuracy)\n roots.append(root_accuracy)","execution_count":24,"outputs":[{"output_type":"stream","text":"test minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.45it/s, arc_accuracy=0.834, root_accuracy=0.938, 
type_accuracy=0.781]\n","name":"stderr"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"print('arc accuracy:', np.mean(arcs))\nprint('types accuracy:', np.mean(types))\nprint('root accuracy:', np.mean(roots))","execution_count":25,"outputs":[{"output_type":"stream","text":"arc accuracy: 0.7122794390376872\ntypes accuracy: 0.6573356968974766\nroot accuracy: 0.6723958333333333\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"","execution_count":null,"outputs":[]}],"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat":4,"nbformat_minor":1}
\ No newline at end of file
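The deleted notebooks above store their code as minified JSON, which is hard to read in diff form. As a plain-Python sketch (not the notebooks' exact code), the UAS/LAS/root evaluation that both dependency-parser notebooks implement can be written as follows; the signature mirrors the notebooks' `evaluate(heads_pred, types_pred, heads, types, lengths)`, and a gold head of `0` is assumed to mark attachment to the artificial ROOT token, as in the CoNLL data they load:

```python
def evaluate(heads_pred, types_pred, heads, types, lengths):
    """Return (unlabeled attachment score, labeled attachment score, root accuracy).

    heads_pred / heads : per-token predicted / gold head indices, shape [batch, seq].
    types_pred / types : per-token predicted / gold dependency labels, same shape.
    lengths            : true (unpadded) length of each sentence in the batch.
    """
    ucorr = lcorr = total = 0.0
    corr_root = total_root = 0.0
    for i in range(len(heads_pred)):
        for j in range(lengths[i]):  # only score real tokens, not padding
            total += 1
            if heads[i][j] == heads_pred[i][j]:
                ucorr += 1  # correct arc (unlabeled)
                if types[i][j] == types_pred[i][j]:
                    lcorr += 1  # correct arc and correct label
            # a token whose gold head is 0 attaches to the artificial ROOT
            if heads[i][j] == 0:
                total_root += 1
                if heads_pred[i][j] == 0:
                    corr_root += 1
    return ucorr / total, lcorr / total, corr_root / total_root
```

Note the LAS counter only increments when the arc is already correct, so LAS <= UAS by construction, matching the notebook outputs above (e.g. arc accuracy 0.834 vs. type accuracy 0.781).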
diff --git a/dependency-parser/3.residual-network-bahdanau.ipynb b/dependency-parser/3.residual-network-bahdanau.ipynb
deleted file mode 100644
index 3660782..0000000
--- a/dependency-parser/3.residual-network-bahdanau.ipynb
+++ /dev/null
@@ -1,860 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import tensorflow as tf\n",
- "from tqdm import tqdm\n",
- "import numpy as np"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "with open('test.conll.txt') as fopen:\n",
- " corpus = fopen.read().split('\\n')\n",
- " \n",
- "with open('dev.conll.txt') as fopen:\n",
- " corpus_test = fopen.read().split('\\n')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [],
- "source": [
- "word2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "tag2idx = {'PAD': 0}\n",
- "char2idx = {'PAD': 0}\n",
- "word_idx = 3\n",
- "tag_idx = 1\n",
- "char_idx = 1\n",
- "\n",
- "def process_corpus(corpus, until = None):\n",
- " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
- " words, depends, labels = [], [], []\n",
- " temp_word, temp_depend, temp_label = [], [], []\n",
- " for sentence in corpus:\n",
- " if len(sentence):\n",
- " sentence = sentence.split('\\t')\n",
- " for c in sentence[1]:\n",
- " if c not in char2idx:\n",
- " char2idx[c] = char_idx\n",
- " char_idx += 1\n",
- " if sentence[7] not in tag2idx:\n",
- " tag2idx[sentence[7]] = tag_idx\n",
- " tag_idx += 1\n",
- " if sentence[1] not in word2idx:\n",
- " word2idx[sentence[1]] = word_idx\n",
- " word_idx += 1\n",
- " temp_word.append(word2idx[sentence[1]])\n",
- " temp_depend.append(int(sentence[6]))\n",
- " temp_label.append(tag2idx[sentence[7]])\n",
- " else:\n",
- " words.append(temp_word)\n",
- " depends.append(temp_depend)\n",
- " labels.append(temp_label)\n",
- " temp_word = []\n",
- " temp_depend = []\n",
- " temp_label = []\n",
- " return words[:-1], depends[:-1], labels[:-1]\n",
- " \n",
- "words, depends, labels = process_corpus(corpus)\n",
- "words_test, depends_test, labels_test = process_corpus(corpus_test)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Using TensorFlow backend.\n"
- ]
- }
- ],
- "source": [
- "from keras.preprocessing.sequence import pad_sequences"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [],
- "source": [
- "words = pad_sequences(words,padding='post')\n",
- "depends = pad_sequences(depends,padding='post')\n",
- "labels = pad_sequences(labels,padding='post')\n",
- "\n",
- "words_test = pad_sequences(words_test,padding='post')\n",
- "depends_test = pad_sequences(depends_test,padding='post')\n",
- "labels_test = pad_sequences(labels_test,padding='post')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [],
- "source": [
- "idx2word = {idx: tag for tag, idx in word2idx.items()}\n",
- "idx2tag = {i: w for w, i in tag2idx.items()}\n",
- "\n",
- "train_X = words\n",
- "train_Y = labels\n",
- "train_depends = depends\n",
- "\n",
- "test_X = words_test\n",
- "test_Y = labels_test\n",
- "test_depends = depends_test"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [],
- "source": [
- "maxlen = max(train_X.shape[1], test_X.shape[1])\n",
- "\n",
- "train_X = pad_sequences(train_X,padding='post',maxlen=maxlen)\n",
- "train_Y = pad_sequences(train_Y,padding='post',maxlen=maxlen)\n",
- "train_depends = pad_sequences(train_depends,padding='post',maxlen=maxlen)\n",
- "\n",
- "test_X = pad_sequences(test_X,padding='post',maxlen=maxlen)\n",
- "test_Y = pad_sequences(test_Y,padding='post',maxlen=maxlen)\n",
- "test_depends = pad_sequences(test_depends,padding='post',maxlen=maxlen)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [],
- "source": [
- "class Attention:\n",
- " def __init__(self,hidden_size):\n",
- " self.hidden_size = hidden_size\n",
- " self.dense_layer = tf.layers.Dense(hidden_size)\n",
- " self.v = tf.random_normal([hidden_size],mean=0,stddev=1/np.sqrt(hidden_size))\n",
- " \n",
- " def score(self, hidden_tensor, encoder_outputs):\n",
- " energy = tf.nn.tanh(self.dense_layer(tf.concat([hidden_tensor,encoder_outputs],2)))\n",
- " energy = tf.transpose(energy,[0,2,1])\n",
- " batch_size = tf.shape(encoder_outputs)[0]\n",
- " v = tf.expand_dims(tf.tile(tf.expand_dims(self.v,0),[batch_size,1]),1)\n",
- " energy = tf.matmul(v,energy)\n",
- " return tf.squeeze(energy,1)\n",
- " \n",
- " def __call__(self, hidden, encoder_outputs):\n",
- " seq_len = tf.shape(encoder_outputs)[1]\n",
- " batch_size = tf.shape(encoder_outputs)[0]\n",
- " H = tf.tile(tf.expand_dims(hidden, 1),[1,seq_len,1])\n",
- " attn_energies = self.score(H,encoder_outputs)\n",
- " return tf.expand_dims(tf.nn.softmax(attn_energies),1)\n",
- "\n",
- "class Model:\n",
- " def __init__(\n",
- " self,\n",
- " dict_size,\n",
- " size_layers,\n",
- " learning_rate,\n",
- " maxlen,\n",
- " num_blocks = 3,\n",
- " block_size = 128,\n",
- " ):\n",
- " self.word_ids = tf.placeholder(tf.int32, shape = [None, maxlen])\n",
- " self.labels = tf.placeholder(tf.int32, shape = [None, maxlen])\n",
- " self.depends = tf.placeholder(tf.int32, shape = [None, maxlen])\n",
- " embeddings = tf.Variable(tf.random_uniform([dict_size, size_layers], -1, 1))\n",
- " embedded = tf.nn.embedding_lookup(embeddings, self.word_ids)\n",
- " self.attention = Attention(size_layers)\n",
- " self.maxlen = tf.shape(self.word_ids)[1]\n",
- " self.lengths = tf.count_nonzero(self.word_ids, 1)\n",
- "\n",
- " def residual_block(x, size, rate, block):\n",
- " with tf.variable_scope(\n",
- " 'block_%d_%d' % (block, rate), reuse = False\n",
- " ):\n",
- " attn_weights = self.attention(tf.reduce_sum(x,axis=1), x)\n",
- " conv_filter = tf.layers.conv1d(\n",
- " attn_weights,\n",
- " x.shape[2] // 4,\n",
- " kernel_size = size,\n",
- " strides = 1,\n",
- " padding = 'same',\n",
- " dilation_rate = rate,\n",
- " activation = tf.nn.tanh,\n",
- " )\n",
- " conv_gate = tf.layers.conv1d(\n",
- " x,\n",
- " x.shape[2] // 4,\n",
- " kernel_size = size,\n",
- " strides = 1,\n",
- " padding = 'same',\n",
- " dilation_rate = rate,\n",
- " activation = tf.nn.sigmoid,\n",
- " )\n",
- " out = tf.multiply(conv_filter, conv_gate)\n",
- " out = tf.layers.conv1d(\n",
- " out,\n",
- " block_size,\n",
- " kernel_size = 1,\n",
- " strides = 1,\n",
- " padding = 'same',\n",
- " activation = tf.nn.tanh,\n",
- " )\n",
- " return tf.add(x, out), out\n",
- "\n",
- " forward = tf.layers.conv1d(\n",
- " embedded, block_size, kernel_size = 1, strides = 1, padding = 'SAME'\n",
- " )\n",
- " zeros = tf.zeros_like(forward)\n",
- " for i in range(num_blocks):\n",
- " for r in [1, 2, 4, 8, 16]:\n",
- " forward, s = residual_block(\n",
- " forward, size = 7, rate = r, block = i\n",
- " )\n",
- " zeros = tf.add(zeros, s)\n",
- " logits = tf.layers.conv1d(\n",
- " zeros, len(idx2tag), kernel_size = 1, strides = 1, padding = 'SAME'\n",
- " )\n",
- " logits_depends = tf.layers.conv1d(\n",
- " zeros, maxlen, kernel_size = 1, strides = 1, padding = 'SAME'\n",
- " )\n",
- " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n",
- " logits, self.labels, self.lengths\n",
- " )\n",
- " with tf.variable_scope(\"depends\"):\n",
- " log_likelihood_depends, transition_params_depends = tf.contrib.crf.crf_log_likelihood(\n",
- " logits_depends, self.depends, self.lengths\n",
- " )\n",
- " self.cost = tf.reduce_mean(-log_likelihood) + tf.reduce_mean(-log_likelihood_depends)\n",
- " self.optimizer = tf.train.AdamOptimizer(\n",
- " learning_rate = learning_rate\n",
- " ).minimize(self.cost)\n",
- " \n",
- " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
- " \n",
- " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n",
- " logits, transition_params, self.lengths\n",
- " )\n",
- " self.tags_seq_depends, _ = tf.contrib.crf.crf_decode(\n",
- " logits_depends, transition_params_depends, self.lengths\n",
- " )\n",
- "\n",
- " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
- " mask_label = tf.boolean_mask(self.labels, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
- " \n",
- " self.prediction = tf.boolean_mask(self.tags_seq_depends, mask)\n",
- " mask_label = tf.boolean_mask(self.depends, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
- " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
- ]
- }
- ],
- "source": [
- "tf.reset_default_graph()\n",
- "sess = tf.InteractiveSession()\n",
- "\n",
- "dim = 256\n",
- "dropout = 1\n",
- "learning_rate = 1e-3\n",
- "batch_size = 32\n",
- "\n",
- "model = Model(len(word2idx), dim, learning_rate, maxlen)\n",
- "sess.run(tf.global_variables_initializer())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:17<00:00, 3.64it/s, accuracy=0.497, accuracy_depends=0.0582, cost=96.4]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 10.84it/s, accuracy=0.518, accuracy_depends=0.0636, cost=151]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 22.717660427093506\n",
- "epoch: 0, training loss: 153.183009, training acc: 0.294935, training depends: 0.048782, valid loss: 129.335049, valid acc: 0.491078, valid depends: 0.077788\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.47it/s, accuracy=0.634, accuracy_depends=0.171, cost=71.4]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.90it/s, accuracy=0.755, accuracy_depends=0.145, cost=113] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.58572268486023\n",
- "epoch: 1, training loss: 110.329521, training acc: 0.580832, training depends: 0.113730, valid loss: 100.057401, valid acc: 0.659700, valid depends: 0.141831\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.48it/s, accuracy=0.726, accuracy_depends=0.226, cost=59.4]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.67it/s, accuracy=0.818, accuracy_depends=0.127, cost=97.3]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s, accuracy=0.739, accuracy_depends=0.168, cost=79.6]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.490660667419434\n",
- "epoch: 2, training loss: 88.406940, training acc: 0.712373, training depends: 0.173413, valid loss: 88.115075, valid acc: 0.722415, valid depends: 0.169411\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.35it/s, accuracy=0.805, accuracy_depends=0.281, cost=50.7]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 11.75it/s, accuracy=0.8, accuracy_depends=0.155, cost=90.2] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.47122573852539\n",
- "epoch: 3, training loss: 76.711710, training acc: 0.783817, training depends: 0.216270, valid loss: 81.675866, valid acc: 0.755237, valid depends: 0.195363\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:16<00:00, 5.43it/s, accuracy=0.873, accuracy_depends=0.349, cost=43.6]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.81it/s, accuracy=0.818, accuracy_depends=0.182, cost=85.8]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.655714988708496\n",
- "epoch: 4, training loss: 68.472157, training acc: 0.831365, training depends: 0.259200, valid loss: 77.894161, valid acc: 0.775189, valid depends: 0.222546\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.49it/s, accuracy=0.938, accuracy_depends=0.38, cost=37.7] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 11.73it/s, accuracy=0.827, accuracy_depends=0.245, cost=83.4]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.527581453323364\n",
- "epoch: 5, training loss: 61.882243, training acc: 0.866012, training depends: 0.299192, valid loss: 75.928109, valid acc: 0.782752, valid depends: 0.242099\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.48it/s, accuracy=0.962, accuracy_depends=0.469, cost=32.9]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.72it/s, accuracy=0.827, accuracy_depends=0.245, cost=82.1]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.484257698059082\n",
- "epoch: 6, training loss: 56.255667, training acc: 0.893967, training depends: 0.341305, valid loss: 75.448544, valid acc: 0.786553, valid depends: 0.254019\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.38it/s, accuracy=0.979, accuracy_depends=0.534, cost=28.7]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.78it/s, accuracy=0.827, accuracy_depends=0.236, cost=82.8]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.61243438720703\n",
- "epoch: 7, training loss: 51.263502, training acc: 0.917225, training depends: 0.383208, valid loss: 76.300179, valid acc: 0.783883, valid depends: 0.259615\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.49it/s, accuracy=0.986, accuracy_depends=0.579, cost=25.1]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 13.08it/s, accuracy=0.8, accuracy_depends=0.218, cost=86.5] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.4697368144989\n",
- "epoch: 8, training loss: 46.744222, training acc: 0.934576, training depends: 0.428258, valid loss: 77.713942, valid acc: 0.784674, valid depends: 0.261951\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.40it/s, accuracy=0.986, accuracy_depends=0.644, cost=21.8]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.54it/s, accuracy=0.836, accuracy_depends=0.209, cost=94.3]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.674675703048706\n",
- "epoch: 9, training loss: 42.561942, training acc: 0.949874, training depends: 0.470683, valid loss: 81.004866, valid acc: 0.780429, valid depends: 0.263705\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.46it/s, accuracy=0.986, accuracy_depends=0.705, cost=19.1]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.57it/s, accuracy=0.855, accuracy_depends=0.264, cost=89.2]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.510538816452026\n",
- "epoch: 10, training loss: 38.979279, training acc: 0.960487, training depends: 0.507692, valid loss: 79.328121, valid acc: 0.784113, valid depends: 0.287815\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.51it/s, accuracy=0.993, accuracy_depends=0.777, cost=15.1]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.50it/s, accuracy=0.836, accuracy_depends=0.3, cost=85.7] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.53708791732788\n",
- "epoch: 11, training loss: 35.629965, training acc: 0.968517, training depends: 0.553509, valid loss: 81.083164, valid acc: 0.784796, valid depends: 0.287922\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.47it/s, accuracy=0.997, accuracy_depends=0.846, cost=13.3]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.81it/s, accuracy=0.818, accuracy_depends=0.291, cost=96.6]\n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.565435647964478\n",
- "epoch: 12, training loss: 32.694344, training acc: 0.974347, training depends: 0.593964, valid loss: 84.288952, valid acc: 0.781696, valid depends: 0.291779\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.49it/s, accuracy=1, accuracy_depends=0.904, cost=10.5] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.77it/s, accuracy=0.8, accuracy_depends=0.309, cost=97] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.49126100540161\n",
- "epoch: 13, training loss: 28.929201, training acc: 0.978603, training depends: 0.648120, valid loss: 87.836118, valid acc: 0.779976, valid depends: 0.289016\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.50it/s, accuracy=1, accuracy_depends=0.914, cost=9.43] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.41it/s, accuracy=0.809, accuracy_depends=0.364, cost=94] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.524020671844482\n",
- "epoch: 14, training loss: 26.025858, training acc: 0.981591, training depends: 0.689773, valid loss: 89.180642, valid acc: 0.776449, valid depends: 0.298843\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.26it/s, accuracy=1, accuracy_depends=0.918, cost=7.58] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 11.67it/s, accuracy=0.8, accuracy_depends=0.318, cost=99.5] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.600938320159912\n",
- "epoch: 15, training loss: 23.192569, training acc: 0.987144, training depends: 0.726383, valid loss: 94.854855, valid acc: 0.770395, valid depends: 0.286692\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.36it/s, accuracy=0.997, accuracy_depends=0.955, cost=5.65]\n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.75it/s, accuracy=0.809, accuracy_depends=0.355, cost=102] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.59098505973816\n",
- "epoch: 16, training loss: 20.472838, training acc: 0.992866, training depends: 0.764563, valid loss: 97.303479, valid acc: 0.769268, valid depends: 0.287900\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:16<00:00, 5.41it/s, accuracy=1, accuracy_depends=0.986, cost=4.23] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.59it/s, accuracy=0.809, accuracy_depends=0.336, cost=104] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.69632053375244\n",
- "epoch: 17, training loss: 17.953982, training acc: 0.995921, training depends: 0.801940, valid loss: 100.876137, valid acc: 0.769323, valid depends: 0.283341\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:16<00:00, 5.31it/s, accuracy=1, accuracy_depends=0.983, cost=3.82] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.80it/s, accuracy=0.809, accuracy_depends=0.336, cost=102] \n",
- "train minibatch loop: 0%| | 0/76 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.720519304275513\n",
- "epoch: 18, training loss: 15.907509, training acc: 0.998263, training depends: 0.829425, valid loss: 103.454220, valid acc: 0.772234, valid depends: 0.287102\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 76/76 [00:15<00:00, 5.40it/s, accuracy=1, accuracy_depends=0.973, cost=3.46] \n",
- "test minibatch loop: 100%|██████████| 54/54 [00:04<00:00, 12.69it/s, accuracy=0.845, accuracy_depends=0.409, cost=101] "
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 20.618144035339355\n",
- "epoch: 19, training loss: 14.017608, training acc: 0.999365, training depends: 0.851850, valid loss: 108.007175, valid acc: 0.771574, valid depends: 0.283096\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "import time\n",
- "\n",
- "for e in range(20):\n",
- " lasttime = time.time()\n",
- " train_acc, train_loss, test_acc, test_loss, train_acc_depends, test_acc_depends = 0, 0, 0, 0, 0, 0\n",
- " pbar = tqdm(\n",
- " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = train_X[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_y = train_Y[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_depends = train_depends[i : min(i + batch_size, train_X.shape[0])]\n",
- " acc_depends, acc, cost, _ = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " train_loss += cost\n",
- " train_acc += acc\n",
- " train_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " pbar = tqdm(\n",
- " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = test_X[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_y = test_Y[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_depends = test_depends[i : min(i + batch_size, test_X.shape[0])]\n",
- " acc_depends, acc, cost = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " test_loss += cost\n",
- " test_acc += acc\n",
- " test_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " train_loss /= len(train_X) / batch_size\n",
- " train_acc /= len(train_X) / batch_size\n",
- " train_acc_depends /= len(train_X) / batch_size\n",
- " test_loss /= len(test_X) / batch_size\n",
- " test_acc /= len(test_X) / batch_size\n",
- " test_acc_depends /= len(test_X) / batch_size\n",
- "\n",
- " print('time taken:', time.time() - lasttime)\n",
- " print(\n",
- " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
- " % (e, train_loss, train_acc, train_acc_depends, test_loss, test_acc, test_acc_depends)\n",
- " )"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq, deps = sess.run([model.tags_seq, model.tags_seq_depends],\n",
- " feed_dict={model.word_ids:batch_x[:1]})"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq = seq[0]\n",
- "deps = deps[0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)"
- ]
- },
- "execution_count": 13,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "seq[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)"
- ]
- },
- "execution_count": 14,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "batch_y[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([ 2, 14, 11, 5, 6, 0, 3, 11, 11, 11, 6, 6], dtype=int32)"
- ]
- },
- "execution_count": 15,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "deps[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([ 2, 6, 6, 5, 6, 0, 6, 11, 11, 11, 6, 6], dtype=int32)"
- ]
- },
- "execution_count": 16,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "batch_depends[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/dependency-parser/4.bert-crf-biaffine.ipynb b/dependency-parser/4.bert-crf-biaffine.ipynb
new file mode 100644
index 0000000..d38c52d
--- /dev/null
+++ b/dependency-parser/4.bert-crf-biaffine.ipynb
@@ -0,0 +1,1217 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n",
+ "# !wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip\n",
+ "# !unzip cased_L-12_H-768_A-12.zip"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "tag2idx = {'PAD': 0, 'X': 1}\n",
+ "tag_idx = 2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ }
+ ],
+ "source": [
+ "import bert\n",
+ "from bert import run_classifier\n",
+ "from bert import optimization\n",
+ "from bert import tokenization\n",
+ "from bert import modeling\n",
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "\n",
+ "BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'\n",
+ "BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'\n",
+ "BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'\n",
+ "\n",
+ "tokenizer = tokenization.FullTokenizer(\n",
+ " vocab_file=BERT_VOCAB, do_lower_case=False)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def process_corpus(corpus, until = None):\n",
+ " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
+ " sentences, words, depends, labels, pos, sequences = [], [], [], [], [], []\n",
+ " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n",
+ " first_time = True\n",
+ " for sentence in corpus:\n",
+ " try:\n",
+ " if len(sentence):\n",
+ " if sentence[0] == '#':\n",
+ " continue\n",
+ " if first_time:\n",
+ " print(sentence)\n",
+ " first_time = False\n",
+ " sentence = sentence.split('\\t')\n",
+ " if sentence[7] not in tag2idx:\n",
+ " tag2idx[sentence[7]] = tag_idx\n",
+ " tag_idx += 1\n",
+ " temp_word.append(sentence[1])\n",
+ " temp_depend.append(int(sentence[6]) + 1)\n",
+ " temp_label.append(tag2idx[sentence[7]])\n",
+ " temp_sentence.append(sentence[1])\n",
+ " temp_pos.append(sentence[3])\n",
+ " else:\n",
+ " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " continue\n",
+ " bert_tokens = ['[CLS]']\n",
+ " labels_ = [0]\n",
+ " depends_ = [0]\n",
+ " seq_ = []\n",
+ " for no, orig_token in enumerate(temp_word):\n",
+ " labels_.append(temp_label[no])\n",
+ " depends_.append(temp_depend[no])\n",
+ " t = tokenizer.tokenize(orig_token)\n",
+ " bert_tokens.extend(t)\n",
+ " labels_.extend([1] * (len(t) - 1))\n",
+ " depends_.extend([0] * (len(t) - 1))\n",
+ " seq_.append(no + 1)\n",
+ " bert_tokens.append('[SEP]')\n",
+ " labels_.append(0)\n",
+ " depends_.append(0)\n",
+ " words.append(tokenizer.convert_tokens_to_ids(bert_tokens))\n",
+ " depends.append(depends_)\n",
+ " labels.append(labels_)\n",
+ " sentences.append(temp_sentence)\n",
+ " pos.append(temp_pos)\n",
+ " sequences.append(seq_)\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " except Exception as e:\n",
+ " print(e, sentence)\n",
+ " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], sequences[:-1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-dev.conllu') as fopen:\n",
+ " dev = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_dev, words_dev, depends_dev, labels_dev, _, seq_dev = process_corpus(dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "([101, 1622, 1103, 10997, 2502, 1142, 1642, 131, 102],\n",
+ " [0, 4, 4, 5, 1, 7, 5, 5, 0],\n",
+ " [1, 2, 3, 4, 5, 6, 7])"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "words_dev[0], depends_dev[0], seq_dev[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-test.conllu') as fopen:\n",
+ " test = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_test, words_test, depends_test, labels_test, _, seq_test = process_corpus(test)\n",
+ "sentences_test.extend(sentences_dev)\n",
+ "words_test.extend(words_dev)\n",
+ "depends_test.extend(depends_dev)\n",
+ "labels_test.extend(labels_dev)\n",
+ "seq_test.extend(seq_dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n",
+ "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n",
+ "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n",
+ "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n",
+ "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n",
+ "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-train.conllu') as fopen:\n",
+ " train = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(12000, 3824)"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(sentences_train), len(sentences_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "idx2tag = {v:k for k, v in tag2idx.items()}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_X = words_train\n",
+ "train_Y = labels_train\n",
+ "train_depends = depends_train\n",
+ "\n",
+ "test_X = words_test\n",
+ "test_Y = labels_test\n",
+ "test_depends = depends_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "epoch = 15\n",
+ "batch_size = 32\n",
+ "warmup_proportion = 0.1\n",
+ "num_train_steps = int(len(train_X) / batch_size * epoch)\n",
+ "num_warmup_steps = int(num_train_steps * warmup_proportion)\n",
+ "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class BiAAttention:\n",
+ " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n",
+ " self.input_size_encoder = input_size_encoder\n",
+ " self.input_size_decoder = input_size_decoder\n",
+ " self.num_labels = num_labels\n",
+ " \n",
+ " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n",
+ " batch = tf.shape(input_d)[0]\n",
+ " length_decoder = tf.shape(input_d)[1]\n",
+ " length_encoder = tf.shape(input_e)[1]\n",
+ " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n",
+ " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n",
+ " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n",
+ " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n",
+ " \n",
+ " output = output + out_d + out_e\n",
+ " \n",
+ " if mask_d is not None:\n",
+ " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n",
+ " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n",
+ " output = output * d * e\n",
+ " \n",
+ " return output\n",
+ "\n",
+ "class Model:\n",
+ " def __init__(\n",
+ " self,\n",
+ " learning_rate,\n",
+ " hidden_size_word,\n",
+ " ):\n",
+ " \n",
+ " self.X = tf.placeholder(tf.int32, [None, None])\n",
+ " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n",
+ " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n",
+ " self.maxlen = tf.shape(self.X)[1]\n",
+ " self.lengths = tf.count_nonzero(self.X, 1)\n",
+ " self.mask = tf.math.not_equal(self.X, 0)\n",
+ " float_mask = tf.cast(self.mask, tf.float32)\n",
+ " \n",
+ " self.arc_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.arc_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n",
+ "\n",
+ " model = modeling.BertModel(\n",
+ " config=bert_config,\n",
+ " is_training=True,\n",
+ " input_ids=self.X,\n",
+ " use_one_hot_embeddings=False)\n",
+ " output_layer = model.get_sequence_output()\n",
+ "\n",
+ " logits = tf.layers.dense(output_layer, len(idx2tag))\n",
+ " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n",
+ " logits, self.labels, self.lengths\n",
+ " )\n",
+ " arc_h = tf.nn.elu(self.arc_h(output_layer))\n",
+ " arc_c = tf.nn.elu(self.arc_c(output_layer))\n",
+ " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " \n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - float_mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n",
+ " loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * tf.expand_dims(float_mask, axis = 1)\n",
+ " num = tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n",
+ " \n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(self.depends)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n",
+ " \n",
+ " loss_arc = tf.reduce_sum(-loss_arc) / num\n",
+ " \n",
+ " self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n",
+ " \n",
+ " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n",
+ " num_train_steps, num_warmup_steps, False)\n",
+ " \n",
+ " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
+ " \n",
+ " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n",
+ " logits, transition_params, self.lengths\n",
+ " )\n",
+ " \n",
+ " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n",
+ " self.heads = tf.argmax(out_arc, axis = 1)\n",
+ " \n",
+ " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
+ " mask_label = tf.boolean_mask(self.labels, mask)\n",
+ " correct_pred = tf.equal(self.prediction, mask_label)\n",
+ " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
+ " \n",
+ " self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n",
+ " mask_label = tf.boolean_mask(self.depends, mask)\n",
+ " correct_pred = tf.equal(self.prediction, mask_label)\n",
+ " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "reduction_indices is deprecated, use axis instead\n",
+ "WARNING:tensorflow:\n",
+ "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n",
+ "For more information, please see:\n",
+ " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n",
+ " * https://github.com/tensorflow/addons\n",
+ " * https://github.com/tensorflow/io (for I/O related ops)\n",
+ "If you depend on functionality not listed there, please file an issue.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use keras.layers.dense instead.\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/crf/python/ops/crf.py:99: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use tf.where in 2.0, which has the same broadcast rule as np.where\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/crf/python/ops/crf.py:213: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n",
+ "WARNING:tensorflow:From :83: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "dim is deprecated, use axis instead\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Deprecated in favor of operator or tf.math.divide.\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "tf.reset_default_graph()\n",
+ "sess = tf.InteractiveSession()\n",
+ "\n",
+ "hidden_size_word = 128\n",
+ "learning_rate = 2e-5\n",
+ "\n",
+ "model = Model(learning_rate,hidden_size_word)\n",
+ "sess.run(tf.global_variables_initializer())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use standard file APIs to check for files with this prefix.\n",
+ "INFO:tensorflow:Restoring parameters from cased_L-12_H-768_A-12/bert_model.ckpt\n"
+ ]
+ }
+ ],
+ "source": [
+ "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n",
+ "saver = tf.train.Saver(var_list = var_lists)\n",
+ "saver.restore(sess, BERT_INIT_CHKPNT)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
+ "\n",
+ "batch_x = train_X[:5]\n",
+ "batch_x = pad_sequences(batch_x,padding='post')\n",
+ "batch_y = train_Y[:5]\n",
+ "batch_y = pad_sequences(batch_y,padding='post')\n",
+ "batch_depends = train_depends[:5]\n",
+ "batch_depends = pad_sequences(batch_depends,padding='post')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[0.028169014, 0.03521127, 124.20428]"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n",
+ " feed_dict = {model.X: batch_x,\n",
+ " model.labels: batch_y,\n",
+ " model.depends: batch_depends})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:38<00:00, 3.82it/s, accuracy=0.925, accuracy_depends=0.278, cost=10.6] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.31it/s, accuracy=0.945, accuracy_depends=0.377, cost=6.01]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 0, training loss: 31.966364, training acc: 0.692803, training depends: 0.221330, valid loss: 8.225014, valid acc: 0.909024, valid depends: 0.385700\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.952, accuracy_depends=0.398, cost=6.75]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:13<00:00, 9.17it/s, accuracy=0.964, accuracy_depends=0.458, cost=4.34]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 1, training loss: 7.462639, training acc: 0.933842, training depends: 0.379213, valid loss: 5.789834, valid acc: 0.938835, valid depends: 0.465765\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:36<00:00, 3.91it/s, accuracy=0.961, accuracy_depends=0.476, cost=5.27]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.23it/s, accuracy=0.981, accuracy_depends=0.477, cost=3.09]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 2, training loss: 5.058823, training acc: 0.958284, training depends: 0.443148, valid loss: 5.341218, valid acc: 0.943286, valid depends: 0.503747\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.971, accuracy_depends=0.516, cost=4.14]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:13<00:00, 9.19it/s, accuracy=0.977, accuracy_depends=0.523, cost=2.93]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 3, training loss: 3.880462, training acc: 0.970009, training depends: 0.490953, valid loss: 4.991700, valid acc: 0.947060, valid depends: 0.540521\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.98, accuracy_depends=0.545, cost=3.42] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:13<00:00, 9.19it/s, accuracy=0.974, accuracy_depends=0.62, cost=2.79] \n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 4, training loss: 3.105362, training acc: 0.977292, training depends: 0.533332, valid loss: 5.030708, valid acc: 0.947595, valid depends: 0.568011\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.92it/s, accuracy=0.978, accuracy_depends=0.548, cost=3.41] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:13<00:00, 9.08it/s, accuracy=0.968, accuracy_depends=0.646, cost=3.11]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 5, training loss: 2.581543, training acc: 0.982569, training depends: 0.562761, valid loss: 5.104789, valid acc: 0.947420, valid depends: 0.584828\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.983, accuracy_depends=0.586, cost=2.67] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.26it/s, accuracy=0.971, accuracy_depends=0.685, cost=2.68]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 6, training loss: 2.202466, training acc: 0.986247, training depends: 0.585046, valid loss: 5.109566, valid acc: 0.948887, valid depends: 0.596191\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.92it/s, accuracy=0.991, accuracy_depends=0.6, cost=2.32] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.29it/s, accuracy=0.968, accuracy_depends=0.669, cost=2.95]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 7, training loss: 1.927867, training acc: 0.988890, training depends: 0.603766, valid loss: 5.185890, valid acc: 0.948625, valid depends: 0.608178\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.99, accuracy_depends=0.609, cost=2.11] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.33it/s, accuracy=0.974, accuracy_depends=0.679, cost=3.1] \n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 8, training loss: 1.711818, training acc: 0.990891, training depends: 0.618656, valid loss: 5.243721, valid acc: 0.949115, valid depends: 0.617389\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.92it/s, accuracy=0.99, accuracy_depends=0.636, cost=1.91] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.33it/s, accuracy=0.974, accuracy_depends=0.724, cost=2.76]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 9, training loss: 1.531559, training acc: 0.992612, training depends: 0.631491, valid loss: 5.315990, valid acc: 0.949411, valid depends: 0.623621\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.991, accuracy_depends=0.616, cost=1.87] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.26it/s, accuracy=0.968, accuracy_depends=0.721, cost=2.73]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 10, training loss: 1.412521, training acc: 0.993470, training depends: 0.641314, valid loss: 5.335382, valid acc: 0.950646, valid depends: 0.627185\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.99, accuracy_depends=0.625, cost=1.73] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.28it/s, accuracy=0.977, accuracy_depends=0.688, cost=3.26]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 11, training loss: 1.315561, training acc: 0.994384, training depends: 0.650567, valid loss: 5.303139, valid acc: 0.950435, valid depends: 0.634550\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.994, accuracy_depends=0.635, cost=1.51] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.29it/s, accuracy=0.974, accuracy_depends=0.714, cost=3.14]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 12, training loss: 1.233227, training acc: 0.995393, training depends: 0.656432, valid loss: 5.327754, valid acc: 0.951320, valid depends: 0.640154\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.91it/s, accuracy=0.994, accuracy_depends=0.632, cost=1.51] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:13<00:00, 9.21it/s, accuracy=0.974, accuracy_depends=0.731, cost=3.02]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 13, training loss: 1.192494, training acc: 0.995591, training depends: 0.660398, valid loss: 5.371092, valid acc: 0.950071, valid depends: 0.640167\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:35<00:00, 3.92it/s, accuracy=0.996, accuracy_depends=0.63, cost=1.38] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.39it/s, accuracy=0.977, accuracy_depends=0.721, cost=2.28]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 14, training loss: 1.151563, training acc: 0.996021, training depends: 0.663822, valid loss: 5.408720, valid acc: 0.950176, valid depends: 0.641616\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "batch_size = 32\n",
+ "epoch = 15\n",
+ "\n",
+ "for e in range(epoch):\n",
+ " train_acc, train_loss = [], []\n",
+ " test_acc, test_loss = [], []\n",
+ " train_acc_depends, test_acc_depends = [], []\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(train_X))\n",
+ " batch_x = train_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_y = train_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = train_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost, _ = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
+ " feed_dict = {\n",
+ " model.X: batch_x,\n",
+ " model.labels: batch_y,\n",
+ " model.depends: batch_depends\n",
+ " },\n",
+ " )\n",
+ " train_loss.append(cost)\n",
+ " train_acc.append(acc)\n",
+ " train_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost],\n",
+ " feed_dict = {\n",
+ " model.X: batch_x,\n",
+ " model.labels: batch_y,\n",
+ " model.depends: batch_depends\n",
+ " },\n",
+ " )\n",
+ " test_loss.append(cost)\n",
+ " test_acc.append(acc)\n",
+ " test_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " \n",
+ " print(\n",
+ " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
+ " % (e, np.mean(train_loss), \n",
+ " np.mean(train_acc), \n",
+ " np.mean(train_acc_depends), \n",
+ " np.mean(test_loss), \n",
+ " np.mean(test_acc), \n",
+ " np.mean(test_acc_depends)\n",
+ " ))\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def evaluate(heads_pred, types_pred, heads, types, lengths,\n",
+ " symbolic_root=False, symbolic_end=False):\n",
+ " batch_size, _ = heads_pred.shape\n",
+ " ucorr = 0.\n",
+ " lcorr = 0.\n",
+ " total = 0.\n",
+ " ucomplete_match = 0.\n",
+ " lcomplete_match = 0.\n",
+ "\n",
+ " corr_root = 0.\n",
+ " total_root = 0.\n",
+ " start = 1 if symbolic_root else 0\n",
+ " end = 1 if symbolic_end else 0\n",
+ " for i in range(batch_size):\n",
+ " ucm = 1.\n",
+ " lcm = 1.\n",
+ " for j in range(start, lengths[i] - end):\n",
+ "\n",
+ " total += 1\n",
+ " if heads[i, j] == heads_pred[i, j]:\n",
+ " ucorr += 1\n",
+ " if types[i, j] == types_pred[i, j]:\n",
+ " lcorr += 1\n",
+ " else:\n",
+ " lcm = 0\n",
+ " else:\n",
+ " ucm = 0\n",
+ " lcm = 0\n",
+ "\n",
+ " if heads[i, j] == 0:\n",
+ " total_root += 1\n",
+ " corr_root += 1 if heads_pred[i, j] == 0 else 0\n",
+ "\n",
+ " ucomplete_match += ucm\n",
+ " lcomplete_match += lcm\n",
+ " \n",
+ " return ucorr / total, lcorr / total, corr_root / total_root"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([ 0, 40, 6, 22, 26, 23, 18, 16, 1, 1, 5, 3, 13, 10, 11, 6, 12,\n",
+ " 13, 10, 16, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32),\n",
+ " array([ 0, 2, 8, 5, 5, 2, 8, 8, -1, -1, 0, 11, 9, 8, 14, 13, 9,\n",
+ " 17, 14, 14, 8, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n",
+ " -1, -1, -1, -1, -1, -1, -1, -1]),\n",
+ " array([-1, 2, 8, 5, 5, 2, 8, 8, -1, -1, 0, 11, 11, 8, 14, 14, 8,\n",
+ " 16, 14, 14, 8, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n",
+ " -1, -1, -1, -1, -1, -1, -1, -1], dtype=int32))"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads],\n",
+ " feed_dict = {\n",
+ " model.X: batch_x,\n",
+ " },\n",
+ ")\n",
+ "\n",
+    "# do not forget to subtract 1: we added 1 during data processing to distinguish real indices from BERT padding\n",
+ "tags_seq[0], heads[0] - 1, batch_depends[0] - 1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def merge_wordpiece_tokens_tagging(x, y):\n",
+ " new_paired_tokens = []\n",
+ " n_tokens = len(x)\n",
+ "\n",
+ " i = 0\n",
+ " while i < n_tokens:\n",
+ " current_token, current_label = x[i], y[i]\n",
+ " if current_token.startswith('##'):\n",
+ " previous_token, previous_label = new_paired_tokens.pop()\n",
+ " merged_token = previous_token\n",
+ " merged_label = [previous_label]\n",
+    "            while current_token.startswith('##'):\n",
+    "                merged_token = merged_token + current_token.replace('##', '')\n",
+    "                merged_label.append(current_label)\n",
+    "                i = i + 1\n",
+    "                # stop if a wordpiece ends the sequence, otherwise x[i] is out of range\n",
+    "                if i >= n_tokens:\n",
+    "                    break\n",
+    "                current_token, current_label = x[i], y[i]\n",
+ " merged_label = merged_label[0]\n",
+ " new_paired_tokens.append((merged_token, merged_label))\n",
+ " else:\n",
+ " new_paired_tokens.append((current_token, current_label))\n",
+ " i = i + 1\n",
+ " words = [\n",
+ " i[0]\n",
+ " for i in new_paired_tokens\n",
+ " if i[0] not in ['[CLS]', '[SEP]', '[PAD]']\n",
+ " ]\n",
+ " labels = [\n",
+ " i[1]\n",
+ " for i in new_paired_tokens\n",
+ " if i[0] not in ['[CLS]', '[SEP]', '[PAD]']\n",
+ " ]\n",
+ " return words, labels"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0.7077922077922078, 0.6883116883116883, 0.7377049180327869)"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ "arc_accuracy, type_accuracy, root_accuracy"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.42it/s, arc_accuracy=0.727, root_accuracy=1, type_accuracy=0.714] \n"
+ ]
+ }
+ ],
+ "source": [
+ "arcs, types, roots = [], [], []\n",
+ "\n",
+ "pbar = tqdm(\n",
+ " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
+ ")\n",
+ "for i in pbar:\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads],\n",
+ " feed_dict = {\n",
+ " model.X: batch_x\n",
+ " },\n",
+ " )\n",
+ " \n",
+ " arc_accuracy, type_accuracy, root_accuracy = evaluate(heads - 1, tags_seq, batch_depends - 1, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ " pbar.set_postfix(arc_accuracy = arc_accuracy, type_accuracy = type_accuracy, \n",
+ " root_accuracy = root_accuracy)\n",
+ " arcs.append(arc_accuracy)\n",
+ " types.append(type_accuracy)\n",
+ " roots.append(root_accuracy)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "arc accuracy: 0.643016523744595\n",
+ "types accuracy: 0.6289139955331527\n",
+ "root accuracy: 0.7419270833333333\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('arc accuracy:', np.mean(arcs))\n",
+ "print('types accuracy:', np.mean(types))\n",
+ "print('root accuracy:', np.mean(roots))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/dependency-parser/4.residual-network-bahdanau-char.ipynb b/dependency-parser/4.residual-network-bahdanau-char.ipynb
deleted file mode 100644
index 5248da9..0000000
--- a/dependency-parser/4.residual-network-bahdanau-char.ipynb
+++ /dev/null
@@ -1,481 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import tensorflow as tf\n",
- "from tqdm import tqdm\n",
- "import numpy as np"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "with open('test.conll.txt') as fopen:\n",
- " corpus = fopen.read().split('\\n')\n",
- " \n",
- "with open('dev.conll.txt') as fopen:\n",
- " corpus_test = fopen.read().split('\\n')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "word2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "tag2idx = {'PAD': 0}\n",
- "char2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "word_idx = 3\n",
- "tag_idx = 1\n",
- "char_idx = 3\n",
- "\n",
- "def process_corpus(corpus, until = None):\n",
- " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
- " sentences, words, depends, labels = [], [], [], []\n",
- " temp_sentence, temp_word, temp_depend, temp_label = [], [], [], []\n",
- " for sentence in corpus:\n",
- " if len(sentence):\n",
- " sentence = sentence.split('\\t')\n",
- " for c in sentence[1]:\n",
- " if c not in char2idx:\n",
- " char2idx[c] = char_idx\n",
- " char_idx += 1\n",
- " if sentence[7] not in tag2idx:\n",
- " tag2idx[sentence[7]] = tag_idx\n",
- " tag_idx += 1\n",
- " if sentence[1] not in word2idx:\n",
- " word2idx[sentence[1]] = word_idx\n",
- " word_idx += 1\n",
- " temp_word.append(word2idx[sentence[1]])\n",
- " temp_depend.append(int(sentence[6]))\n",
- " temp_label.append(tag2idx[sentence[7]])\n",
- " temp_sentence.append(sentence[1])\n",
- " else:\n",
- " words.append(temp_word)\n",
- " depends.append(temp_depend)\n",
- " labels.append(temp_label)\n",
- " sentences.append(temp_sentence)\n",
- " temp_word = []\n",
- " temp_depend = []\n",
- " temp_label = []\n",
- " temp_sentence = []\n",
- " return sentences[:-1], words[:-1], depends[:-1], labels[:-1]\n",
- " \n",
- "sentences, words, depends, labels = process_corpus(corpus)\n",
- "sentences_test, words_test, depends_test, labels_test = process_corpus(corpus_test)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from keras.preprocessing.sequence import pad_sequences"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "words = pad_sequences(words,padding='post')\n",
- "depends = pad_sequences(depends,padding='post')\n",
- "labels = pad_sequences(labels,padding='post')\n",
- "\n",
- "words_test = pad_sequences(words_test,padding='post')\n",
- "depends_test = pad_sequences(depends_test,padding='post')\n",
- "labels_test = pad_sequences(labels_test,padding='post')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "idx2word = {idx: tag for tag, idx in word2idx.items()}\n",
- "idx2tag = {i: w for w, i in tag2idx.items()}\n",
- "\n",
- "train_X = words\n",
- "train_Y = labels\n",
- "train_depends = depends\n",
- "\n",
- "test_X = words_test\n",
- "test_Y = labels_test\n",
- "test_depends = depends_test"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "def generate_char_seq(batch, maxlen_c, maxlen, UNK = 2):\n",
- " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
- " for i in range(len(batch)):\n",
- " for k in range(len(batch[i])):\n",
- " for no, c in enumerate(batch[i][k][:maxlen]):\n",
- " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
- " return temp"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "maxlen = max(train_X.shape[1], test_X.shape[1])\n",
- "\n",
- "train_X = pad_sequences(train_X,padding='post',maxlen=maxlen)\n",
- "train_Y = pad_sequences(train_Y,padding='post',maxlen=maxlen)\n",
- "train_depends = pad_sequences(train_depends,padding='post',maxlen=maxlen)\n",
- "train_char = generate_char_seq(sentences, maxlen, 30)\n",
- "\n",
- "test_X = pad_sequences(test_X,padding='post',maxlen=maxlen)\n",
- "test_Y = pad_sequences(test_Y,padding='post',maxlen=maxlen)\n",
- "test_depends = pad_sequences(test_depends,padding='post',maxlen=maxlen)\n",
- "test_char = generate_char_seq(sentences_test, maxlen, 30)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "class Attention:\n",
- " def __init__(self,hidden_size):\n",
- " self.hidden_size = hidden_size\n",
- " self.dense_layer = tf.layers.Dense(hidden_size)\n",
- " self.v = tf.random_normal([hidden_size],mean=0,stddev=1/np.sqrt(hidden_size))\n",
- " \n",
- " def score(self, hidden_tensor, encoder_outputs):\n",
- " energy = tf.nn.tanh(self.dense_layer(tf.concat([hidden_tensor,encoder_outputs],2)))\n",
- " energy = tf.transpose(energy,[0,2,1])\n",
- " batch_size = tf.shape(encoder_outputs)[0]\n",
- " v = tf.expand_dims(tf.tile(tf.expand_dims(self.v,0),[batch_size,1]),1)\n",
- " energy = tf.matmul(v,energy)\n",
- " return tf.squeeze(energy,1)\n",
- " \n",
- " def __call__(self, hidden, encoder_outputs):\n",
- " seq_len = tf.shape(encoder_outputs)[1]\n",
- " batch_size = tf.shape(encoder_outputs)[0]\n",
- " H = tf.tile(tf.expand_dims(hidden, 1),[1,seq_len,1])\n",
- " attn_energies = self.score(H,encoder_outputs)\n",
- " return tf.expand_dims(tf.nn.softmax(attn_energies),1)\n",
- "\n",
- "class Model:\n",
- " def __init__(\n",
- " self,\n",
- " dict_size,\n",
- " char_dict_size,\n",
- " size_layers,\n",
- " learning_rate,\n",
- " maxlen,\n",
- " num_blocks = 3,\n",
- " block_size = 64,\n",
- " ):\n",
- " self.word_ids = tf.placeholder(tf.int32, shape = [None, maxlen])\n",
- " self.labels = tf.placeholder(tf.int32, shape = [None, maxlen])\n",
- " self.depends = tf.placeholder(tf.int32, shape = [None, maxlen])\n",
- " self.char_ids = tf.placeholder(tf.int32, shape = [None, maxlen, 30])\n",
- " embeddings = tf.Variable(tf.random_uniform([dict_size, size_layers], -1, 1))\n",
- " char_embeddings = tf.Variable(tf.random_uniform([char_dict_size, size_layers], -1, 1))\n",
- " embedded = tf.nn.embedding_lookup(embeddings, self.word_ids)\n",
- " char_embedded = tf.nn.embedding_lookup(char_embeddings, self.char_ids)\n",
- " s_x = tf.shape(char_embedded)\n",
- " char_embedded = tf.reshape(\n",
- " char_embedded, shape = [s_x[0] * s_x[1], s_x[-2], size_layers]\n",
- " )\n",
- " self.attention = Attention(size_layers)\n",
- " self.maxlen = tf.shape(self.word_ids)[1]\n",
- " self.lengths = tf.count_nonzero(self.word_ids, 1)\n",
- "\n",
- " def residual_block(x, size, rate, block, block_s = block_size, attention = True):\n",
- " with tf.variable_scope(\n",
- " 'block_%d_%d' % (block, rate), reuse = False\n",
- " ):\n",
- " if attention:\n",
- " attn_weights = self.attention(tf.reduce_sum(x,axis=1), x)\n",
- " else:\n",
- " attn_weights = x\n",
- " conv_filter = tf.layers.conv1d(\n",
- " attn_weights,\n",
- " x.shape[2] // 4,\n",
- " kernel_size = size,\n",
- " strides = 1,\n",
- " padding = 'same',\n",
- " dilation_rate = rate,\n",
- " activation = tf.nn.tanh,\n",
- " )\n",
- " conv_gate = tf.layers.conv1d(\n",
- " x,\n",
- " x.shape[2] // 4,\n",
- " kernel_size = size,\n",
- " strides = 1,\n",
- " padding = 'same',\n",
- " dilation_rate = rate,\n",
- " activation = tf.nn.sigmoid,\n",
- " )\n",
- " out = tf.multiply(conv_filter, conv_gate)\n",
- " out = tf.layers.conv1d(\n",
- " out,\n",
- " block_s,\n",
- " kernel_size = 1,\n",
- " strides = 1,\n",
- " padding = 'same',\n",
- " activation = tf.nn.tanh,\n",
- " )\n",
- " return tf.add(x, out), out\n",
- " \n",
- " forward = tf.layers.conv1d(\n",
- " char_embedded, block_size, kernel_size = 1, strides = 1, padding = 'SAME'\n",
- " )\n",
- " zeros = tf.zeros_like(forward)\n",
- " for i in range(num_blocks):\n",
- " for r in [1, 2, 4, 8]:\n",
- " forward, s = residual_block(\n",
- " forward, size = 7, rate = r, block = 10 * (i + 1), attention = False\n",
- " )\n",
- " zeros = tf.add(zeros, s)\n",
- " output = tf.reshape(\n",
- " tf.reduce_sum(zeros,axis=1), shape = [s_x[0], s_x[1], block_size]\n",
- " )\n",
- " forward = tf.layers.conv1d(\n",
- " embedded, block_size, kernel_size = 1, strides = 1, padding = 'SAME'\n",
- " )\n",
- " forward = tf.concat([forward, output], axis = -1)\n",
- " zeros = tf.zeros_like(forward)\n",
- " for i in range(num_blocks):\n",
- " for r in [1, 2, 4, 8, 16]:\n",
- " forward, s = residual_block(\n",
- " forward, size = 7, rate = r, block = i, block_s = block_size * 2, attention = False\n",
- " )\n",
- " zeros = tf.add(zeros, s)\n",
- " logits = tf.layers.conv1d(\n",
- " zeros, len(idx2tag), kernel_size = 1, strides = 1, padding = 'SAME'\n",
- " )\n",
- " logits_depends = tf.layers.conv1d(\n",
- " zeros, maxlen, kernel_size = 1, strides = 1, padding = 'SAME'\n",
- " )\n",
- " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n",
- " logits, self.labels, self.lengths\n",
- " )\n",
- " with tf.variable_scope(\"depends\"):\n",
- " log_likelihood_depends, transition_params_depends = tf.contrib.crf.crf_log_likelihood(\n",
- " logits_depends, self.depends, self.lengths\n",
- " )\n",
- " self.cost = tf.reduce_mean(-log_likelihood) + tf.reduce_mean(-log_likelihood_depends)\n",
- " self.optimizer = tf.train.AdamOptimizer(\n",
- " learning_rate = learning_rate\n",
- " ).minimize(self.cost)\n",
- " \n",
- " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
- " \n",
- " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n",
- " logits, transition_params, self.lengths\n",
- " )\n",
- " self.tags_seq_depends, _ = tf.contrib.crf.crf_decode(\n",
- " logits_depends, transition_params_depends, self.lengths\n",
- " )\n",
- "\n",
- " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
- " mask_label = tf.boolean_mask(self.labels, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
- " \n",
- " self.prediction = tf.boolean_mask(self.tags_seq_depends, mask)\n",
- " mask_label = tf.boolean_mask(self.depends, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "tf.reset_default_graph()\n",
- "sess = tf.InteractiveSession()\n",
- "\n",
- "dim = 128\n",
- "dropout = 1\n",
- "learning_rate = 1e-3\n",
- "batch_size = 8\n",
- "\n",
- "model = Model(len(word2idx), len(char2idx), dim, learning_rate, maxlen)\n",
- "sess.run(tf.global_variables_initializer())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import time\n",
- "\n",
- "for e in range(20):\n",
- " lasttime = time.time()\n",
- " train_acc, train_loss, test_acc, test_loss, train_acc_depends, test_acc_depends = 0, 0, 0, 0, 0, 0\n",
- " pbar = tqdm(\n",
- " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = train_X[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_y = train_Y[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_char = train_char[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_depends = train_depends[i : min(i + batch_size, train_X.shape[0])]\n",
- " acc_depends, acc, cost, _ = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.labels: batch_y,\n",
- " model.char_ids: batch_char,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " train_loss += cost\n",
- " train_acc += acc\n",
- " train_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " pbar = tqdm(\n",
- " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = test_X[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_char = test_char[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_y = test_Y[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_depends = test_depends[i : min(i + batch_size, test_X.shape[0])]\n",
- " acc_depends, acc, cost = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.labels: batch_y,\n",
- " model.char_ids: batch_char,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " assert not np.isnan(cost)\n",
- " test_loss += cost\n",
- " test_acc += acc\n",
- " test_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " train_loss /= len(train_X) / batch_size\n",
- " train_acc /= len(train_X) / batch_size\n",
- " train_acc_depends /= len(train_X) / batch_size\n",
- " test_loss /= len(test_X) / batch_size\n",
- " test_acc /= len(test_X) / batch_size\n",
- " test_acc_depends /= len(test_X) / batch_size\n",
- "\n",
- " print('time taken:', time.time() - lasttime)\n",
- " print(\n",
- " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
- " % (e, train_loss, train_acc, train_acc_depends, test_loss, test_acc, test_acc_depends)\n",
- " )"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq, deps = sess.run([model.tags_seq, model.tags_seq_depends],\n",
- "                     feed_dict={model.word_ids:batch_x[:1],\n",
- "                                model.char_ids:batch_char[:1]})"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq = seq[0]\n",
- "deps = deps[0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "batch_y[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "deps[seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "batch_depends[0][seq>0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
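The `residual_block` in the notebook above combines a tanh "filter" convolution with a sigmoid "gate" convolution (both dilated), multiplies them, and adds the result back onto the input. A plain NumPy sketch of that gated residual unit, under the assumption of 'same' padding and matching channel counts (these helpers are illustrative, not the notebook's `tf.layers.conv1d`):

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """'Same'-padded 1-D dilated convolution over a (time, channels) array.

    w has shape (kernel, in_channels, out_channels)."""
    k = w.shape[0]
    pad = (k - 1) * rate // 2                       # symmetric 'same' padding
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        for j in range(k):
            out[t] += xp[t + j * rate] @ w[j]       # taps spaced `rate` apart
    return out

def residual_block(x, w_filter, w_gate, rate):
    """Gated unit as in the notebook: tanh(filter) * sigmoid(gate), plus skip."""
    gated = np.tanh(dilated_conv1d(x, w_filter, rate)) * \
            (1.0 / (1.0 + np.exp(-dilated_conv1d(x, w_gate, rate))))
    return x + gated, gated                         # (residual output, skip output)
```

Stacking these blocks with rates 1, 2, 4, 8 (as the notebook does) grows the receptive field exponentially while keeping the sequence length fixed.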
diff --git a/dependency-parser/5.attention-is-all-you-need.ipynb b/dependency-parser/5.attention-is-all-you-need.ipynb
deleted file mode 100644
index a5baba0..0000000
--- a/dependency-parser/5.attention-is-all-you-need.ipynb
+++ /dev/null
@@ -1,1743 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import tensorflow as tf\n",
- "from tqdm import tqdm\n",
- "import numpy as np\n",
- "import re"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "with open('id_gsd-ud-train.conllu.txt') as fopen:\n",
- " corpus = fopen.read().split('\\n')\n",
- " \n",
- "with open('id_gsd-ud-test.conllu.txt') as fopen:\n",
- " corpus.extend(fopen.read().split('\\n'))\n",
- " \n",
- "with open('id_gsd-ud-dev.conllu.txt') as fopen:\n",
- " corpus.extend(fopen.read().split('\\n'))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [],
- "source": [
- "word2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "tag2idx = {'PAD': 0}\n",
- "char2idx = {'PAD': 0,'NUM':1,'UNK':2}\n",
- "word_idx = 3\n",
- "tag_idx = 1\n",
- "char_idx = 3\n",
- "\n",
- "def process_string(string):\n",
- " string = re.sub('[^A-Za-z0-9\\-\\/ ]+', ' ', string).split()\n",
- " return [to_title(y.strip()) for y in string]\n",
- "\n",
- "def to_title(string):\n",
- " if string.isupper():\n",
- " string = string.title()\n",
- " return string\n",
- "\n",
- "def process_corpus(corpus, until = None):\n",
- " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
- " sentences, words, depends, labels, pos = [], [], [], [], []\n",
- " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n",
- " for sentence in corpus:\n",
- " if len(sentence):\n",
- " if sentence[0] == '#':\n",
- " continue\n",
- " sentence = sentence.split('\\t')\n",
- " temp = process_string(sentence[1])\n",
- " if not len(temp):\n",
- " sentence[1] = 'EMPTY'\n",
- " sentence[1] = process_string(sentence[1])[0]\n",
- " for c in sentence[1]:\n",
- " if c not in char2idx:\n",
- " char2idx[c] = char_idx\n",
- " char_idx += 1\n",
- " if sentence[7] not in tag2idx:\n",
- " tag2idx[sentence[7]] = tag_idx\n",
- " tag_idx += 1\n",
- " if sentence[1] not in word2idx:\n",
- " word2idx[sentence[1]] = word_idx\n",
- " word_idx += 1\n",
- " temp_word.append(word2idx[sentence[1]])\n",
- " temp_depend.append(int(sentence[6]) + 1)\n",
- " temp_label.append(tag2idx[sentence[7]])\n",
- " temp_sentence.append(sentence[1])\n",
- " temp_pos.append(sentence[3])\n",
- " else:\n",
- " words.append(temp_word)\n",
- " depends.append(temp_depend)\n",
- " labels.append(temp_label)\n",
- " sentences.append(temp_sentence)\n",
- " pos.append(temp_pos)\n",
- " temp_word = []\n",
- " temp_depend = []\n",
- " temp_label = []\n",
- " temp_sentence = []\n",
- " temp_pos = []\n",
- " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1]\n",
- " \n",
- "sentences, words, depends, labels, pos = process_corpus(corpus)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [],
- "source": [
- "import json\n",
- "\n",
- "with open('augmented.json') as fopen:\n",
- " augmented = json.load(fopen)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [],
- "source": [
- "def parse_XY(texts):\n",
- " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
- " outside, sentences = [], []\n",
- " for no, text in enumerate(texts):\n",
- " s = process_string(text)\n",
- " sentences.append(s)\n",
- " inside = []\n",
- " for w in s:\n",
- " for c in w:\n",
- " if c not in char2idx:\n",
- " char2idx[c] = char_idx\n",
- " char_idx += 1\n",
- " \n",
- " if w not in word2idx:\n",
- " word2idx[w] = word_idx\n",
- " word_idx += 1\n",
- " \n",
- " inside.append(word2idx[w])\n",
- " outside.append(inside)\n",
- " return outside, sentences"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [],
- "source": [
- "text_augmented = []\n",
- "for a in augmented:\n",
- " text_augmented.extend(a[0])\n",
- " depends.extend(a[1])\n",
- " labels.extend(a[2])"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [],
- "source": [
- "outside, new_sentences = parse_XY(text_augmented)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Using TensorFlow backend.\n"
- ]
- }
- ],
- "source": [
- "from keras.preprocessing.sequence import pad_sequences"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [],
- "source": [
- "words.extend(outside)\n",
- "sentences.extend(new_sentences)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "(50365, 50365, 50365, 50365)"
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "len(words), len(depends), len(labels), len(sentences)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [],
- "source": [
- "def generate_char_seq(batch, UNK = 2):\n",
- " maxlen_c = max([len(k) for k in batch])\n",
- " x = [[len(i) for i in k] for k in batch]\n",
- " maxlen = max([j for i in x for j in i])\n",
- " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
- " for i in range(len(batch)):\n",
- " for k in range(len(batch[i])):\n",
- " for no, c in enumerate(batch[i][k][:maxlen][::-1]):\n",
- " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
- " return temp"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [],
- "source": [
- "idx2word = {idx: tag for tag, idx in word2idx.items()}\n",
- "idx2tag = {i: w for w, i in tag2idx.items()}\n",
- "char = generate_char_seq(sentences)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "(50365, 189)"
- ]
- },
- "execution_count": 13,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "words = pad_sequences(words,padding='post')\n",
- "depends = pad_sequences(depends,padding='post')\n",
- "labels = pad_sequences(labels,padding='post')\n",
- "words.shape"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.6/dist-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n",
- " \"This module will be removed in 0.20.\", DeprecationWarning)\n"
- ]
- }
- ],
- "source": [
- "from sklearn.cross_validation import train_test_split\n",
- "train_X, test_X, train_Y, test_Y, train_depends, test_depends, train_char, test_char = train_test_split(\n",
- " words,\n",
- " labels,\n",
- " depends,\n",
- " char,\n",
- " test_size=0.1)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "metadata": {},
- "outputs": [],
- "source": [
- "def layer_norm(inputs, epsilon=1e-8):\n",
- " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n",
- " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n",
- "\n",
- " params_shape = inputs.get_shape()[-1:]\n",
- " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n",
- " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n",
- " \n",
- " outputs = gamma * normalized + beta\n",
- " return outputs\n",
- "\n",
- "def multihead_attn(queries, keys, q_masks, k_masks, future_binding, num_units, num_heads):\n",
- " \n",
- " T_q = tf.shape(queries)[1] \n",
- " T_k = tf.shape(keys)[1] \n",
- "\n",
- " Q = tf.layers.dense(queries, num_units, name='Q') \n",
- " K_V = tf.layers.dense(keys, 2*num_units, name='K_V') \n",
- " K, V = tf.split(K_V, 2, -1) \n",
- "\n",
- " Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0) \n",
- " K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0) \n",
- " V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0) \n",
- "\n",
- " align = tf.matmul(Q_, tf.transpose(K_, [0,2,1])) \n",
- " align = align / np.sqrt(K_.get_shape().as_list()[-1]) \n",
- "\n",
- " paddings = tf.fill(tf.shape(align), 0.0) \n",
- "\n",
- " key_masks = k_masks \n",
- " key_masks = tf.tile(key_masks, [num_heads, 1]) \n",
- " key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, T_q, 1]) \n",
- " align = tf.where(tf.equal(key_masks, 0), paddings, align) \n",
- "\n",
- " if future_binding:\n",
- " lower_tri = tf.ones([T_q, T_k]) \n",
- " lower_tri = tf.linalg.LinearOperatorLowerTriangular(lower_tri).to_dense() \n",
- " masks = tf.tile(tf.expand_dims(lower_tri,0), [tf.shape(align)[0], 1, 1]) \n",
- " align = tf.where(tf.equal(masks, 0), paddings, align) \n",
- " \n",
- " align = tf.nn.softmax(align) \n",
- " query_masks = tf.to_float(q_masks) \n",
- " query_masks = tf.tile(query_masks, [num_heads, 1]) \n",
- " query_masks = tf.tile(tf.expand_dims(query_masks, -1), [1, 1, T_k]) \n",
- " align *= query_masks\n",
- " outputs = tf.matmul(align, V_) \n",
- " outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2) \n",
- " outputs += queries \n",
- " outputs = layer_norm(outputs) \n",
- " return outputs\n",
- "\n",
- "\n",
- "def pointwise_feedforward(inputs, hidden_units, activation=None):\n",
- " outputs = tf.layers.dense(inputs, 4*hidden_units, activation=activation)\n",
- " outputs = tf.layers.dense(outputs, hidden_units, activation=None)\n",
- " outputs += inputs\n",
- " outputs = layer_norm(outputs)\n",
- " return outputs\n",
- "\n",
- "\n",
- "def learned_position_encoding(inputs, mask, embed_dim):\n",
- " T = tf.shape(inputs)[1]\n",
- " outputs = tf.range(tf.shape(inputs)[1]) # (T_q)\n",
- " outputs = tf.expand_dims(outputs, 0) # (1, T_q)\n",
- " outputs = tf.tile(outputs, [tf.shape(inputs)[0], 1]) # (N, T_q)\n",
- " outputs = embed_seq(outputs, T, embed_dim, zero_pad=False, scale=False)\n",
- " return tf.expand_dims(tf.to_float(mask), -1) * outputs\n",
- "\n",
- "\n",
- "def sinusoidal_position_encoding(inputs, mask, repr_dim):\n",
- " T = tf.shape(inputs)[1]\n",
- " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n",
- " i = np.arange(0, repr_dim, 2, np.float32)\n",
- " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n",
- " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n",
- " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1]) * tf.expand_dims(tf.to_float(mask), -1)\n",
- "\n",
- "def label_smoothing(inputs, epsilon=0.1):\n",
- " C = inputs.get_shape().as_list()[-1]\n",
- " return ((1 - epsilon) * inputs) + (epsilon / C)\n",
- "\n",
- "\n",
- "class CRF:\n",
- " def __init__(self,\n",
- " dim_word,\n",
- " dim_char,\n",
- " dropout,\n",
- " learning_rate,\n",
- " hidden_size_char,\n",
- " hidden_size_word,\n",
- " maxlen,\n",
- " num_blocks = 2,\n",
- " num_heads = 8,\n",
- " min_freq = 50):\n",
- " \n",
- " self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n",
- " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n",
- " self.maxlen = tf.shape(self.word_ids)[1]\n",
- " self.lengths = tf.count_nonzero(self.word_ids, 1)\n",
- " batch_size = tf.shape(self.word_ids)[0]\n",
- " \n",
- " self.word_embeddings = tf.Variable(\n",
- " tf.truncated_normal(\n",
- " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n",
- " )\n",
- " )\n",
- " self.char_embeddings = tf.Variable(\n",
- " tf.truncated_normal(\n",
- " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n",
- " )\n",
- " )\n",
- " \n",
- " word_embedded = tf.nn.embedding_lookup(\n",
- " self.word_embeddings, self.word_ids\n",
- " )\n",
- " char_embedded = tf.nn.embedding_lookup(\n",
- " self.char_embeddings, self.char_ids\n",
- " )\n",
- " s = tf.shape(char_embedded)\n",
- " char_embedded = tf.reshape(\n",
- " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n",
- " )\n",
- " reshape_char = tf.reshape(self.char_ids, shape = [s[0] * s[1], s[-2]])\n",
- " char_masked = tf.sign(reshape_char)\n",
- " char_embedded += sinusoidal_position_encoding(reshape_char, char_masked, dim_char)\n",
- " for i in range(num_blocks):\n",
- " with tf.variable_scope('char_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " char_embedded = multihead_attn(queries = char_embedded,\n",
- " keys = char_embedded,\n",
- " q_masks = char_masked,\n",
- " k_masks = char_masked,\n",
- " future_binding = False,\n",
- " num_units = dim_char,\n",
- " num_heads = num_heads)\n",
- " with tf.variable_scope('char_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " char_embedded = pointwise_feedforward(char_embedded,\n",
- " dim_char,\n",
- " activation = tf.nn.relu)\n",
- " output = tf.reshape(\n",
- " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n",
- " )\n",
- " \n",
- " decoder_embedded = tf.concat([word_embedded, output], axis = -1)\n",
- "        decoder_embedded = tf.layers.dense(decoder_embedded, dim_char)\n",
- " de_masks = tf.sign(self.word_ids)\n",
- " \n",
- " decoder_embedded += sinusoidal_position_encoding(self.word_ids, de_masks, dim_char)\n",
- " \n",
- " for i in range(num_blocks):\n",
- " with tf.variable_scope('word_char_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " decoder_embedded = multihead_attn(queries = decoder_embedded,\n",
- " keys = decoder_embedded,\n",
- " q_masks = de_masks,\n",
- " k_masks = de_masks,\n",
- " future_binding = True,\n",
- " num_units = dim_char,\n",
- " num_heads = num_heads)\n",
- " \n",
- " with tf.variable_scope('word_char_attention_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " decoder_embedded = multihead_attn(queries = decoder_embedded,\n",
- " keys = output,\n",
- " q_masks = de_masks,\n",
- " k_masks = de_masks,\n",
- " future_binding = False,\n",
- " num_units = dim_char,\n",
- " num_heads = num_heads)\n",
- " \n",
- " with tf.variable_scope('word_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " decoder_embedded = pointwise_feedforward(decoder_embedded,\n",
- " dim_char,\n",
- " activation = tf.nn.relu)\n",
- " \n",
- " logits = tf.layers.dense(decoder_embedded, len(idx2tag))\n",
- " \n",
- " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n",
- " logits, self.labels, self.lengths\n",
- " )\n",
- " \n",
- " tag_embeddings = tf.Variable(\n",
- " tf.truncated_normal(\n",
- " [len(idx2tag), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n",
- " )\n",
- " )\n",
- " logits_max = tf.argmax(logits,axis=2,output_type=tf.int32)\n",
- " lookup_logits = tf.nn.embedding_lookup(\n",
- " tag_embeddings, logits_max\n",
- " )\n",
- " \n",
- " lookup_logits += sinusoidal_position_encoding(logits_max, de_masks, dim_char)\n",
- " \n",
- " for i in range(num_blocks):\n",
- " with tf.variable_scope('depend_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " lookup_logits = multihead_attn(queries = lookup_logits,\n",
- " keys = lookup_logits,\n",
- " q_masks = de_masks,\n",
- " k_masks = de_masks,\n",
- " future_binding = True,\n",
- " num_units = dim_char,\n",
- " num_heads = num_heads)\n",
- " \n",
- " with tf.variable_scope('depend_attention_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " lookup_logits = multihead_attn(queries = lookup_logits,\n",
- " keys = decoder_embedded,\n",
- " q_masks = de_masks,\n",
- " k_masks = de_masks,\n",
- " future_binding = False,\n",
- " num_units = dim_char,\n",
- " num_heads = num_heads)\n",
- " \n",
- " with tf.variable_scope('depend_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n",
- " lookup_logits = pointwise_feedforward(lookup_logits,\n",
- " dim_char,\n",
- " activation = tf.nn.relu)\n",
- " \n",
- " cast_mask = tf.cast(tf.sequence_mask(self.lengths + 1, maxlen = maxlen), dtype = tf.float32)\n",
- " cast_mask = tf.tile(tf.expand_dims(cast_mask,axis=1),[1,self.maxlen,1]) * 10\n",
- " \n",
- " logits_depends = tf.layers.dense(lookup_logits, maxlen)\n",
- " logits_depends = tf.multiply(logits_depends, cast_mask)\n",
- " \n",
- " with tf.variable_scope(\"depends\"):\n",
- " log_likelihood_depends, transition_params_depends = tf.contrib.crf.crf_log_likelihood(\n",
- " logits_depends, self.depends, self.lengths\n",
- " )\n",
- " \n",
- " self.cost = tf.reduce_mean(-log_likelihood) + tf.reduce_mean(-log_likelihood_depends)\n",
- " self.optimizer = tf.train.AdamOptimizer(\n",
- " learning_rate = learning_rate\n",
- " ).minimize(self.cost)\n",
- " \n",
- " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
- " \n",
- " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n",
- " logits, transition_params, self.lengths\n",
- " )\n",
- " self.tags_seq = tf.identity(self.tags_seq, name = 'logits')\n",
- " \n",
- " self.tags_seq_depends, _ = tf.contrib.crf.crf_decode(\n",
- " logits_depends, transition_params_depends, self.lengths\n",
- " )\n",
- " self.tags_seq_depends = tf.identity(self.tags_seq_depends, name = 'logits_depends')\n",
- "\n",
- " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
- " mask_label = tf.boolean_mask(self.labels, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
- " \n",
- " self.prediction = tf.boolean_mask(self.tags_seq_depends, mask)\n",
- " mask_label = tf.boolean_mask(self.depends, mask)\n",
- " correct_pred = tf.equal(self.prediction, mask_label)\n",
- " correct_index = tf.cast(correct_pred, tf.float32)\n",
- " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
- " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
- ]
- }
- ],
- "source": [
- "tf.reset_default_graph()\n",
- "sess = tf.InteractiveSession()\n",
- "\n",
- "dim_word = 128\n",
- "dim_char = 256\n",
- "dropout = 0.8\n",
- "learning_rate = 1e-3\n",
- "hidden_size_char = 128\n",
- "hidden_size_word = 64\n",
- "batch_size = 8\n",
- "\n",
- "model = CRF(dim_word = dim_word,\n",
- " dim_char = dim_char,\n",
- " dropout = dropout,\n",
- " learning_rate = learning_rate,\n",
- " hidden_size_char = hidden_size_char,\n",
- " hidden_size_word = hidden_size_word,\n",
- " maxlen = words.shape[1])\n",
- "sess.run(tf.global_variables_initializer())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:18<00:00, 1.51it/s, accuracy=0.756, accuracy_depends=0.524, cost=51.9] \n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.17it/s, accuracy=0.707, accuracy_depends=0.515, cost=53.7]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3971.3269250392914\n",
- "epoch: 0, training loss: 74.662333, training acc: 0.687690, training depends: 0.423968, valid loss: 53.324856, valid acc: 0.745969, valid depends: 0.509276\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:24<00:00, 1.50it/s, accuracy=0.815, accuracy_depends=0.673, cost=37.4]\n",
- "test minibatch loop: 100%|██████████| 630/630 [02:51<00:00, 4.11it/s, accuracy=0.717, accuracy_depends=0.566, cost=44.6]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3975.9911386966705\n",
- "epoch: 1, training loss: 41.632860, training acc: 0.800347, training depends: 0.614527, valid loss: 38.777110, valid acc: 0.807298, valid depends: 0.642584\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:28<00:00, 1.51it/s, accuracy=0.81, accuracy_depends=0.726, cost=32.1] \n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.12it/s, accuracy=0.838, accuracy_depends=0.677, cost=34.6]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3980.6265354156494\n",
- "epoch: 2, training loss: 34.020197, training acc: 0.828245, training depends: 0.679861, valid loss: 33.156008, valid acc: 0.823669, valid depends: 0.699404\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:29<00:00, 1.50it/s, accuracy=0.839, accuracy_depends=0.75, cost=28.7] \n",
- "test minibatch loop: 100%|██████████| 630/630 [02:51<00:00, 4.12it/s, accuracy=0.808, accuracy_depends=0.717, cost=36.1]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3980.9095969200134\n",
- "epoch: 3, training loss: 28.242658, training acc: 0.845301, training depends: 0.740665, valid loss: 28.623581, valid acc: 0.831968, valid depends: 0.757451\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:32<00:00, 1.51it/s, accuracy=0.839, accuracy_depends=0.78, cost=27] \n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.19it/s, accuracy=0.788, accuracy_depends=0.828, cost=25.5]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3985.202503681183\n",
- "epoch: 4, training loss: 23.337289, training acc: 0.859588, training depends: 0.795232, valid loss: 25.202329, valid acc: 0.840135, valid depends: 0.799228\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:33<00:00, 1.51it/s, accuracy=0.869, accuracy_depends=0.821, cost=21.9]\n",
- "test minibatch loop: 100%|██████████| 630/630 [02:51<00:00, 4.14it/s, accuracy=0.838, accuracy_depends=0.768, cost=23.9]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3985.2355260849\n",
- "epoch: 5, training loss: 18.881109, training acc: 0.873684, training depends: 0.846420, valid loss: 22.490008, valid acc: 0.849709, valid depends: 0.828853\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:32<00:00, 1.50it/s, accuracy=0.863, accuracy_depends=0.827, cost=21.8]\n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.14it/s, accuracy=0.848, accuracy_depends=0.838, cost=20.6]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3985.4282426834106\n",
- "epoch: 6, training loss: 15.691809, training acc: 0.885103, training depends: 0.882228, valid loss: 19.544368, valid acc: 0.861741, valid depends: 0.863059\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:33<00:00, 1.50it/s, accuracy=0.869, accuracy_depends=0.887, cost=17.2]\n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.10it/s, accuracy=0.778, accuracy_depends=0.879, cost=26.3]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3986.3802030086517\n",
- "epoch: 7, training loss: 13.322382, training acc: 0.895488, training depends: 0.906931, valid loss: 18.686160, valid acc: 0.859560, valid depends: 0.879505\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:32<00:00, 1.50it/s, accuracy=0.893, accuracy_depends=0.893, cost=15.4]\n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.12it/s, accuracy=0.848, accuracy_depends=0.848, cost=21.4]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3984.483060359955\n",
- "epoch: 8, training loss: 11.466844, training acc: 0.906599, training depends: 0.924221, valid loss: 16.073830, valid acc: 0.877447, valid depends: 0.899887\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:33<00:00, 1.50it/s, accuracy=0.899, accuracy_depends=0.905, cost=15.3]\n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.08it/s, accuracy=0.838, accuracy_depends=0.848, cost=20.1]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3986.1450822353363\n",
- "epoch: 9, training loss: 9.785712, training acc: 0.918415, training depends: 0.937485, valid loss: 16.119294, valid acc: 0.880889, valid depends: 0.899497\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "import time\n",
- "\n",
- "for e in range(10):\n",
- " lasttime = time.time()\n",
- " train_acc, train_loss, test_acc, test_loss, train_acc_depends, test_acc_depends = 0, 0, 0, 0, 0, 0\n",
- " pbar = tqdm(\n",
- " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = train_X[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_char = train_char[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_y = train_Y[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_depends = train_depends[i : min(i + batch_size, train_X.shape[0])]\n",
- " acc_depends, acc, cost, _ = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " #assert not np.isnan(cost)\n",
- " train_loss += cost\n",
- " train_acc += acc\n",
- " train_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " pbar = tqdm(\n",
- " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = test_X[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_char = test_char[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_y = test_Y[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_depends = test_depends[i : min(i + batch_size, test_X.shape[0])]\n",
- " acc_depends, acc, cost = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " #assert not np.isnan(cost)\n",
- " test_loss += cost\n",
- " test_acc += acc\n",
- " test_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " train_loss /= len(train_X) / batch_size\n",
- " train_acc /= len(train_X) / batch_size\n",
- " train_acc_depends /= len(train_X) / batch_size\n",
- " test_loss /= len(test_X) / batch_size\n",
- " test_acc /= len(test_X) / batch_size\n",
- " test_acc_depends /= len(test_X) / batch_size\n",
- "\n",
- " print('time taken:', time.time() - lasttime)\n",
- " print(\n",
- " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
- " % (e, train_loss, train_acc, train_acc_depends, test_loss, test_acc, test_acc_depends)\n",
- " )"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 20,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:34<00:00, 1.50it/s, accuracy=0.946, accuracy_depends=0.923, cost=9.98] \n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.13it/s, accuracy=0.879, accuracy_depends=0.899, cost=16.5]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3986.8803112506866\n",
- "epoch: 0, training loss: 4.550854, training acc: 0.962310, training depends: 0.971532, valid loss: 12.594443, valid acc: 0.914797, valid depends: 0.934598\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:33<00:00, 1.51it/s, accuracy=0.94, accuracy_depends=0.946, cost=8.71] \n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.17it/s, accuracy=0.859, accuracy_depends=0.899, cost=21.7]\n",
- "train minibatch loop: 0%| | 0/5666 [00:00, ?it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3985.9828474521637\n",
- "epoch: 1, training loss: 5.258419, training acc: 0.958980, training depends: 0.965066, valid loss: 13.166835, valid acc: 0.907223, valid depends: 0.925843\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "train minibatch loop: 100%|██████████| 5666/5666 [1:03:32<00:00, 1.49it/s, accuracy=0.964, accuracy_depends=0.946, cost=6.36] \n",
- "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.13it/s, accuracy=0.899, accuracy_depends=0.909, cost=19.7]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "time taken: 3985.3402976989746\n",
- "epoch: 2, training loss: 3.728318, training acc: 0.968881, training depends: 0.976212, valid loss: 11.758984, valid acc: 0.920203, valid depends: 0.942277\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "import time\n",
- "\n",
- "for e in range(3):\n",
- " lasttime = time.time()\n",
- " train_acc, train_loss, test_acc, test_loss, train_acc_depends, test_acc_depends = 0, 0, 0, 0, 0, 0\n",
- " pbar = tqdm(\n",
- " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = train_X[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_char = train_char[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_y = train_Y[i : min(i + batch_size, train_X.shape[0])]\n",
- " batch_depends = train_depends[i : min(i + batch_size, train_X.shape[0])]\n",
- " acc_depends, acc, cost, _ = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " #assert not np.isnan(cost)\n",
- " train_loss += cost\n",
- " train_acc += acc\n",
- " train_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " pbar = tqdm(\n",
- " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
- " )\n",
- " for i in pbar:\n",
- " batch_x = test_X[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_char = test_char[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_y = test_Y[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_depends = test_depends[i : min(i + batch_size, test_X.shape[0])]\n",
- " acc_depends, acc, cost = sess.run(\n",
- " [model.accuracy_depends, model.accuracy, model.cost],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " model.labels: batch_y,\n",
- " model.depends: batch_depends\n",
- " },\n",
- " )\n",
- " #assert not np.isnan(cost)\n",
- " test_loss += cost\n",
- " test_acc += acc\n",
- " test_acc_depends += acc_depends\n",
- " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
- " \n",
- " train_loss /= len(train_X) / batch_size\n",
- " train_acc /= len(train_X) / batch_size\n",
- " train_acc_depends /= len(train_X) / batch_size\n",
- " test_loss /= len(test_X) / batch_size\n",
- " test_acc /= len(test_X) / batch_size\n",
- " test_acc_depends /= len(test_X) / batch_size\n",
- "\n",
- " print('time taken:', time.time() - lasttime)\n",
- " print(\n",
- " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
- " % (e, train_loss, train_acc, train_acc_depends, test_loss, test_acc, test_acc_depends)\n",
- " )"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 21,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['Placeholder',\n",
- " 'Placeholder_1',\n",
- " 'Placeholder_2',\n",
- " 'Placeholder_3',\n",
- " 'Variable',\n",
- " 'Variable_1',\n",
- " 'char_0/Q/kernel',\n",
- " 'char_0/Q/bias',\n",
- " 'char_0/K_V/kernel',\n",
- " 'char_0/K_V/bias',\n",
- " 'char_0/gamma',\n",
- " 'char_feedforward_0/dense/kernel',\n",
- " 'char_feedforward_0/dense/bias',\n",
- " 'char_feedforward_0/dense_1/kernel',\n",
- " 'char_feedforward_0/dense_1/bias',\n",
- " 'char_feedforward_0/gamma',\n",
- " 'char_1/Q/kernel',\n",
- " 'char_1/Q/bias',\n",
- " 'char_1/K_V/kernel',\n",
- " 'char_1/K_V/bias',\n",
- " 'char_1/gamma',\n",
- " 'char_feedforward_1/dense/kernel',\n",
- " 'char_feedforward_1/dense/bias',\n",
- " 'char_feedforward_1/dense_1/kernel',\n",
- " 'char_feedforward_1/dense_1/bias',\n",
- " 'char_feedforward_1/gamma',\n",
- " 'dense/kernel',\n",
- " 'dense/bias',\n",
- " 'word_char_0/Q/kernel',\n",
- " 'word_char_0/Q/bias',\n",
- " 'word_char_0/K_V/kernel',\n",
- " 'word_char_0/K_V/bias',\n",
- " 'word_char_0/gamma',\n",
- " 'word_char_attention_0/Q/kernel',\n",
- " 'word_char_attention_0/Q/bias',\n",
- " 'word_char_attention_0/K_V/kernel',\n",
- " 'word_char_attention_0/K_V/bias',\n",
- " 'word_char_attention_0/gamma',\n",
- " 'word_feedforward_0/dense/kernel',\n",
- " 'word_feedforward_0/dense/bias',\n",
- " 'word_feedforward_0/dense_1/kernel',\n",
- " 'word_feedforward_0/dense_1/bias',\n",
- " 'word_feedforward_0/gamma',\n",
- " 'word_char_1/Q/kernel',\n",
- " 'word_char_1/Q/bias',\n",
- " 'word_char_1/K_V/kernel',\n",
- " 'word_char_1/K_V/bias',\n",
- " 'word_char_1/gamma',\n",
- " 'word_char_attention_1/Q/kernel',\n",
- " 'word_char_attention_1/Q/bias',\n",
- " 'word_char_attention_1/K_V/kernel',\n",
- " 'word_char_attention_1/K_V/bias',\n",
- " 'word_char_attention_1/gamma',\n",
- " 'word_feedforward_1/dense/kernel',\n",
- " 'word_feedforward_1/dense/bias',\n",
- " 'word_feedforward_1/dense_1/kernel',\n",
- " 'word_feedforward_1/dense_1/bias',\n",
- " 'word_feedforward_1/gamma',\n",
- " 'dense_1/kernel',\n",
- " 'dense_1/bias',\n",
- " 'transitions',\n",
- " 'Variable_2',\n",
- " 'depend_0/Q/kernel',\n",
- " 'depend_0/Q/bias',\n",
- " 'depend_0/K_V/kernel',\n",
- " 'depend_0/K_V/bias',\n",
- " 'depend_0/gamma',\n",
- " 'depend_attention_0/Q/kernel',\n",
- " 'depend_attention_0/Q/bias',\n",
- " 'depend_attention_0/K_V/kernel',\n",
- " 'depend_attention_0/K_V/bias',\n",
- " 'depend_attention_0/gamma',\n",
- " 'depend_feedforward_0/dense/kernel',\n",
- " 'depend_feedforward_0/dense/bias',\n",
- " 'depend_feedforward_0/dense_1/kernel',\n",
- " 'depend_feedforward_0/dense_1/bias',\n",
- " 'depend_feedforward_0/gamma',\n",
- " 'depend_1/Q/kernel',\n",
- " 'depend_1/Q/bias',\n",
- " 'depend_1/K_V/kernel',\n",
- " 'depend_1/K_V/bias',\n",
- " 'depend_1/gamma',\n",
- " 'depend_attention_1/Q/kernel',\n",
- " 'depend_attention_1/Q/bias',\n",
- " 'depend_attention_1/K_V/kernel',\n",
- " 'depend_attention_1/K_V/bias',\n",
- " 'depend_attention_1/gamma',\n",
- " 'depend_feedforward_1/dense/kernel',\n",
- " 'depend_feedforward_1/dense/bias',\n",
- " 'depend_feedforward_1/dense_1/kernel',\n",
- " 'depend_feedforward_1/dense_1/bias',\n",
- " 'depend_feedforward_1/gamma',\n",
- " 'dense_2/kernel',\n",
- " 'dense_2/bias',\n",
- " 'depends/transitions',\n",
- " 'logits',\n",
- " 'logits_depends']"
- ]
- },
- "execution_count": 21,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "saver = tf.train.Saver(tf.trainable_variables())\n",
- "saver.save(sess, 'attention/model.ckpt')\n",
- "\n",
- "strings = ','.join(\n",
- " [\n",
- " n.name\n",
- " for n in tf.get_default_graph().as_graph_def().node\n",
- " if ('Variable' in n.op\n",
- " or 'Placeholder' in n.name\n",
- " or 'logits' in n.name\n",
- " or 'logits_depends' in n.name\n",
- " or 'alphas' in n.name)\n",
- " and 'Adam' not in n.name\n",
- " and 'beta' not in n.name\n",
- " and 'OptimizeLoss' not in n.name\n",
- " and 'Global_Step' not in n.name\n",
- " and 'Epoch_Step' not in n.name\n",
- " and 'learning_rate' not in n.name\n",
- " ]\n",
- ")\n",
- "strings.split(',')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 22,
- "metadata": {},
- "outputs": [],
- "source": [
- "def pred2label(pred):\n",
- " out = []\n",
- " for pred_i in pred:\n",
- " out_i = []\n",
- " for p in pred_i:\n",
- " out_i.append(idx2tag[p])\n",
- " out.append(out_i)\n",
- " return out"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq, deps = sess.run([model.tags_seq, model.tags_seq_depends],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " },\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 24,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "validation minibatch loop: 100%|██████████| 630/630 [02:47<00:00, 4.18it/s]\n"
- ]
- }
- ],
- "source": [
- "real_Y, predict_Y, real_depends, predict_depends = [], [], [], []\n",
- "\n",
- "pbar = tqdm(\n",
- " range(0, len(test_X), batch_size), desc = 'validation minibatch loop'\n",
- ")\n",
- "for i in pbar:\n",
- " batch_x = test_X[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_char = test_char[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_y = test_Y[i : min(i + batch_size, test_X.shape[0])]\n",
- " batch_depends = test_depends[i : min(i + batch_size, test_X.shape[0])]\n",
- " seq, deps = sess.run([model.tags_seq, model.tags_seq_depends],\n",
- " feed_dict = {\n",
- " model.word_ids: batch_x,\n",
- " model.char_ids: batch_char,\n",
- " },\n",
- " )\n",
- " predicted = pred2label(seq)\n",
- " real = pred2label(batch_y)\n",
- " predict_Y.extend(predicted)\n",
- " real_Y.extend(real)\n",
- " \n",
- " real_depends.extend(batch_depends.tolist())\n",
- " predict_depends.extend(deps.tolist())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 25,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- " precision recall f1-score support\n",
- "\n",
- " PAD 1.0000 1.0000 1.0000 841796\n",
- " acl 0.8768 0.8849 0.8809 3016\n",
- " advcl 0.8290 0.7943 0.8113 1196\n",
- " advmod 0.9043 0.9163 0.9102 4754\n",
- " amod 0.9121 0.8773 0.8943 4149\n",
- " appos 0.8934 0.8983 0.8958 2547\n",
- " aux 1.0000 1.0000 1.0000 6\n",
- " case 0.9593 0.9670 0.9631 10888\n",
- " cc 0.9523 0.9606 0.9564 3198\n",
- " ccomp 0.7984 0.7385 0.7673 413\n",
- " compound 0.8677 0.8956 0.8815 6679\n",
- "compound:plur 0.9073 0.9255 0.9163 550\n",
- " conj 0.8625 0.9330 0.8964 4162\n",
- " cop 0.9296 0.9679 0.9484 996\n",
- " csubj 0.9000 0.4091 0.5625 22\n",
- " csubj:pass 0.8462 0.8462 0.8462 13\n",
- " dep 0.8274 0.7377 0.7800 507\n",
- " det 0.8897 0.9196 0.9044 4094\n",
- " fixed 0.8851 0.7966 0.8385 580\n",
- " flat 0.9468 0.9198 0.9331 10333\n",
- " iobj 1.0000 0.6000 0.7500 20\n",
- " mark 0.8535 0.8447 0.8491 1359\n",
- " nmod 0.8749 0.8907 0.8827 4107\n",
- " nsubj 0.8746 0.8881 0.8813 6471\n",
- " nsubj:pass 0.8478 0.7116 0.7738 1949\n",
- " nummod 0.9568 0.9524 0.9546 3884\n",
- " obj 0.9082 0.8946 0.9013 5274\n",
- " obl 0.9203 0.8854 0.9025 5740\n",
- " parataxis 0.7980 0.7980 0.7980 391\n",
- " punct 0.9933 0.9957 0.9945 16561\n",
- " root 0.8974 0.9200 0.9085 5037\n",
- " xcomp 0.8580 0.8593 0.8587 1301\n",
- "\n",
- " avg / total 0.9906 0.9906 0.9906 951993\n",
- "\n"
- ]
- }
- ],
- "source": [
- "from sklearn.metrics import classification_report\n",
- "print(classification_report(np.array(real_Y).ravel(), np.array(predict_Y).ravel(), digits = 4))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 26,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- " precision recall f1-score support\n",
- "\n",
- " 0 1.0000 1.0000 1.0000 841796\n",
- " 1 0.9486 0.9277 0.9381 5037\n",
- " 2 0.9157 0.9547 0.9348 4325\n",
- " 3 0.9505 0.9137 0.9318 4856\n",
- " 4 0.9439 0.9311 0.9374 6309\n",
- " 5 0.9422 0.9396 0.9409 6540\n",
- " 6 0.9314 0.9516 0.9414 5697\n",
- " 7 0.9468 0.9461 0.9464 5414\n",
- " 8 0.9524 0.9394 0.9458 5559\n",
- " 9 0.9432 0.9421 0.9427 5028\n",
- " 10 0.9308 0.9544 0.9425 4300\n",
- " 11 0.9623 0.9323 0.9471 4358\n",
- " 12 0.9449 0.9493 0.9471 3903\n",
- " 13 0.9338 0.9442 0.9390 3497\n",
- " 14 0.9444 0.9475 0.9459 3445\n",
- " 15 0.9445 0.9487 0.9466 3177\n",
- " 16 0.9411 0.9589 0.9500 3068\n",
- " 17 0.9350 0.9589 0.9468 2774\n",
- " 18 0.9527 0.9352 0.9439 2499\n",
- " 19 0.9767 0.9207 0.9478 2319\n",
- " 20 0.9445 0.9558 0.9501 2013\n",
- " 21 0.9321 0.9374 0.9347 2124\n",
- " 22 0.9337 0.9423 0.9380 1749\n",
- " 23 0.9508 0.9175 0.9339 1685\n",
- " 24 0.9608 0.9240 0.9421 1540\n",
- " 25 0.8654 0.9661 0.9130 1358\n",
- " 26 0.9511 0.9245 0.9376 1179\n",
- " 27 0.9416 0.9367 0.9392 1154\n",
- " 28 0.8961 0.9549 0.9245 975\n",
- " 29 0.9260 0.9383 0.9321 1054\n",
- " 30 0.9342 0.9551 0.9445 1025\n",
- " 31 0.9482 0.9146 0.9311 761\n",
- " 32 0.9549 0.9126 0.9333 835\n",
- " 33 0.9235 0.9506 0.9368 749\n",
- " 34 0.9492 0.9465 0.9478 710\n",
- " 35 0.9323 0.9649 0.9483 599\n",
- " 36 0.9750 0.9458 0.9602 535\n",
- " 37 0.9363 0.9620 0.9490 474\n",
- " 38 0.9099 0.9815 0.9443 432\n",
- " 39 0.9462 0.9342 0.9401 395\n",
- " 40 0.9170 0.9535 0.9349 452\n",
- " 41 0.9446 0.9214 0.9328 407\n",
- " 42 0.9452 0.9452 0.9452 292\n",
- " 43 0.9731 0.9031 0.9368 320\n",
- " 44 0.9030 0.9767 0.9384 343\n",
- " 45 0.9343 0.9812 0.9572 319\n",
- " 46 0.9943 0.7955 0.8838 220\n",
- " 47 0.9420 0.9684 0.9550 285\n",
- " 48 0.9160 0.9745 0.9443 235\n",
- " 49 0.9113 0.9893 0.9487 187\n",
- " 50 0.9568 0.8636 0.9078 154\n",
- " 51 0.9706 0.9538 0.9621 173\n",
- " 52 0.9554 0.9934 0.9740 151\n",
- " 53 0.9116 0.9515 0.9311 206\n",
- " 54 0.9008 0.9833 0.9402 120\n",
- " 55 0.9371 0.9371 0.9371 159\n",
- " 56 0.9179 0.9535 0.9354 129\n",
- " 57 0.9091 0.8824 0.8955 102\n",
- " 58 0.9350 0.9127 0.9237 126\n",
- " 59 0.9725 0.7910 0.8724 134\n",
- " 60 0.9576 0.9826 0.9700 115\n",
- " 61 0.9200 0.9485 0.9340 97\n",
- " 62 0.9200 0.9079 0.9139 76\n",
- " 63 0.9551 0.9770 0.9659 87\n",
- " 64 0.9878 0.9310 0.9586 87\n",
- " 65 0.9103 0.9861 0.9467 72\n",
- " 66 0.9474 0.9863 0.9664 73\n",
- " 67 1.0000 0.9667 0.9831 60\n",
- " 68 0.9855 0.8831 0.9315 77\n",
- " 69 0.8889 0.9231 0.9057 52\n",
- " 70 0.9524 1.0000 0.9756 80\n",
- " 71 0.9241 0.9605 0.9419 76\n",
- " 72 0.9870 0.9870 0.9870 77\n",
- " 73 0.9531 1.0000 0.9760 61\n",
- " 74 1.0000 0.9667 0.9831 30\n",
- " 75 0.9412 1.0000 0.9697 64\n",
- " 76 1.0000 0.8571 0.9231 28\n",
- " 77 0.9487 1.0000 0.9737 37\n",
- " 78 0.9677 0.9677 0.9677 31\n",
- " 79 1.0000 1.0000 1.0000 25\n",
- " 80 1.0000 0.9348 0.9663 46\n",
- " 81 1.0000 0.9756 0.9877 41\n",
- " 82 1.0000 0.9302 0.9639 43\n",
- " 83 0.9474 1.0000 0.9730 18\n",
- " 84 0.8846 1.0000 0.9388 23\n",
- " 85 0.9583 1.0000 0.9787 23\n",
- " 86 1.0000 0.8636 0.9268 44\n",
- " 87 1.0000 1.0000 1.0000 10\n",
- " 88 0.9412 0.9412 0.9412 17\n",
- " 89 1.0000 0.8750 0.9333 8\n",
- " 90 0.9167 0.9565 0.9362 23\n",
- " 91 1.0000 1.0000 1.0000 15\n",
- " 92 1.0000 1.0000 1.0000 34\n",
- " 93 0.8571 1.0000 0.9231 6\n",
- " 94 0.9231 1.0000 0.9600 12\n",
- " 95 1.0000 1.0000 1.0000 9\n",
- " 96 1.0000 0.9333 0.9655 15\n",
- " 97 1.0000 1.0000 1.0000 30\n",
- " 98 1.0000 1.0000 1.0000 8\n",
- " 99 1.0000 0.9200 0.9583 25\n",
- " 100 0.8571 1.0000 0.9231 6\n",
- " 101 1.0000 0.9744 0.9870 39\n",
- " 102 1.0000 1.0000 1.0000 7\n",
- " 103 0.8889 1.0000 0.9412 16\n",
- " 104 1.0000 0.9500 0.9744 20\n",
- " 105 1.0000 0.9000 0.9474 10\n",
- " 106 0.9500 1.0000 0.9744 19\n",
- " 107 0.7500 1.0000 0.8571 27\n",
- " 108 1.0000 1.0000 1.0000 15\n",
- " 109 1.0000 1.0000 1.0000 3\n",
- " 110 1.0000 1.0000 1.0000 14\n",
- " 111 1.0000 1.0000 1.0000 9\n",
- " 112 0.9474 1.0000 0.9730 18\n",
- " 113 0.8571 1.0000 0.9231 6\n",
- " 114 1.0000 1.0000 1.0000 10\n",
- " 115 1.0000 1.0000 1.0000 7\n",
- " 116 1.0000 0.9375 0.9677 16\n",
- " 117 1.0000 0.5000 0.6667 2\n",
- " 118 1.0000 1.0000 1.0000 12\n",
- " 119 1.0000 1.0000 1.0000 4\n",
- " 120 1.0000 0.9231 0.9600 13\n",
- " 121 1.0000 1.0000 1.0000 6\n",
- " 122 1.0000 1.0000 1.0000 3\n",
- " 123 1.0000 0.8333 0.9091 6\n",
- " 124 1.0000 1.0000 1.0000 2\n",
- " 125 1.0000 1.0000 1.0000 2\n",
- " 126 0.8846 1.0000 0.9388 23\n",
- " 127 1.0000 1.0000 1.0000 6\n",
- " 128 1.0000 1.0000 1.0000 5\n",
- " 129 1.0000 0.8333 0.9091 6\n",
- " 130 1.0000 1.0000 1.0000 12\n",
- " 131 1.0000 0.7143 0.8333 7\n",
- " 132 1.0000 1.0000 1.0000 2\n",
- " 133 1.0000 1.0000 1.0000 4\n",
- " 134 0.9000 0.9000 0.9000 10\n",
- " 135 0.8571 1.0000 0.9231 6\n",
- " 136 1.0000 1.0000 1.0000 7\n",
- " 137 1.0000 1.0000 1.0000 8\n",
- " 138 1.0000 1.0000 1.0000 12\n",
- " 139 1.0000 1.0000 1.0000 1\n",
- " 140 1.0000 1.0000 1.0000 2\n",
- " 141 1.0000 1.0000 1.0000 2\n",
- " 142 1.0000 1.0000 1.0000 4\n",
- " 144 1.0000 1.0000 1.0000 4\n",
- " 146 1.0000 1.0000 1.0000 3\n",
- " 147 1.0000 1.0000 1.0000 7\n",
- " 149 1.0000 1.0000 1.0000 2\n",
- " 150 1.0000 1.0000 1.0000 2\n",
- " 151 1.0000 1.0000 1.0000 2\n",
- " 152 1.0000 1.0000 1.0000 1\n",
- " 153 1.0000 1.0000 1.0000 1\n",
- " 154 1.0000 1.0000 1.0000 2\n",
- " 156 1.0000 1.0000 1.0000 6\n",
- " 157 1.0000 1.0000 1.0000 1\n",
- " 158 1.0000 1.0000 1.0000 5\n",
- " 159 1.0000 1.0000 1.0000 1\n",
- " 160 1.0000 1.0000 1.0000 2\n",
- " 162 0.6667 0.6667 0.6667 3\n",
- " 163 0.6667 1.0000 0.8000 2\n",
- " 164 1.0000 1.0000 1.0000 2\n",
- " 167 1.0000 0.7500 0.8571 4\n",
- " 174 1.0000 1.0000 1.0000 2\n",
- " 176 1.0000 1.0000 1.0000 4\n",
- " 177 1.0000 1.0000 1.0000 2\n",
- " 178 1.0000 1.0000 1.0000 1\n",
- " 179 1.0000 1.0000 1.0000 1\n",
- " 182 1.0000 1.0000 1.0000 4\n",
- " 183 1.0000 1.0000 1.0000 4\n",
- "\n",
- "avg / total 0.9933 0.9932 0.9932 951993\n",
- "\n"
- ]
- }
- ],
- "source": [
- "from sklearn.metrics import classification_report\n",
- "print(classification_report(np.array(real_depends).ravel(), \n",
- " np.array(predict_depends).ravel(), digits = 4))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 27,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['tolong', 'tangkap', 'gambar', 'kami']"
- ]
- },
- "execution_count": 27,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "string = 'tolong tangkap gambar kami'\n",
- "\n",
- "def char_str_idx(corpus, dic, UNK = 0):\n",
- " maxlen = max([len(i) for i in corpus])\n",
- " X = np.zeros((len(corpus), maxlen))\n",
- " for i in range(len(corpus)):\n",
- " for no, k in enumerate(corpus[i][:maxlen]):\n",
- " val = dic[k] if k in dic else UNK\n",
- " X[i, no] = val\n",
- " return X\n",
- "\n",
- "def generate_char_seq(batch, UNK = 2):\n",
- " maxlen_c = max([len(k) for k in batch])\n",
- " x = [[len(i) for i in k] for k in batch]\n",
- " maxlen = max([j for i in x for j in i])\n",
- " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
- " for i in range(len(batch)):\n",
- " for k in range(len(batch[i])):\n",
- " for no, c in enumerate(batch[i][k][::-1]):\n",
- " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
- " return temp\n",
- "\n",
- "sequence = process_string(string)\n",
- "sequence"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 28,
- "metadata": {},
- "outputs": [],
- "source": [
- "X_seq = char_str_idx([sequence], word2idx, 2)\n",
- "X_char_seq = generate_char_seq([sequence])"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 29,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "(1, 4, 7)"
- ]
- },
- "execution_count": 29,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "X_char_seq.shape"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 30,
- "metadata": {},
- "outputs": [],
- "source": [
- "seq, deps = sess.run([model.tags_seq, model.tags_seq_depends],\n",
- " feed_dict={model.word_ids:X_seq,\n",
- " model.char_ids:X_char_seq})"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 31,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([2, 0, 3, 3], dtype=int32)"
- ]
- },
- "execution_count": 31,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "deps[0] - 1"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 32,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['advmod', 'root', 'case', 'nmod']"
- ]
- },
- "execution_count": 32,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "[idx2tag[i] for i in seq[0]]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 33,
- "metadata": {},
- "outputs": [],
- "source": [
- "import json\n",
- "with open('attention-is-all-you-need-dependency.json','w') as fopen:\n",
- " fopen.write(json.dumps({'idx2tag':idx2tag,'idx2word':idx2word,\n",
- " 'word2idx':word2idx,'tag2idx':tag2idx,'char2idx':char2idx}))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 34,
- "metadata": {},
- "outputs": [],
- "source": [
- "def freeze_graph(model_dir, output_node_names):\n",
- "\n",
- " if not tf.gfile.Exists(model_dir):\n",
- " raise AssertionError(\n",
- " \"Export directory doesn't exists. Please specify an export \"\n",
- " 'directory: %s' % model_dir\n",
- " )\n",
- "\n",
- " checkpoint = tf.train.get_checkpoint_state(model_dir)\n",
- " input_checkpoint = checkpoint.model_checkpoint_path\n",
- "\n",
- " absolute_model_dir = '/'.join(input_checkpoint.split('/')[:-1])\n",
- " output_graph = absolute_model_dir + '/frozen_model.pb'\n",
- " clear_devices = True\n",
- " with tf.Session(graph = tf.Graph()) as sess:\n",
- " saver = tf.train.import_meta_graph(\n",
- " input_checkpoint + '.meta', clear_devices = clear_devices\n",
- " )\n",
- " saver.restore(sess, input_checkpoint)\n",
- " output_graph_def = tf.graph_util.convert_variables_to_constants(\n",
- " sess,\n",
- " tf.get_default_graph().as_graph_def(),\n",
- " output_node_names.split(','),\n",
- " )\n",
- " with tf.gfile.GFile(output_graph, 'wb') as f:\n",
- " f.write(output_graph_def.SerializeToString())\n",
- " print('%d ops in the final graph.' % len(output_graph_def.node))\n",
- " \n",
- "def load_graph(frozen_graph_filename):\n",
- " with tf.gfile.GFile(frozen_graph_filename, 'rb') as f:\n",
- " graph_def = tf.GraphDef()\n",
- " graph_def.ParseFromString(f.read())\n",
- " with tf.Graph().as_default() as graph:\n",
- " tf.import_graph_def(graph_def)\n",
- " return graph"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 36,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "INFO:tensorflow:Restoring parameters from attention/model.ckpt\n",
- "INFO:tensorflow:Froze 107 variables.\n",
- "INFO:tensorflow:Converted 107 variables to const ops.\n",
- "2815 ops in the final graph.\n"
- ]
- }
- ],
- "source": [
- "freeze_graph('attention', strings)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 37,
- "metadata": {},
- "outputs": [],
- "source": [
- "g = load_graph('attention/frozen_model.pb')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 38,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1702: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n",
- " warnings.warn('An interactive session is already active. This can '\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[[14 4 7 20]] [[3 1 4 4]]\n"
- ]
- }
- ],
- "source": [
- "word_ids = g.get_tensor_by_name('import/Placeholder:0')\n",
- "char_ids = g.get_tensor_by_name('import/Placeholder_1:0')\n",
- "tags_seq = g.get_tensor_by_name('import/logits:0')\n",
- "depends_seq = g.get_tensor_by_name('import/logits_depends:0')\n",
- "test_sess = tf.InteractiveSession(graph = g)\n",
- "seq, deps = test_sess.run([tags_seq, depends_seq],\n",
- " feed_dict = {\n",
- " word_ids: X_seq,\n",
- " char_ids: X_char_seq,\n",
- " })\n",
- "\n",
- "print(seq,deps)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 40,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "{0: 'PAD',\n",
- " 1: 'nsubj',\n",
- " 2: 'cop',\n",
- " 3: 'det',\n",
- " 4: 'root',\n",
- " 5: 'nsubj:pass',\n",
- " 6: 'acl',\n",
- " 7: 'case',\n",
- " 8: 'obl',\n",
- " 9: 'flat',\n",
- " 10: 'punct',\n",
- " 11: 'appos',\n",
- " 12: 'amod',\n",
- " 13: 'compound',\n",
- " 14: 'advmod',\n",
- " 15: 'cc',\n",
- " 16: 'obj',\n",
- " 17: 'conj',\n",
- " 18: 'mark',\n",
- " 19: 'advcl',\n",
- " 20: 'nmod',\n",
- " 21: 'nummod',\n",
- " 22: 'dep',\n",
- " 23: 'xcomp',\n",
- " 24: 'ccomp',\n",
- " 25: 'parataxis',\n",
- " 26: 'compound:plur',\n",
- " 27: 'fixed',\n",
- " 28: 'aux',\n",
- " 29: 'csubj',\n",
- " 30: 'iobj',\n",
- " 31: 'csubj:pass'}"
- ]
- },
- "execution_count": 40,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "idx2tag"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/dependency-parser/5.biaffine-attention-cross-entropy.ipynb b/dependency-parser/5.biaffine-attention-cross-entropy.ipynb
new file mode 100644
index 0000000..dbee694
--- /dev/null
+++ b/dependency-parser/5.biaffine-attention-cross-entropy.ipynb
@@ -0,0 +1,1379 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n",
+ "# !pip install malaya -U"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "os.environ['CUDA_VISIBLE_DEVICES'] = '1'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ }
+ ],
+ "source": [
+ "import malaya\n",
+ "import re\n",
+ "from malaya.texts._text_functions import split_into_sentences\n",
+ "from malaya.texts import _regex\n",
+ "import numpy as np\n",
+ "import itertools\n",
+ "import tensorflow as tf\n",
+ "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
+ "\n",
+ "tokenizer = malaya.preprocessing._tokenizer\n",
+ "splitter = split_into_sentences"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def is_number_regex(s):\n",
+        "    if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n",
+ " return s.isdigit()\n",
+ " return True\n",
+ "\n",
+ "def preprocessing(w):\n",
+ " if is_number_regex(w):\n",
+ " return ''\n",
+ " elif re.match(_regex._money, w):\n",
+ " return ''\n",
+ " elif re.match(_regex._date, w):\n",
+ " return ''\n",
+ " elif re.match(_regex._expressions['email'], w):\n",
+ " return ''\n",
+ " elif re.match(_regex._expressions['url'], w):\n",
+ " return ''\n",
+ " else:\n",
+ " w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n",
+ " return w"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "({'PAD': 0,\n",
+ " 'UNK': 1,\n",
+ " '_ROOT': 2,\n",
+ " '': 3,\n",
+ " '': 4,\n",
+ " '': 5,\n",
+ " '': 6,\n",
+ " '': 7},\n",
+ " {'PAD': 0,\n",
+ " 'UNK': 1,\n",
+ " '_ROOT': 2,\n",
+ " '': 3,\n",
+ " '': 4,\n",
+ " '': 5,\n",
+ " '': 6,\n",
+ " '': 7})"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n",
+ "tag2idx = {'PAD': 0, '_': 1}\n",
+ "char2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n",
+ "word_idx = 3\n",
+ "tag_idx = 2\n",
+ "char_idx = 3\n",
+ "\n",
+ "special_tokens = ['', '', '', '', '']\n",
+ "\n",
+ "for t in special_tokens:\n",
+ " word2idx[t] = word_idx\n",
+ " word_idx += 1\n",
+ " char2idx[t] = char_idx\n",
+ " char_idx += 1\n",
+ " \n",
+ "word2idx, char2idx"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "PAD = \"_PAD\"\n",
+ "PAD_POS = \"_PAD_POS\"\n",
+ "PAD_TYPE = \"_\"\n",
+ "PAD_CHAR = \"_PAD_CHAR\"\n",
+ "ROOT = \"_ROOT\"\n",
+ "ROOT_POS = \"_ROOT_POS\"\n",
+ "ROOT_TYPE = \"_\"\n",
+ "ROOT_CHAR = \"_ROOT_CHAR\"\n",
+ "END = \"_END\"\n",
+ "END_POS = \"_END_POS\"\n",
+ "END_TYPE = \"_\"\n",
+ "END_CHAR = \"_END_CHAR\"\n",
+ "\n",
+ "def process_corpus(corpus, until = None):\n",
+ " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
+ " sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n",
+ " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n",
+ " first_time = True\n",
+ " for sentence in corpus:\n",
+ " try:\n",
+ " if len(sentence):\n",
+ " if sentence[0] == '#':\n",
+ " continue\n",
+ " if first_time:\n",
+ " print(sentence)\n",
+ " first_time = False\n",
+ " sentence = sentence.split('\\t')\n",
+ " for c in sentence[1]:\n",
+ " if c not in char2idx:\n",
+ " char2idx[c] = char_idx\n",
+ " char_idx += 1\n",
+ " if sentence[7] not in tag2idx:\n",
+ " tag2idx[sentence[7]] = tag_idx\n",
+ " tag_idx += 1\n",
+ " sentence[1] = preprocessing(sentence[1])\n",
+ " if sentence[1] not in word2idx:\n",
+ " word2idx[sentence[1]] = word_idx\n",
+ " word_idx += 1\n",
+ " temp_word.append(word2idx[sentence[1]])\n",
+ " temp_depend.append(int(sentence[6]))\n",
+ " temp_label.append(tag2idx[sentence[7]])\n",
+ " temp_sentence.append(sentence[1])\n",
+ " temp_pos.append(sentence[3])\n",
+ " else:\n",
+ " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " continue\n",
+ " words.append(temp_word)\n",
+ " depends.append(temp_depend)\n",
+ " labels.append(temp_label)\n",
+        "                sentences.append(temp_sentence)\n",
+ " pos.append(temp_pos)\n",
+ " char_ = [[char2idx['_ROOT']]]\n",
+ " for w in temp_sentence:\n",
+ " if w in char2idx:\n",
+ " char_.append([char2idx[w]])\n",
+ " else:\n",
+ " char_.append([char2idx[c] for c in w])\n",
+ " chars.append(char_)\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " except Exception as e:\n",
+ " print(e, sentence)\n",
+ " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-dev.conllu') as fopen:\n",
+ " dev = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-test.conllu') as fopen:\n",
+ " test = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_test, words_test, depends_test, labels_test, _, _ = process_corpus(test)\n",
+ "sentences_test.extend(sentences_dev)\n",
+ "words_test.extend(words_dev)\n",
+ "depends_test.extend(depends_dev)\n",
+ "labels_test.extend(labels_dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n",
+ "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n",
+ "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n",
+ "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n",
+ "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n",
+ "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-train.conllu') as fopen:\n",
+ " train = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(12000, 3824)"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(sentences_train), len(sentences_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "21974"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "idx2word = {v:k for k, v in word2idx.items()}\n",
+ "idx2tag = {v:k for k, v in tag2idx.items()}\n",
+ "len(idx2word)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def generate_char_seq(batch, UNK = 2):\n",
+ " maxlen_c = max([len(k) for k in batch])\n",
+ " x = [[len(i) for i in k] for k in batch]\n",
+ " maxlen = max([j for i in x for j in i])\n",
+ " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
+ " for i in range(len(batch)):\n",
+ " for k in range(len(batch[i])):\n",
+        "            for no, c in enumerate(batch[i][k][::-1]):\n",
+ " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
+ " return temp"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_X = words_train\n",
+ "train_Y = labels_train\n",
+ "train_depends = depends_train\n",
+ "train_char = sentences_train\n",
+ "\n",
+ "test_X = words_test\n",
+ "test_Y = labels_test\n",
+ "test_depends = depends_test\n",
+ "test_char = sentences_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class BiAAttention:\n",
+ " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n",
+ " self.input_size_encoder = input_size_encoder\n",
+ " self.input_size_decoder = input_size_decoder\n",
+ " self.num_labels = num_labels\n",
+ " \n",
+ " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n",
+ " batch = tf.shape(input_d)[0]\n",
+ " length_decoder = tf.shape(input_d)[1]\n",
+ " length_encoder = tf.shape(input_e)[1]\n",
+ " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n",
+ " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n",
+ " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n",
+ " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n",
+ " \n",
+ " output = output + out_d + out_e\n",
+ " \n",
+ " if mask_d is not None:\n",
+ " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n",
+ " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n",
+ " output = output * d * e\n",
+ " \n",
+ " return output\n",
+ " \n",
+ "class BiLinear:\n",
+ " def __init__(self, left_features, right_features, out_features):\n",
+ " self.left_features = left_features\n",
+ " self.right_features = right_features\n",
+ " self.out_features = out_features\n",
+ " \n",
+ " self.U = tf.get_variable(\"U-bi\", shape=[out_features, left_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_l = tf.get_variable(\"Wl\", shape=[out_features, left_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_r = tf.get_variable(\"Wr\", shape=[out_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_left, input_right):\n",
+ " left_size = tf.shape(input_left)\n",
+ " output_shape = tf.concat([left_size[:-1], [self.out_features]], axis = 0)\n",
+ " batch = tf.cast(tf.reduce_prod(left_size[:-1]), tf.int32)\n",
+ " input_left = tf.reshape(input_left, (batch, self.left_features))\n",
+ " input_right = tf.reshape(input_right, (batch, self.right_features))\n",
+ " tiled = tf.tile(tf.expand_dims(input_left, axis = 0), (self.out_features,1,1))\n",
+ " output = tf.transpose(tf.reduce_sum(tf.matmul(tiled, self.U), axis = 2))\n",
+ " output = output + tf.matmul(input_left, tf.transpose(self.W_l))\\\n",
+ " + tf.matmul(input_right, tf.transpose(self.W_r))\n",
+ " \n",
+ " return tf.reshape(output, output_shape)\n",
+ "\n",
+ "class Attention:\n",
+ " def __init__(self, word_dim, num_words, char_dim, num_chars, num_filters, kernel_size,\n",
+ " hidden_size, encoder_layers, num_labels, arc_space, type_space):\n",
+ " \n",
+ " def cells(size, reuse=False):\n",
+ " return tf.nn.rnn_cell.LSTMCell(size,\n",
+ " initializer=tf.orthogonal_initializer(),reuse=reuse)\n",
+ " \n",
+ " self.word_embedd = tf.Variable(tf.random_uniform([num_words, word_dim], -1, 1))\n",
+ " self.char_embedd = tf.Variable(tf.random_uniform([num_chars, char_dim], -1, 1))\n",
+ " self.conv1d = tf.layers.Conv1D(num_filters, kernel_size, 1, padding='VALID')\n",
+ " self.num_labels = num_labels\n",
+ " self.encoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(encoder_layers)])\n",
+ "\n",
+ " \n",
+ " \n",
+ " def encode(self, input_word, input_char):\n",
+ " word = tf.nn.embedding_lookup(self.word_embedd, input_word)\n",
+ " char = tf.nn.embedding_lookup(self.char_embedd, input_char)\n",
+ " b = tf.shape(char)[0]\n",
+ " wl = tf.shape(char)[1]\n",
+ " cl = tf.shape(char)[2]\n",
+ " d = char.shape[3]\n",
+ " char = tf.reshape(char, [b * wl, cl, d])\n",
+ " char = tf.reduce_max(self.conv1d(char), axis = 1)\n",
+ " char = tf.nn.tanh(char)\n",
+ " d = char.shape[-1]\n",
+ " char = tf.reshape(char, [b, wl, d])\n",
+ " \n",
+ " src_encoding = tf.concat([word, char], axis=2)\n",
+ " output, hn = tf.nn.dynamic_rnn(self.encoder, src_encoding, dtype = tf.float32,\n",
+ " scope = 'encoder')\n",
+ " arc_h = tf.nn.elu(self.arc_h(output))\n",
+ " arc_c = tf.nn.elu(self.arc_c(output))\n",
+ " \n",
+ " type_h = tf.nn.elu(self.type_h(output))\n",
+ " type_c = tf.nn.elu(self.type_c(output))\n",
+ " \n",
+ " return (arc_h, arc_c), (type_h, type_c), hn\n",
+ " \n",
+ " def forward(self, input_word, input_char, mask):\n",
+ " arcs, types, _ = self.encode(input_word, input_char)\n",
+ " \n",
+ " out_arc = tf.squeeze(self.attention.forward(arcs[0], arcs[1], mask_d=mask, mask_e=mask), axis = 1)\n",
+ " return out_arc, types, mask\n",
+ " \n",
+ " def loss(self, input_word, input_char, mask, heads, types):\n",
+ " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n",
+ " type_h, type_c = out_type\n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+        "        loss_arc = tf.nn.log_softmax(out_arc, axis=1)\n",
+        "        loss_type = tf.nn.log_softmax(out_type, axis=2)\n",
+ " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n",
+ " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n",
+ " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])\n",
+ " \n",
+ " t = tf.transpose(types)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_type = tf.gather_nd(loss_type, concatenated)\n",
+ " loss_type = tf.transpose(loss_type, [1, 0])\n",
+ " return tf.reduce_sum(-loss_arc) / num, tf.reduce_sum(-loss_type) / num\n",
+ " \n",
+ " def decode(self, input_word, input_char, mask, leading_symbolic=0):\n",
+ " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n",
+ " heads = tf.argmax(out_arc, axis = 1)\n",
+ " type_h, type_c = out_type\n",
+ " batch = tf.shape(type_h)[0]\n",
+ " max_len = tf.shape(type_h)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.cast(tf.transpose(heads), tf.int32)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " out_type = out_type[:, :, leading_symbolic:]\n",
+ " types = tf.argmax(out_type, axis = 2)\n",
+ " return heads, types\n",
+ " \n",
+ "class Model:\n",
+ " def __init__(\n",
+ " self, \n",
+ " dim_word,\n",
+ " dim_char,\n",
+ " dropout,\n",
+ " learning_rate,\n",
+ " hidden_size_char,\n",
+ " hidden_size_word,\n",
+ " num_layers,\n",
+ " cov = 0.0):\n",
+ " \n",
+ " def cells(size, reuse = False):\n",
+ " return tf.contrib.rnn.DropoutWrapper(\n",
+ " tf.nn.rnn_cell.LSTMCell(\n",
+ " size,\n",
+ " initializer = tf.orthogonal_initializer(),\n",
+ " reuse = reuse,\n",
+ " ),\n",
+ " output_keep_prob = dropout,\n",
+ " )\n",
+ " \n",
+ " self.words = tf.placeholder(tf.int32, (None, None))\n",
+ " self.chars = tf.placeholder(tf.int32, (None, None, None))\n",
+ " self.heads = tf.placeholder(tf.int32, (None, None))\n",
+ " self.types = tf.placeholder(tf.int32, (None, None))\n",
+ " self.mask = tf.cast(tf.math.not_equal(self.words, 0), tf.float32)\n",
+ " self.maxlen = tf.shape(self.words)[1]\n",
+ " self.lengths = tf.count_nonzero(self.words, 1)\n",
+ " mask = self.mask\n",
+ " heads = self.heads\n",
+ " types = self.types\n",
+ " \n",
+ " self.arc_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.arc_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n",
+ "\n",
+ " self.type_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.type_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.bilinear = BiLinear(hidden_size_word, hidden_size_word, len(tag2idx))\n",
+ " \n",
+ " self.word_embeddings = tf.Variable(\n",
+ " tf.truncated_normal(\n",
+ " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n",
+ " )\n",
+ " )\n",
+ " self.char_embeddings = tf.Variable(\n",
+ " tf.truncated_normal(\n",
+ " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n",
+ " )\n",
+ " )\n",
+ "\n",
+ " word_embedded = tf.nn.embedding_lookup(\n",
+ " self.word_embeddings, self.words\n",
+ " )\n",
+ " char_embedded = tf.nn.embedding_lookup(\n",
+ " self.char_embeddings, self.chars\n",
+ " )\n",
+ " s = tf.shape(char_embedded)\n",
+ " char_embedded = tf.reshape(\n",
+ " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n",
+ " )\n",
+ "\n",
+ " for n in range(num_layers):\n",
+ " (out_fw, out_bw), (\n",
+ " state_fw,\n",
+ " state_bw,\n",
+ " ) = tf.nn.bidirectional_dynamic_rnn(\n",
+ " cell_fw = cells(hidden_size_char),\n",
+ " cell_bw = cells(hidden_size_char),\n",
+ " inputs = char_embedded,\n",
+ " dtype = tf.float32,\n",
+ " scope = 'bidirectional_rnn_char_%d' % (n),\n",
+ " )\n",
+ " char_embedded = tf.concat((out_fw, out_bw), 2)\n",
+ " output = tf.reshape(\n",
+ " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n",
+ " )\n",
+ " word_embedded = tf.concat([word_embedded, output], axis = -1)\n",
+ "\n",
+ " for n in range(num_layers):\n",
+ " (out_fw, out_bw), (\n",
+ " state_fw,\n",
+ " state_bw,\n",
+ " ) = tf.nn.bidirectional_dynamic_rnn(\n",
+ " cell_fw = cells(hidden_size_word),\n",
+ " cell_bw = cells(hidden_size_word),\n",
+ " inputs = word_embedded,\n",
+ " dtype = tf.float32,\n",
+ " scope = 'bidirectional_rnn_word_%d' % (n),\n",
+ " )\n",
+ " word_embedded = tf.concat((out_fw, out_bw), 2)\n",
+ " \n",
+ " \n",
+ " arc_h = tf.nn.elu(self.arc_h(word_embedded))\n",
+ " arc_c = tf.nn.elu(self.arc_c(word_embedded))\n",
+ " \n",
+ " type_h = tf.nn.elu(self.type_h(word_embedded))\n",
+ " type_c = tf.nn.elu(self.type_c(word_embedded))\n",
+ " \n",
+    "        out_arc = tf.squeeze(self.attention.forward(arc_h, arc_c, mask_d=self.mask, \n",
+ " mask_e=self.mask), axis = 1)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " \n",
+ " decode_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " decode_arc = tf.where(minus_mask, tf.fill(tf.shape(decode_arc), -np.inf), decode_arc)\n",
+ " self.heads_seq = tf.argmax(decode_arc, axis = 1)\n",
+ " \n",
+ " t = tf.cast(tf.transpose(self.heads_seq), tf.int32)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+    "        type_h_decode = tf.gather_nd(type_h, concatenated)\n",
+    "        out_type = self.bilinear.forward(type_h_decode, type_c)\n",
+ " self.tags_seq = tf.argmax(out_type, axis = 2)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n",
+ " loss_type = tf.nn.log_softmax(out_type, dim=2)\n",
+ " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n",
+ " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n",
+ " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])\n",
+ " \n",
+ " t = tf.transpose(types)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_type = tf.gather_nd(loss_type, concatenated)\n",
+ " loss_type = tf.transpose(loss_type, [1, 0])\n",
+ " self.cost = (tf.reduce_sum(-loss_arc) / num) + (tf.reduce_sum(-loss_type) / num)\n",
+ " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n",
+ " \n",
+ " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
+ " \n",
+ " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
+ " mask_label = tf.boolean_mask(self.types, mask)\n",
+ " correct_pred = tf.equal(tf.cast(self.prediction, tf.int32), mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
+ " \n",
+ " self.prediction = tf.cast(tf.boolean_mask(self.heads_seq, mask), tf.int32)\n",
+ " mask_label = tf.boolean_mask(self.heads, mask)\n",
+ " correct_pred = tf.equal(self.prediction, mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "reduction_indices is deprecated, use axis instead\n",
+ "WARNING:tensorflow:From :183: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n",
+ "WARNING:tensorflow:From :238: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+    "WARNING:tensorflow:From :277: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use tf.where in 2.0, which has the same broadcast rule as np.where\n",
+ "WARNING:tensorflow:From :300: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "dim is deprecated, use axis instead\n"
+ ]
+ }
+ ],
+ "source": [
+ "tf.reset_default_graph()\n",
+ "sess = tf.InteractiveSession()\n",
+ "\n",
+ "dim_word = 128\n",
+ "dim_char = 256\n",
+ "dropout = 1.0\n",
+ "learning_rate = 1e-3\n",
+ "hidden_size_char = 128\n",
+ "hidden_size_word = 128\n",
+ "num_layers = 2\n",
+ "\n",
+ "model = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\n",
+ "sess.run(tf.global_variables_initializer())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "batch_x = train_X[:5]\n",
+ "batch_x = pad_sequences(batch_x,padding='post')\n",
+ "batch_char = train_char[:5]\n",
+ "batch_char = generate_char_seq(batch_char)\n",
+ "batch_y = train_Y[:5]\n",
+ "batch_y = pad_sequences(batch_y,padding='post')\n",
+ "batch_depends = train_depends[:5]\n",
+ "batch_depends = pad_sequences(batch_depends,padding='post')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[0.0, 0.05172414, 7.4798884]"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n",
+ " feed_dict = {model.words: batch_x,\n",
+ " model.chars: batch_char,\n",
+ " model.types: batch_y,\n",
+ " model.heads: batch_depends})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([19, 19, 19, 23, 23, 19, 19, 19, 19, 23, 23, 23, 23, 23, 23, 23, 23,\n",
+ " 23, 23, 23, 17, 17, 17, 17, 35, 35, 35, 43, 43, 43, 43, 43, 43, 43,\n",
+ " 35, 35]),\n",
+ " array([2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
+ " 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0]),\n",
+ " array([ 0, 1, 1, 1, 6, 7, 1, 7, 8, 8, 8, 8, 8, 15, 8, 18, 18,\n",
+ " 7, 21, 21, 18, 23, 21, 21, 28, 28, 28, 21, 1, 0, 0, 0, 0, 0,\n",
+ " 0, 0], dtype=int32))"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads_seq],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " model.chars: batch_char\n",
+ " },\n",
+ ")\n",
+ "tags_seq[0], heads[0], batch_depends[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.53it/s, accuracy=0.675, accuracy_depends=0.664, cost=2] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.80it/s, accuracy=0.688, accuracy_depends=0.66, cost=1.91] \n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:09, 5.39it/s, accuracy=0.641, accuracy_depends=0.606, cost=2.29]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 0, training loss: 4.134478, training acc: 0.409135, training depends: 0.386963, valid loss: 2.286293, valid acc: 0.647242, valid depends: 0.598646\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.57it/s, accuracy=0.782, accuracy_depends=0.767, cost=1.27] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.59it/s, accuracy=0.789, accuracy_depends=0.761, cost=1.13]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:11, 5.20it/s, accuracy=0.752, accuracy_depends=0.759, cost=1.31]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 1, training loss: 1.745138, training acc: 0.708187, training depends: 0.671352, valid loss: 1.594311, valid acc: 0.737908, valid depends: 0.680541\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.59it/s, accuracy=0.796, accuracy_depends=0.802, cost=1.02] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.77it/s, accuracy=0.818, accuracy_depends=0.806, cost=0.888]\n",
+ "train minibatch loop: 0%| | 0/375 [00:00, ?it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 2, training loss: 1.160756, training acc: 0.782296, training depends: 0.746494, valid loss: 1.441877, valid acc: 0.755935, valid depends: 0.702582\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.55it/s, accuracy=0.84, accuracy_depends=0.818, cost=0.81] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.75it/s, accuracy=0.834, accuracy_depends=0.814, cost=0.777]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:11, 5.27it/s, accuracy=0.842, accuracy_depends=0.842, cost=0.637]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 3, training loss: 0.858341, training acc: 0.820096, training depends: 0.786564, valid loss: 1.435897, valid acc: 0.776457, valid depends: 0.710300\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.60it/s, accuracy=0.84, accuracy_depends=0.844, cost=0.666] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:07<00:00, 15.36it/s, accuracy=0.85, accuracy_depends=0.814, cost=0.692]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:08, 5.45it/s, accuracy=0.872, accuracy_depends=0.877, cost=0.448]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 4, training loss: 0.668751, training acc: 0.842080, training depends: 0.813836, valid loss: 1.501069, valid acc: 0.782813, valid depends: 0.712437\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.59it/s, accuracy=0.846, accuracy_depends=0.854, cost=0.568]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.78it/s, accuracy=0.834, accuracy_depends=0.818, cost=0.709]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:07, 5.52it/s, accuracy=0.877, accuracy_depends=0.879, cost=0.351]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 5, training loss: 0.542109, training acc: 0.860690, training depends: 0.831518, valid loss: 1.586633, valid acc: 0.784649, valid depends: 0.711521\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.61it/s, accuracy=0.871, accuracy_depends=0.863, cost=0.493]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.64it/s, accuracy=0.842, accuracy_depends=0.826, cost=0.704]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:08, 5.43it/s, accuracy=0.888, accuracy_depends=0.891, cost=0.271]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 6, training loss: 0.442386, training acc: 0.872789, training depends: 0.845544, valid loss: 1.746875, valid acc: 0.800454, valid depends: 0.713102\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.55it/s, accuracy=0.88, accuracy_depends=0.872, cost=0.452] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.75it/s, accuracy=0.846, accuracy_depends=0.81, cost=0.884]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:11, 5.25it/s, accuracy=0.912, accuracy_depends=0.916, cost=0.222]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 7, training loss: 0.372068, training acc: 0.886267, training depends: 0.856076, valid loss: 1.880266, valid acc: 0.801846, valid depends: 0.711724\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.60it/s, accuracy=0.881, accuracy_depends=0.883, cost=0.341] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.50it/s, accuracy=0.862, accuracy_depends=0.83, cost=0.752]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:09, 5.40it/s, accuracy=0.912, accuracy_depends=0.909, cost=0.2]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 8, training loss: 0.306797, training acc: 0.896594, training depends: 0.864416, valid loss: 2.109596, valid acc: 0.804174, valid depends: 0.714363\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.61it/s, accuracy=0.889, accuracy_depends=0.896, cost=0.268] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.85it/s, accuracy=0.842, accuracy_depends=0.838, cost=0.824]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:13, 5.12it/s, accuracy=0.921, accuracy_depends=0.906, cost=0.206]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 9, training loss: 0.248885, training acc: 0.905460, training depends: 0.872795, valid loss: 2.231252, valid acc: 0.811433, valid depends: 0.713417\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.60it/s, accuracy=0.9, accuracy_depends=0.904, cost=0.236] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.58it/s, accuracy=0.85, accuracy_depends=0.826, cost=0.933]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.60it/s, accuracy=0.919, accuracy_depends=0.912, cost=0.136]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 10, training loss: 0.209872, training acc: 0.910505, training depends: 0.877886, valid loss: 2.240191, valid acc: 0.811876, valid depends: 0.717278\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.62it/s, accuracy=0.904, accuracy_depends=0.904, cost=0.177] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.63it/s, accuracy=0.846, accuracy_depends=0.814, cost=1.07]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:10, 5.33it/s, accuracy=0.927, accuracy_depends=0.927, cost=0.0913]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 11, training loss: 0.179435, training acc: 0.917127, training depends: 0.882472, valid loss: 2.472229, valid acc: 0.808497, valid depends: 0.717172\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.57it/s, accuracy=0.909, accuracy_depends=0.908, cost=0.19] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.70it/s, accuracy=0.858, accuracy_depends=0.798, cost=1.54]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:12, 5.15it/s, accuracy=0.938, accuracy_depends=0.931, cost=0.0815]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 12, training loss: 0.146958, training acc: 0.924886, training depends: 0.886345, valid loss: 2.617582, valid acc: 0.816131, valid depends: 0.719541\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.58it/s, accuracy=0.921, accuracy_depends=0.907, cost=0.151] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.43it/s, accuracy=0.879, accuracy_depends=0.806, cost=1.17]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:12, 5.17it/s, accuracy=0.954, accuracy_depends=0.924, cost=0.0877]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 13, training loss: 0.124930, training acc: 0.929599, training depends: 0.888695, valid loss: 2.686270, valid acc: 0.823643, valid depends: 0.720902\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.56it/s, accuracy=0.921, accuracy_depends=0.925, cost=0.108] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:07<00:00, 16.10it/s, accuracy=0.87, accuracy_depends=0.818, cost=1.4] "
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 14, training loss: 0.106369, training acc: 0.938695, training depends: 0.891401, valid loss: 2.815747, valid acc: 0.824535, valid depends: 0.724233\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "batch_size = 32\n",
+ "epoch = 15\n",
+ "\n",
+ "for e in range(epoch):\n",
+ " train_acc, train_loss = [], []\n",
+ " test_acc, test_loss = [], []\n",
+ " train_acc_depends, test_acc_depends = [], []\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(train_X))\n",
+ " batch_x = train_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = train_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = train_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = train_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost, _ = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " model.chars: batch_char,\n",
+ " model.types: batch_y,\n",
+ " model.heads: batch_depends\n",
+ " },\n",
+ " )\n",
+ " train_loss.append(cost)\n",
+ " train_acc.append(acc)\n",
+ " train_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = test_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " model.chars: batch_char,\n",
+ " model.types: batch_y,\n",
+ " model.heads: batch_depends\n",
+ " },\n",
+ " )\n",
+ " test_loss.append(cost)\n",
+ " test_acc.append(acc)\n",
+ " test_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " \n",
+ " print(\n",
+ " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
+ " % (e, np.mean(train_loss), \n",
+ " np.mean(train_acc), \n",
+ " np.mean(train_acc_depends), \n",
+ " np.mean(test_loss), \n",
+ " np.mean(test_acc), \n",
+ " np.mean(test_acc_depends)\n",
+ " ))\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([ 3, 6, 22, 26, 6, 18, 16, 5, 3, 13, 10, 11, 6, 12, 31, 10, 16,\n",
+ " 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,\n",
+ " 22]),\n",
+ " array([ 2, 6, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n",
+ " 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0]),\n",
+ " array([ 2, 8, 5, 5, 2, 8, 8, 0, 11, 11, 8, 14, 14, 8, 16, 14, 14,\n",
+ " 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0], dtype=int32))"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads_seq],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " model.chars: batch_char\n",
+ " },\n",
+ ")\n",
+ "tags_seq[0], heads[0], batch_depends[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def evaluate(heads_pred, types_pred, heads, types, lengths,\n",
+ " symbolic_root=False, symbolic_end=False):\n",
+ " batch_size, _ = heads_pred.shape\n",
+ " ucorr = 0.\n",
+ " lcorr = 0.\n",
+ " total = 0.\n",
+ " ucomplete_match = 0.\n",
+ " lcomplete_match = 0.\n",
+ "\n",
+ " corr_root = 0.\n",
+ " total_root = 0.\n",
+ " start = 1 if symbolic_root else 0\n",
+ " end = 1 if symbolic_end else 0\n",
+ " for i in range(batch_size):\n",
+ " ucm = 1.\n",
+ " lcm = 1.\n",
+ " for j in range(start, lengths[i] - end):\n",
+ "\n",
+ " total += 1\n",
+ " if heads[i, j] == heads_pred[i, j]:\n",
+ " ucorr += 1\n",
+ " if types[i, j] == types_pred[i, j]:\n",
+ " lcorr += 1\n",
+ " else:\n",
+ " lcm = 0\n",
+ " else:\n",
+ " ucm = 0\n",
+ " lcm = 0\n",
+ "\n",
+ " if heads[i, j] == 0:\n",
+ " total_root += 1\n",
+ " corr_root += 1 if heads_pred[i, j] == 0 else 0\n",
+ "\n",
+ " ucomplete_match += ucm\n",
+ " lcomplete_match += lcm\n",
+ " \n",
+ " return ucorr / total, lcorr / total, corr_root / total_root"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "arcs, types, roots = [], [], []\n",
+ "\n",
+ "for i in range(0, len(test_X), batch_size):\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = test_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads_seq],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " model.chars: batch_char\n",
+ " },\n",
+ " )\n",
+ " \n",
+ " arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ " arcs.append(arc_accuracy)\n",
+ " types.append(type_accuracy)\n",
+ " roots.append(root_accuracy)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "arc accuracy: 0.7242327707332225\n",
+ "types accuracy: 0.6353447083607996\n",
+ "root accuracy: 0.68515625\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('arc accuracy:', np.mean(arcs))\n",
+ "print('types accuracy:', np.mean(types))\n",
+ "print('root accuracy:', np.mean(roots))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/dependency-parser/6.bert-biaffine-attention-cross-entropy.ipynb b/dependency-parser/6.bert-biaffine-attention-cross-entropy.ipynb
new file mode 100644
index 0000000..eb6f696
--- /dev/null
+++ b/dependency-parser/6.bert-biaffine-attention-cross-entropy.ipynb
@@ -0,0 +1,1277 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n",
+ "# !wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip\n",
+ "# !unzip cased_L-12_H-768_A-12.zip"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "os.environ['CUDA_VISIBLE_DEVICES'] = '1'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "tag2idx = {'PAD': 0, 'X': 1}\n",
+ "tag_idx = 2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ }
+ ],
+ "source": [
+ "import bert\n",
+ "from bert import run_classifier\n",
+ "from bert import optimization\n",
+ "from bert import tokenization\n",
+ "from bert import modeling\n",
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "\n",
+ "BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'\n",
+ "BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'\n",
+ "BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'\n",
+ "\n",
+ "tokenizer = tokenization.FullTokenizer(\n",
+ " vocab_file=BERT_VOCAB, do_lower_case=False)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def process_corpus(corpus):\n",
+ "    global tag2idx, tag_idx\n",
+ " sentences, words, depends, labels, pos, sequences = [], [], [], [], [], []\n",
+ " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n",
+ " first_time = True\n",
+ " for sentence in corpus:\n",
+ " try:\n",
+ " if len(sentence):\n",
+ " if sentence[0] == '#':\n",
+ " continue\n",
+ " if first_time:\n",
+ " print(sentence)\n",
+ " first_time = False\n",
+ " sentence = sentence.split('\\t')\n",
+ " if sentence[7] not in tag2idx:\n",
+ " tag2idx[sentence[7]] = tag_idx\n",
+ " tag_idx += 1\n",
+ " temp_word.append(sentence[1])\n",
+ " temp_depend.append(int(sentence[6]) + 1)\n",
+ " temp_label.append(tag2idx[sentence[7]])\n",
+ " temp_sentence.append(sentence[1])\n",
+ " temp_pos.append(sentence[3])\n",
+ " else:\n",
+ " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " continue\n",
+ " bert_tokens = ['[CLS]']\n",
+ " labels_ = [0]\n",
+ " depends_ = [0]\n",
+ " seq_ = []\n",
+ " for no, orig_token in enumerate(temp_word):\n",
+ " labels_.append(temp_label[no])\n",
+ " depends_.append(temp_depend[no])\n",
+ " t = tokenizer.tokenize(orig_token)\n",
+ " bert_tokens.extend(t)\n",
+ " labels_.extend([1] * (len(t) - 1))\n",
+ " depends_.extend([0] * (len(t) - 1))\n",
+ " seq_.append(no + 1)\n",
+ " bert_tokens.append('[SEP]')\n",
+ " labels_.append(0)\n",
+ " depends_.append(0)\n",
+ " words.append(tokenizer.convert_tokens_to_ids(bert_tokens))\n",
+ " depends.append(depends_)\n",
+ " labels.append(labels_)\n",
+ " sentences.append(temp_sentence)\n",
+ " pos.append(temp_pos)\n",
+ " sequences.append(seq_)\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " except Exception as e:\n",
+ " print(e, sentence)\n",
+ " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], sequences[:-1]"
+ ]
+ },
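+ The core of `process_corpus` above is the WordPiece alignment step. A minimal standalone sketch of that step, using a toy tokenizer (`align_subtokens` and `fake` are hypothetical names for illustration, not from the notebook): only the first subpiece of each original word keeps its gold label and head index; continuation pieces get label 1 and head 0, mirroring the loop over `temp_word`.
+
+ ```python
+ # Sketch of the subtoken alignment used in process_corpus: only the FIRST
+ # WordPiece of each original token keeps its dependency head and label;
+ # continuation pieces get label 1 and head 0.
+ def align_subtokens(words, labels, depends, tokenize):
+     bert_tokens, labels_, depends_ = ['[CLS]'], [0], [0]
+     for no, w in enumerate(words):
+         pieces = tokenize(w)
+         labels_.append(labels[no])
+         depends_.append(depends[no])
+         bert_tokens.extend(pieces)
+         labels_.extend([1] * (len(pieces) - 1))
+         depends_.extend([0] * (len(pieces) - 1))
+     bert_tokens.append('[SEP]')
+     labels_.append(0)
+     depends_.append(0)
+     return bert_tokens, labels_, depends_
+
+ # toy tokenizer splitting "playing" -> ["play", "##ing"]
+ fake = {'playing': ['play', '##ing']}
+ tok = lambda w: fake.get(w, [w])
+ t, l, d = align_subtokens(['He', 'playing'], [5, 7], [2, 0], tok)
+ # t -> ['[CLS]', 'He', 'play', '##ing', '[SEP]']
+ ```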
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-dev.conllu') as fopen:\n",
+ " dev = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_dev, words_dev, depends_dev, labels_dev, _, seq_dev = process_corpus(dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-test.conllu') as fopen:\n",
+ " test = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_test, words_test, depends_test, labels_test, _, seq_test = process_corpus(test)\n",
+ "sentences_test.extend(sentences_dev)\n",
+ "words_test.extend(words_dev)\n",
+ "depends_test.extend(depends_dev)\n",
+ "labels_test.extend(labels_dev)\n",
+ "seq_test.extend(seq_dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n",
+ "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n",
+ "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n",
+ "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n",
+ "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n",
+ "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-train.conllu') as fopen:\n",
+ " train = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(12000, 3824)"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(sentences_train), len(sentences_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "idx2tag = {v:k for k, v in tag2idx.items()}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_X = words_train\n",
+ "train_Y = labels_train\n",
+ "train_depends = depends_train\n",
+ "\n",
+ "test_X = words_test\n",
+ "test_Y = labels_test\n",
+ "test_depends = depends_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "epoch = 15\n",
+ "batch_size = 32\n",
+ "warmup_proportion = 0.1\n",
+ "num_train_steps = int(len(train_X) / batch_size * epoch)\n",
+ "num_warmup_steps = int(num_train_steps * warmup_proportion)\n",
+ "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)"
+ ]
+ },
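+ The warmup arithmetic in the cell above is worth spelling out. A hedged sketch, plugging in the 12000 training sentences reported later in the notebook (`n_train` stands in for `len(train_X)`): 10% of all optimizer steps are spent linearly warming up the learning rate, as BERT fine-tuning conventionally does.
+
+ ```python
+ # Step arithmetic from the config cell: 12000 sentences / batch 32
+ # gives 375 batches per epoch, times 15 epochs.
+ epoch = 15
+ batch_size = 32
+ warmup_proportion = 0.1
+ n_train = 12000  # stands in for len(train_X)
+ num_train_steps = int(n_train / batch_size * epoch)
+ num_warmup_steps = int(num_train_steps * warmup_proportion)
+ # num_train_steps -> 5625, num_warmup_steps -> 562
+ ```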
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class BiAAttention:\n",
+ " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n",
+ " self.input_size_encoder = input_size_encoder\n",
+ " self.input_size_decoder = input_size_decoder\n",
+ " self.num_labels = num_labels\n",
+ " \n",
+ " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n",
+ " batch = tf.shape(input_d)[0]\n",
+ " length_decoder = tf.shape(input_d)[1]\n",
+ " length_encoder = tf.shape(input_e)[1]\n",
+ " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n",
+ " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n",
+ " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n",
+ " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n",
+ " \n",
+ " output = output + out_d + out_e\n",
+ " \n",
+ " if mask_d is not None:\n",
+ " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n",
+ " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n",
+ " output = output * d * e\n",
+ " \n",
+ " return output\n",
+ " \n",
+ "class BiLinear:\n",
+ " def __init__(self, left_features, right_features, out_features):\n",
+ " self.left_features = left_features\n",
+ " self.right_features = right_features\n",
+ " self.out_features = out_features\n",
+ " \n",
+ " self.U = tf.get_variable(\"U-bi\", shape=[out_features, left_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_l = tf.get_variable(\"Wl\", shape=[out_features, left_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_r = tf.get_variable(\"Wr\", shape=[out_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_left, input_right):\n",
+ " left_size = tf.shape(input_left)\n",
+ " output_shape = tf.concat([left_size[:-1], [self.out_features]], axis = 0)\n",
+ " batch = tf.cast(tf.reduce_prod(left_size[:-1]), tf.int32)\n",
+ " input_left = tf.reshape(input_left, (batch, self.left_features))\n",
+ " input_right = tf.reshape(input_right, (batch, self.right_features))\n",
+ " tiled = tf.tile(tf.expand_dims(input_left, axis = 0), (self.out_features,1,1))\n",
+ " output = tf.transpose(tf.reduce_sum(tf.matmul(tiled, self.U), axis = 2))\n",
+ " output = output + tf.matmul(input_left, tf.transpose(self.W_l))\\\n",
+ " + tf.matmul(input_right, tf.transpose(self.W_r))\n",
+ " \n",
+ " return tf.reshape(output, output_shape)\n",
+ "\n",
+ "class Attention:\n",
+ " def __init__(self, word_dim, num_words, char_dim, num_chars, num_filters, kernel_size,\n",
+ " hidden_size, encoder_layers, num_labels, arc_space, type_space):\n",
+ " \n",
+ " def cells(size, reuse=False):\n",
+ " return tf.nn.rnn_cell.LSTMCell(size,\n",
+ " initializer=tf.orthogonal_initializer(),reuse=reuse)\n",
+ " \n",
+ " self.word_embedd = tf.Variable(tf.random_uniform([num_words, word_dim], -1, 1))\n",
+ " self.char_embedd = tf.Variable(tf.random_uniform([num_chars, char_dim], -1, 1))\n",
+ " self.conv1d = tf.layers.Conv1D(num_filters, kernel_size, 1, padding='VALID')\n",
+ " self.num_labels = num_labels\n",
+ "        self.encoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(encoder_layers)])\n",
+ "        # projection / attention layers referenced by encode(), forward() and loss()\n",
+ "        self.arc_h = tf.layers.Dense(arc_space)\n",
+ "        self.arc_c = tf.layers.Dense(arc_space)\n",
+ "        self.type_h = tf.layers.Dense(type_space)\n",
+ "        self.type_c = tf.layers.Dense(type_space)\n",
+ "        self.attention = BiAAttention(arc_space, arc_space, 1)\n",
+ "        self.bilinear = BiLinear(type_space, type_space, num_labels)\n",
+ "\n",
+ " def encode(self, input_word, input_char):\n",
+ " word = tf.nn.embedding_lookup(self.word_embedd, input_word)\n",
+ " char = tf.nn.embedding_lookup(self.char_embedd, input_char)\n",
+ " b = tf.shape(char)[0]\n",
+ " wl = tf.shape(char)[1]\n",
+ " cl = tf.shape(char)[2]\n",
+ " d = char.shape[3]\n",
+ " char = tf.reshape(char, [b * wl, cl, d])\n",
+ " char = tf.reduce_max(self.conv1d(char), axis = 1)\n",
+ " char = tf.nn.tanh(char)\n",
+ " d = char.shape[-1]\n",
+ " char = tf.reshape(char, [b, wl, d])\n",
+ " \n",
+ " src_encoding = tf.concat([word, char], axis=2)\n",
+ " output, hn = tf.nn.dynamic_rnn(self.encoder, src_encoding, dtype = tf.float32,\n",
+ " scope = 'encoder')\n",
+ " arc_h = tf.nn.elu(self.arc_h(output))\n",
+ " arc_c = tf.nn.elu(self.arc_c(output))\n",
+ " \n",
+ " type_h = tf.nn.elu(self.type_h(output))\n",
+ " type_c = tf.nn.elu(self.type_c(output))\n",
+ " \n",
+ " return (arc_h, arc_c), (type_h, type_c), hn\n",
+ " \n",
+ " def forward(self, input_word, input_char, mask):\n",
+ " arcs, types, _ = self.encode(input_word, input_char)\n",
+ " \n",
+ " out_arc = tf.squeeze(self.attention.forward(arcs[0], arcs[1], mask_d=mask, mask_e=mask), axis = 1)\n",
+ " return out_arc, types, mask\n",
+ " \n",
+ " def loss(self, input_word, input_char, mask, heads, types):\n",
+ " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n",
+ " type_h, type_c = out_type\n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n",
+ " loss_type = tf.nn.log_softmax(out_type, dim=2)\n",
+ " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n",
+ " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n",
+ " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])\n",
+ " \n",
+ " t = tf.transpose(types)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_type = tf.gather_nd(loss_type, concatenated)\n",
+ " loss_type = tf.transpose(loss_type, [1, 0])\n",
+ " return tf.reduce_sum(-loss_arc) / num, tf.reduce_sum(-loss_type) / num\n",
+ " \n",
+ " def decode(self, input_word, input_char, mask, leading_symbolic=0):\n",
+ " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n",
+ " heads = tf.argmax(out_arc, axis = 1)\n",
+ " type_h, type_c = out_type\n",
+ " batch = tf.shape(type_h)[0]\n",
+ " max_len = tf.shape(type_h)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.cast(tf.transpose(heads), tf.int32)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " out_type = out_type[:, :, leading_symbolic:]\n",
+ " types = tf.argmax(out_type, axis = 2)\n",
+ " return heads, types\n",
+ " \n",
+ "class Model:\n",
+ " def __init__(\n",
+ " self,\n",
+ " learning_rate,\n",
+ "        hidden_size_word):\n",
+ "        \n",
+ " self.words = tf.placeholder(tf.int32, (None, None))\n",
+ " self.heads = tf.placeholder(tf.int32, (None, None))\n",
+ " self.types = tf.placeholder(tf.int32, (None, None))\n",
+ " self.mask = tf.cast(tf.math.not_equal(self.words, 0), tf.float32)\n",
+ " self.maxlen = tf.shape(self.words)[1]\n",
+ " self.lengths = tf.count_nonzero(self.words, 1)\n",
+ " mask = self.mask\n",
+ " heads = self.heads\n",
+ " types = self.types\n",
+ " \n",
+ " self.arc_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.arc_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n",
+ "\n",
+ " self.type_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.type_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.bilinear = BiLinear(hidden_size_word, hidden_size_word, len(tag2idx))\n",
+ " \n",
+ " model = modeling.BertModel(\n",
+ " config=bert_config,\n",
+ " is_training=True,\n",
+ " input_ids=self.words,\n",
+ " use_one_hot_embeddings=False)\n",
+ " output_layer = model.get_sequence_output()\n",
+ " \n",
+ " arc_h = tf.nn.elu(self.arc_h(output_layer))\n",
+ " arc_c = tf.nn.elu(self.arc_c(output_layer))\n",
+ " \n",
+ " type_h = tf.nn.elu(self.type_h(output_layer))\n",
+ " type_c = tf.nn.elu(self.type_c(output_layer))\n",
+ " \n",
+ " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=self.mask, \n",
+ " mask_e=self.mask), axis = 1)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " \n",
+ " decode_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " decode_arc = tf.where(minus_mask, tf.fill(tf.shape(decode_arc), -np.inf), decode_arc)\n",
+ " self.heads_seq = tf.argmax(decode_arc, axis = 1)\n",
+ " \n",
+ "        t = tf.cast(tf.transpose(self.heads_seq), tf.int32)\n",
+ "        broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ "        concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ "                           tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ "        # keep the decode-path gather separate so the loss below gathers from\n",
+ "        # the original type_h, not from an already-gathered tensor\n",
+ "        type_h_decode = tf.gather_nd(type_h, concatenated)\n",
+ "        out_type_decode = self.bilinear.forward(type_h_decode, type_c)\n",
+ "        self.tags_seq = tf.argmax(out_type_decode, axis = 2)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n",
+ " loss_type = tf.nn.log_softmax(out_type, dim=2)\n",
+ " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n",
+ " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n",
+ " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])\n",
+ " \n",
+ " t = tf.transpose(types)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_type = tf.gather_nd(loss_type, concatenated)\n",
+ " loss_type = tf.transpose(loss_type, [1, 0])\n",
+ " self.cost = (tf.reduce_sum(-loss_arc) / num) + (tf.reduce_sum(-loss_type) / num)\n",
+ " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n",
+ " num_train_steps, num_warmup_steps, False)\n",
+ " \n",
+ " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
+ " \n",
+ " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
+ " mask_label = tf.boolean_mask(self.types, mask)\n",
+ " correct_pred = tf.equal(tf.cast(self.prediction, tf.int32), mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
+ " \n",
+ " self.prediction = tf.cast(tf.boolean_mask(self.heads_seq, mask), tf.int32)\n",
+ " mask_label = tf.boolean_mask(self.heads, mask)\n",
+ " correct_pred = tf.equal(self.prediction, mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
+ ]
+ },
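+ The accuracy nodes at the end of the `Model` cell above boolean-mask both predictions and labels with `tf.sequence_mask` before comparing, so padding never counts. A minimal pure-Python sketch of that idea (`masked_accuracy` is a hypothetical helper, not from the notebook):
+
+ ```python
+ # Compare predictions to gold labels only at positions inside each
+ # sequence's true length, mirroring the tf.boolean_mask accuracy above.
+ def masked_accuracy(pred, gold, lengths):
+     correct = total = 0
+     for p, g, n in zip(pred, gold, lengths):
+         for i in range(n):
+             correct += int(p[i] == g[i])
+             total += 1
+     return correct / total
+
+ acc = masked_accuracy([[1, 2, 9]], [[1, 3, 9]], [2])
+ # only the first 2 positions count: one match out of two -> 0.5
+ ```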
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1735: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n",
+ " warnings.warn('An interactive session is already active. This can '\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use keras.layers.dense instead.\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "WARNING:tensorflow:From :225: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use tf.where in 2.0, which has the same broadcast rule as np.where\n",
+ "WARNING:tensorflow:From :248: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "dim is deprecated, use axis instead\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Deprecated in favor of operator or tf.math.divide.\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "tf.reset_default_graph()\n",
+ "sess = tf.InteractiveSession()\n",
+ "\n",
+ "learning_rate = 2e-5\n",
+ "hidden_size_word = 128\n",
+ "\n",
+ "model = Model(learning_rate, hidden_size_word)\n",
+ "sess.run(tf.global_variables_initializer())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use standard file APIs to check for files with this prefix.\n",
+ "INFO:tensorflow:Restoring parameters from cased_L-12_H-768_A-12/bert_model.ckpt\n"
+ ]
+ }
+ ],
+ "source": [
+ "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n",
+ "saver = tf.train.Saver(var_list = var_lists)\n",
+ "saver.restore(sess, BERT_INIT_CHKPNT)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
+ "\n",
+ "batch_x = train_X[:5]\n",
+ "batch_x = pad_sequences(batch_x,padding='post')\n",
+ "batch_y = train_Y[:5]\n",
+ "batch_y = pad_sequences(batch_y,padding='post')\n",
+ "batch_depends = train_depends[:5]\n",
+ "batch_depends = pad_sequences(batch_depends,padding='post')"
+ ]
+ },
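+ The cell above right-pads every batch to its longest sequence with `pad_sequences(..., padding='post')`, so the `words != 0` mask in the model lines up with real tokens. A minimal pure-Python equivalent, for reference (`pad_post` is a hypothetical name):
+
+ ```python
+ # Right-pad each sequence with 0 to the batch maximum, like
+ # keras pad_sequences(..., padding='post') does above.
+ def pad_post(seqs, value=0):
+     maxlen = max(len(s) for s in seqs)
+     return [list(s) + [value] * (maxlen - len(s)) for s in seqs]
+
+ padded = pad_post([[1, 2, 3], [4]])
+ # padded -> [[1, 2, 3], [4, 0, 0]]
+ ```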
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[0.0070422534, 0.028169014, 12.410244]"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n",
+ " feed_dict = {model.words: batch_x,\n",
+ " model.types: batch_y,\n",
+ " model.heads: batch_depends})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([ 8, 8, 48, 34, 36, 36, 27, 30, 19, 8, 34, 29, 29, 41, 28, 41, 19,\n",
+ " 20, 20, 41, 47, 20, 23, 47, 28, 19, 27, 41, 18, 48, 36, 41, 27, 34,\n",
+ " 36, 4, 28, 8, 8, 8, 4, 8, 8, 4]),\n",
+ " array([20, 10, 16, 2, 9, 10, 0, 21, 1, 0, 2, 2, 2, 10, 10, 10, 17,\n",
+ " 36, 36, 10, 2, 36, 10, 2, 10, 0, 0, 10, 36, 16, 10, 10, 0, 2,\n",
+ " 10, 1, 10, 0, 0, 0, 0, 0, 0, 0]),\n",
+ " array([ 0, 1, 2, 2, 0, 2, 7, 8, 2, 8, 0, 0, 9, 9, 9, 9, 0,\n",
+ " 9, 16, 9, 19, 19, 8, 22, 22, 19, 24, 22, 0, 0, 22, 29, 29, 29,\n",
+ " 22, 2, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32))"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads_seq],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " },\n",
+ ")\n",
+ "tags_seq[0], heads[0], batch_depends[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:10<00:00, 5.31it/s, accuracy=0.754, accuracy_depends=0.482, cost=2.55] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.51it/s, accuracy=0.808, accuracy_depends=0.549, cost=2] \n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.64it/s, accuracy=0.746, accuracy_depends=0.383, cost=2.83]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 0, training loss: 4.682894, training acc: 0.433306, training depends: 0.308738, valid loss: 2.175135, valid acc: 0.757226, valid depends: 0.515791\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.884, accuracy_depends=0.641, cost=1.37] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.69it/s, accuracy=0.886, accuracy_depends=0.724, cost=0.95] \n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.72it/s, accuracy=0.848, accuracy_depends=0.53, cost=1.85]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 1, training loss: 1.797183, training acc: 0.815364, training depends: 0.561427, valid loss: 1.349600, valid acc: 0.857193, valid depends: 0.636366\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.889, accuracy_depends=0.695, cost=1.04] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.36it/s, accuracy=0.919, accuracy_depends=0.76, cost=0.708] \n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.68it/s, accuracy=0.877, accuracy_depends=0.61, cost=1.42]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 2, training loss: 1.193647, training acc: 0.869602, training depends: 0.653151, valid loss: 1.071987, valid acc: 0.879075, valid depends: 0.677740\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.912, accuracy_depends=0.691, cost=0.926]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.55it/s, accuracy=0.919, accuracy_depends=0.779, cost=0.63] \n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.68it/s, accuracy=0.893, accuracy_depends=0.627, cost=1.16]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 3, training loss: 0.931854, training acc: 0.892288, training depends: 0.696346, valid loss: 1.005326, valid acc: 0.883707, valid depends: 0.692016\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.914, accuracy_depends=0.739, cost=0.762]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.45it/s, accuracy=0.912, accuracy_depends=0.799, cost=0.51] \n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.66it/s, accuracy=0.889, accuracy_depends=0.654, cost=1.01]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 4, training loss: 0.777697, training acc: 0.901257, training depends: 0.721131, valid loss: 0.964560, valid acc: 0.877505, valid depends: 0.701398\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.913, accuracy_depends=0.755, cost=0.638]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.68it/s, accuracy=0.912, accuracy_depends=0.812, cost=0.492]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:04, 5.79it/s, accuracy=0.893, accuracy_depends=0.659, cost=0.932]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 5, training loss: 0.668966, training acc: 0.901568, training depends: 0.741328, valid loss: 0.928792, valid acc: 0.891125, valid depends: 0.710297\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.913, accuracy_depends=0.751, cost=0.616]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.45it/s, accuracy=0.919, accuracy_depends=0.825, cost=0.394]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.67it/s, accuracy=0.896, accuracy_depends=0.707, cost=0.809]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 6, training loss: 0.594000, training acc: 0.913134, training depends: 0.754423, valid loss: 0.943845, valid acc: 0.888382, valid depends: 0.713479\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.927, accuracy_depends=0.776, cost=0.537] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.37it/s, accuracy=0.935, accuracy_depends=0.808, cost=0.534]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.61it/s, accuracy=0.909, accuracy_depends=0.709, cost=0.77]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 7, training loss: 0.538314, training acc: 0.920553, training depends: 0.764744, valid loss: 0.930650, valid acc: 0.903622, valid depends: 0.718959\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.935, accuracy_depends=0.781, cost=0.505]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.38it/s, accuracy=0.938, accuracy_depends=0.821, cost=0.457]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:07, 5.53it/s, accuracy=0.915, accuracy_depends=0.711, cost=0.767]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 8, training loss: 0.486278, training acc: 0.927081, training depends: 0.774812, valid loss: 0.932128, valid acc: 0.904604, valid depends: 0.722158\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.925, accuracy_depends=0.787, cost=0.485] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.50it/s, accuracy=0.958, accuracy_depends=0.825, cost=0.524]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.61it/s, accuracy=0.924, accuracy_depends=0.735, cost=0.633]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 9, training loss: 0.447538, training acc: 0.931575, training depends: 0.781835, valid loss: 0.943356, valid acc: 0.905484, valid depends: 0.722892\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.942, accuracy_depends=0.806, cost=0.424] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.55it/s, accuracy=0.935, accuracy_depends=0.815, cost=0.496]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.67it/s, accuracy=0.896, accuracy_depends=0.748, cost=0.611]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 10, training loss: 0.413205, training acc: 0.932623, training depends: 0.789132, valid loss: 0.954858, valid acc: 0.903540, valid depends: 0.724419\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.943, accuracy_depends=0.788, cost=0.442] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.28it/s, accuracy=0.945, accuracy_depends=0.795, cost=0.602]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.69it/s, accuracy=0.92, accuracy_depends=0.761, cost=0.558]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 11, training loss: 0.389162, training acc: 0.934991, training depends: 0.793624, valid loss: 0.962155, valid acc: 0.910515, valid depends: 0.726305\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.65it/s, accuracy=0.943, accuracy_depends=0.806, cost=0.433] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.37it/s, accuracy=0.942, accuracy_depends=0.828, cost=0.454]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.67it/s, accuracy=0.919, accuracy_depends=0.759, cost=0.538]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 12, training loss: 0.368160, training acc: 0.940245, training depends: 0.797881, valid loss: 0.978189, valid acc: 0.906123, valid depends: 0.726453\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.942, accuracy_depends=0.807, cost=0.404] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.48it/s, accuracy=0.951, accuracy_depends=0.844, cost=0.43] \n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.63it/s, accuracy=0.934, accuracy_depends=0.759, cost=0.563]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 13, training loss: 0.356211, training acc: 0.941396, training depends: 0.800658, valid loss: 0.964498, valid acc: 0.910670, valid depends: 0.727217\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.943, accuracy_depends=0.814, cost=0.378] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.28it/s, accuracy=0.945, accuracy_depends=0.805, cost=0.468]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 14, training loss: 0.346428, training acc: 0.943538, training depends: 0.802292, valid loss: 0.971327, valid acc: 0.908659, valid depends: 0.727845\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "batch_size = 32\n",
+ "epoch = 15\n",
+ "\n",
+ "for e in range(epoch):\n",
+ " train_acc, train_loss = [], []\n",
+ " test_acc, test_loss = [], []\n",
+ " train_acc_depends, test_acc_depends = [], []\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(train_X))\n",
+ " batch_x = train_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_y = train_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = train_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost, _ = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " model.types: batch_y,\n",
+ " model.heads: batch_depends\n",
+ " },\n",
+ " )\n",
+ " train_loss.append(cost)\n",
+ " train_acc.append(acc)\n",
+ " train_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " acc_depends, acc, cost = sess.run(\n",
+ " [model.accuracy_depends, model.accuracy, model.cost],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " model.types: batch_y,\n",
+ " model.heads: batch_depends\n",
+ " },\n",
+ " )\n",
+ " test_loss.append(cost)\n",
+ " test_acc.append(acc)\n",
+ " test_acc_depends.append(acc_depends)\n",
+ " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n",
+ " \n",
+ " \n",
+ " print(\n",
+ " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n",
+ " % (e, np.mean(train_loss), \n",
+ " np.mean(train_acc), \n",
+ " np.mean(train_acc_depends), \n",
+ " np.mean(test_loss), \n",
+ " np.mean(test_acc), \n",
+ " np.mean(test_acc_depends)\n",
+ " ))\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([ 0, 40, 6, 22, 26, 23, 18, 16, 1, 1, 5, 3, 13, 10, 11, 6, 12,\n",
+ " 13, 10, 16, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 0, 0, 0, 0]),\n",
+ " array([ 3, 2, 8, 5, 5, 2, 8, 8, -1, -1, 0, 11, 10, 8, 14, 13, 8,\n",
+ " 15, 14, 14, 8, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n",
+ " -1, -1, -1, -1, -1, -1, -1, -1]),\n",
+ " array([-1, 2, 8, 5, 5, 2, 8, 8, -1, -1, 0, 11, 11, 8, 14, 14, 8,\n",
+ " 16, 14, 14, 8, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n",
+ " -1, -1, -1, -1, -1, -1, -1, -1], dtype=int32))"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads_seq],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " },\n",
+ ")\n",
+ "tags_seq[0], heads[0] - 1, batch_depends[0] - 1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "def evaluate(heads_pred, types_pred, heads, types, lengths,\n",
+    "             symbolic_root=False, symbolic_end=False):\n",
+    "    # batch-level unlabeled (UAS), labeled (LAS) and root attachment scores;\n",
+    "    # the complete-match counters are accumulated but not returned\n",
+    "    batch_size, _ = heads_pred.shape\n",
+    "    ucorr = 0.\n",
+    "    lcorr = 0.\n",
+    "    total = 0.\n",
+    "    ucomplete_match = 0.\n",
+    "    lcomplete_match = 0.\n",
+    "\n",
+    "    corr_root = 0.\n",
+    "    total_root = 0.\n",
+    "    start = 1 if symbolic_root else 0\n",
+    "    end = 1 if symbolic_end else 0\n",
+    "    for i in range(batch_size):\n",
+    "        ucm = 1.\n",
+    "        lcm = 1.\n",
+    "        for j in range(start, lengths[i] - end):\n",
+    "\n",
+    "            total += 1\n",
+    "            if heads[i, j] == heads_pred[i, j]:\n",
+    "                # correct head counts for UAS; for LAS the label must match too\n",
+    "                ucorr += 1\n",
+    "                if types[i, j] == types_pred[i, j]:\n",
+    "                    lcorr += 1\n",
+    "                else:\n",
+    "                    lcm = 0\n",
+    "            else:\n",
+    "                ucm = 0\n",
+    "                lcm = 0\n",
+    "\n",
+    "            if heads[i, j] == 0:\n",
+    "                # token attached to the symbolic root\n",
+    "                total_root += 1\n",
+    "                corr_root += 1 if heads_pred[i, j] == 0 else 0\n",
+    "\n",
+    "        ucomplete_match += ucm\n",
+    "        lcomplete_match += lcm\n",
+    "    \n",
+    "    return ucorr / total, lcorr / total, corr_root / total_root"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "arcs, types, roots = [], [], []\n",
+ "\n",
+ "for i in range(0, len(test_X), batch_size):\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " tags_seq, heads = sess.run(\n",
+ " [model.tags_seq, model.heads_seq],\n",
+ " feed_dict = {\n",
+ " model.words: batch_x,\n",
+ " },\n",
+ " )\n",
+ " \n",
+ " arc_accuracy, type_accuracy, root_accuracy = evaluate(heads - 1, tags_seq, batch_depends - 1, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ " arcs.append(arc_accuracy)\n",
+ " types.append(type_accuracy)\n",
+ " roots.append(root_accuracy)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "arc accuracy: 0.728543873570515\n",
+ "types accuracy: 0.6711201611430444\n",
+ "root accuracy: 0.7393229166666667\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('arc accuracy:', np.mean(arcs))\n",
+ "print('types accuracy:', np.mean(types))\n",
+ "print('root accuracy:', np.mean(roots))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/dependency-parser/7.stackpointer.ipynb b/dependency-parser/7.stackpointer.ipynb
new file mode 100644
index 0000000..4238bf9
--- /dev/null
+++ b/dependency-parser/7.stackpointer.ipynb
@@ -0,0 +1,1781 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "os.environ['CUDA_VISIBLE_DEVICES'] = '1'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ }
+ ],
+ "source": [
+ "import malaya\n",
+ "import re\n",
+ "from malaya.texts._text_functions import split_into_sentences\n",
+ "from malaya.texts import _regex\n",
+ "import numpy as np\n",
+ "import itertools\n",
+ "\n",
+ "tokenizer = malaya.preprocessing._tokenizer\n",
+ "splitter = split_into_sentences"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def is_number_regex(s):\n",
+ " if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n",
+ " return s.isdigit()\n",
+ " return True\n",
+ "\n",
+ "def preprocessing(w):\n",
+ " if is_number_regex(w):\n",
+ " return ''\n",
+ " elif re.match(_regex._money, w):\n",
+ " return ''\n",
+ " elif re.match(_regex._date, w):\n",
+ " return ''\n",
+ " elif re.match(_regex._expressions['email'], w):\n",
+ " return ''\n",
+ " elif re.match(_regex._expressions['url'], w):\n",
+ " return ''\n",
+ " else:\n",
+ " w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n",
+ " return w"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "({'PAD': 0,\n",
+ " 'UNK': 1,\n",
+ " '_ROOT': 2,\n",
+ " '': 3,\n",
+ " '': 4,\n",
+ " '': 5,\n",
+ " '': 6,\n",
+ " '': 7},\n",
+ " {'PAD': 0,\n",
+ " 'UNK': 1,\n",
+ " '_ROOT': 2,\n",
+ " '': 3,\n",
+ " '': 4,\n",
+ " '': 5,\n",
+ " '': 6,\n",
+ " '': 7})"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n",
+ "tag2idx = {'PAD': 0, '_': 1}\n",
+ "char2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n",
+ "word_idx = 3\n",
+ "tag_idx = 2\n",
+ "char_idx = 3\n",
+ "\n",
+ "special_tokens = ['', '', '', '', '']\n",
+ "\n",
+ "for t in special_tokens:\n",
+ " word2idx[t] = word_idx\n",
+ " word_idx += 1\n",
+ " char2idx[t] = char_idx\n",
+ " char_idx += 1\n",
+ " \n",
+ "word2idx, char2idx"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "PAD = \"_PAD\"\n",
+ "PAD_POS = \"_PAD_POS\"\n",
+ "PAD_TYPE = \"_\"\n",
+ "PAD_CHAR = \"_PAD_CHAR\"\n",
+ "ROOT = \"_ROOT\"\n",
+ "ROOT_POS = \"_ROOT_POS\"\n",
+ "ROOT_TYPE = \"_\"\n",
+ "ROOT_CHAR = \"_ROOT_CHAR\"\n",
+ "END = \"_END\"\n",
+ "END_POS = \"_END_POS\"\n",
+ "END_TYPE = \"_\"\n",
+ "END_CHAR = \"_END_CHAR\"\n",
+ "\n",
+ "def process_corpus(corpus, until = None):\n",
+ " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
+ " sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n",
+ " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n",
+ " first_time = True\n",
+ " for sentence in corpus:\n",
+ " try:\n",
+ " if len(sentence):\n",
+ " if sentence[0] == '#':\n",
+ " continue\n",
+ " if first_time:\n",
+ " print(sentence)\n",
+ " first_time = False\n",
+ " sentence = sentence.split('\\t')\n",
+ " for c in sentence[1]:\n",
+ " if c not in char2idx:\n",
+ " char2idx[c] = char_idx\n",
+ " char_idx += 1\n",
+ " if sentence[7] not in tag2idx:\n",
+ " tag2idx[sentence[7]] = tag_idx\n",
+ " tag_idx += 1\n",
+ " sentence[1] = preprocessing(sentence[1])\n",
+ " if sentence[1] not in word2idx:\n",
+ " word2idx[sentence[1]] = word_idx\n",
+ " word_idx += 1\n",
+ " temp_word.append(word2idx[sentence[1]])\n",
+ " temp_depend.append(int(sentence[6]))\n",
+ " temp_label.append(tag2idx[sentence[7]])\n",
+ " temp_sentence.append(sentence[1])\n",
+ " temp_pos.append(sentence[3])\n",
+ " else:\n",
+ " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " continue\n",
+ " words.append([word2idx['_ROOT']] + temp_word)\n",
+ " depends.append([0] + temp_depend)\n",
+ " labels.append([tag2idx['_']] + temp_label)\n",
+ " sentences.append([ROOT] + temp_sentence)\n",
+ " pos.append([ROOT_POS] + temp_pos)\n",
+ " char_ = [[char2idx['_ROOT']]]\n",
+ " for w in temp_sentence:\n",
+ " if w in char2idx:\n",
+ " char_.append([char2idx[w]])\n",
+ " else:\n",
+ " char_.append([char2idx[c] for c in w])\n",
+ " chars.append(char_)\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " except Exception as e:\n",
+ " print(e, sentence)\n",
+ " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def _obtain_child_index_for_left2right(heads):\n",
+ " child_ids = [[] for _ in range(len(heads))]\n",
+ " # skip the symbolic root.\n",
+ " for child in range(1, len(heads)):\n",
+ " head = heads[child]\n",
+ " child_ids[head].append(child)\n",
+ " return child_ids\n",
+ "\n",
+ "\n",
+ "def _obtain_child_index_for_inside_out(heads):\n",
+ " child_ids = [[] for _ in range(len(heads))]\n",
+ " for head in range(len(heads)):\n",
+ " # first find left children inside-out\n",
+ " for child in reversed(range(1, head)):\n",
+ " if heads[child] == head:\n",
+ " child_ids[head].append(child)\n",
+ " # second find right children inside-out\n",
+ " for child in range(head + 1, len(heads)):\n",
+ " if heads[child] == head:\n",
+ " child_ids[head].append(child)\n",
+ " return child_ids\n",
+ "\n",
+ "\n",
+ "def _obtain_child_index_for_depth(heads, reverse):\n",
+ " def calc_depth(head):\n",
+ " children = child_ids[head]\n",
+ " max_depth = 0\n",
+ " for child in children:\n",
+ " depth = calc_depth(child)\n",
+ " child_with_depth[head].append((child, depth))\n",
+ " max_depth = max(max_depth, depth + 1)\n",
+ " child_with_depth[head] = sorted(child_with_depth[head], key=lambda x: x[1], reverse=reverse)\n",
+ " return max_depth\n",
+ "\n",
+ " child_ids = _obtain_child_index_for_left2right(heads)\n",
+ " child_with_depth = [[] for _ in range(len(heads))]\n",
+ " calc_depth(0)\n",
+ " return [[child for child, depth in child_with_depth[head]] for head in range(len(heads))]\n",
+ "\n",
+ "\n",
+    "def _generate_stack_inputs(heads, types, prior_order):\n",
+    "    # linearize the gold tree into stack-pointer transitions: the head on\n",
+    "    # top of the stack pops when it has no children left, otherwise pushes\n",
+    "    # its next child in the chosen prior_order\n",
+ " if prior_order == 'deep_first':\n",
+ " child_ids = _obtain_child_index_for_depth(heads, True)\n",
+ " elif prior_order == 'shallow_first':\n",
+ " child_ids = _obtain_child_index_for_depth(heads, False)\n",
+ " elif prior_order == 'left2right':\n",
+ " child_ids = _obtain_child_index_for_left2right(heads)\n",
+ " elif prior_order == 'inside_out':\n",
+ " child_ids = _obtain_child_index_for_inside_out(heads)\n",
+ " else:\n",
+ " raise ValueError('Unknown prior order: %s' % prior_order)\n",
+ "\n",
+ " stacked_heads = []\n",
+ " children = []\n",
+ " siblings = []\n",
+ " stacked_types = []\n",
+ " skip_connect = []\n",
+ " prev = [0 for _ in range(len(heads))]\n",
+ " sibs = [0 for _ in range(len(heads))]\n",
+ " stack = [0]\n",
+ " position = 1\n",
+ " while len(stack) > 0:\n",
+ " head = stack[-1]\n",
+ " stacked_heads.append(head)\n",
+ " siblings.append(sibs[head])\n",
+ " child_id = child_ids[head]\n",
+ " skip_connect.append(prev[head])\n",
+ " prev[head] = position\n",
+ " if len(child_id) == 0:\n",
+ " children.append(head)\n",
+ " sibs[head] = 0\n",
+ " stacked_types.append(tag2idx['PAD'])\n",
+ " stack.pop()\n",
+ " else:\n",
+ " child = child_id.pop(0)\n",
+ " children.append(child)\n",
+ " sibs[head] = child\n",
+ " stack.append(child)\n",
+ " stacked_types.append(types[child])\n",
+ " position += 1\n",
+ "\n",
+ " return stacked_heads, children, siblings, stacked_types, skip_connect"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-dev.conllu') as fopen:\n",
+ " dev = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_dev, words_dev, depends_dev, labels_dev, _, seq_dev = process_corpus(dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "stacked_heads_test, children_test, siblings_test, stacked_types_test = [], [], [], []\n",
+ "for i in range(len(sentences_dev)):\n",
+ " stacked_heads, children, siblings, stacked_types, _ = _generate_stack_inputs(depends_dev[i], \n",
+ " labels_dev[i], 'deep_first')\n",
+ " stacked_heads_test.append(stacked_heads)\n",
+ " children_test.append(children)\n",
+ " siblings_test.append(siblings)\n",
+ " stacked_types_test.append(stacked_types)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-test.conllu') as fopen:\n",
+ " test = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_test, words_test, depends_test, labels_test, _, seq_test = process_corpus(test)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for i in range(len(sentences_test)):\n",
+ " stacked_heads, children, siblings, stacked_types, _ = _generate_stack_inputs(depends_test[i], \n",
+ " labels_test[i], 'deep_first')\n",
+ " stacked_heads_test.append(stacked_heads)\n",
+ " children_test.append(children)\n",
+ " siblings_test.append(siblings)\n",
+ " stacked_types_test.append(stacked_types)\n",
+    "    \n",
+    "# the stacked_* lists already hold the dev split (previous cell) followed by\n",
+    "# the test split (loop above), so concatenate dev in front here to keep\n",
+    "# every list in the same dev + test order\n",
+    "sentences_test = sentences_dev + sentences_test\n",
+    "words_test = words_dev + words_test\n",
+    "depends_test = depends_dev + depends_test\n",
+    "labels_test = labels_dev + labels_test\n",
+    "seq_test = seq_dev + seq_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n",
+ "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n",
+ "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n",
+ "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n",
+ "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n",
+ "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-train.conllu') as fopen:\n",
+ " train = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)\n",
+ "\n",
+ "stacked_heads_train, children_train, siblings_train, stacked_types_train = [], [], [], []\n",
+ "for i in range(len(sentences_train)):\n",
+ " stacked_heads, children, siblings, stacked_types, _ = _generate_stack_inputs(depends_train[i], \n",
+ " labels_train[i], 'deep_first')\n",
+ " stacked_heads_train.append(stacked_heads)\n",
+ " children_train.append(children)\n",
+ " siblings_train.append(siblings)\n",
+ " stacked_types_train.append(stacked_types)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(12000, 3824)"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(sentences_train), len(sentences_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "21974"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "idx2word = {v:k for k, v in word2idx.items()}\n",
+ "idx2tag = {v:k for k, v in tag2idx.items()}\n",
+ "len(idx2word)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import tensorflow as tf"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from enum import Enum\n",
+ "\n",
+ "class PriorOrder(Enum):\n",
+ " DEPTH = 0\n",
+ " INSIDE_OUT = 1\n",
+ " LEFT2RIGTH = 2\n",
+ "\n",
+ "class BiAAttention:\n",
+ " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n",
+ " self.input_size_encoder = input_size_encoder\n",
+ " self.input_size_decoder = input_size_decoder\n",
+ " self.num_labels = num_labels\n",
+ " \n",
+ " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n",
+ " batch = tf.shape(input_d)[0]\n",
+ " length_decoder = tf.shape(input_d)[1]\n",
+ " length_encoder = tf.shape(input_e)[1]\n",
+ " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n",
+ " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n",
+ " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n",
+ " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n",
+ " \n",
+ " output = output + out_d + out_e\n",
+ " \n",
+ " if mask_d is not None:\n",
+ " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n",
+ " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n",
+ " output = output * d * e\n",
+ " \n",
+ " return output\n",
+ " \n",
+ "class BiLinear:\n",
+ " def __init__(self, left_features, right_features, out_features):\n",
+ " self.left_features = left_features\n",
+ " self.right_features = right_features\n",
+ " self.out_features = out_features\n",
+ " \n",
+ " self.U = tf.get_variable(\"U-bi\", shape=[out_features, left_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_l = tf.get_variable(\"Wl\", shape=[out_features, left_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_r = tf.get_variable(\"Wr\", shape=[out_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_left, input_right):\n",
+ " left_size = tf.shape(input_left)\n",
+ " output_shape = tf.concat([left_size[:-1], [self.out_features]], axis = 0)\n",
+ " batch = tf.cast(tf.reduce_prod(left_size[:-1]), tf.int32)\n",
+ " input_left = tf.reshape(input_left, (batch, self.left_features))\n",
+ " input_right = tf.reshape(input_right, (batch, self.right_features))\n",
+ " tiled = tf.tile(tf.expand_dims(input_left, axis = 0), (self.out_features,1,1))\n",
+ " output = tf.transpose(tf.reduce_sum(tf.matmul(tiled, self.U), axis = 2))\n",
+ " output = output + tf.matmul(input_left, tf.transpose(self.W_l))\\\n",
+ " + tf.matmul(input_right, tf.transpose(self.W_r))\n",
+ " \n",
+ " return tf.reshape(output, output_shape)\n",
+ "\n",
+ "class StackPointer:\n",
+ " def __init__(self, word_dim, num_words, char_dim, num_chars, num_filters, kernel_size,\n",
+ " input_size_decoder, hidden_size, layers,\n",
+ " num_labels, arc_space, type_space):\n",
+ " \n",
+ " def cells(size, reuse=False):\n",
+ " return tf.nn.rnn_cell.LSTMCell(size,\n",
+ " initializer=tf.orthogonal_initializer(),reuse=reuse,\n",
+ " state_is_tuple=False)\n",
+ " \n",
+ " self.word_embedd = tf.Variable(tf.random_uniform([num_words, word_dim], -1, 1))\n",
+ " self.char_embedd = tf.Variable(tf.random_uniform([num_chars, char_dim], -1, 1))\n",
+ " self.conv1d = tf.layers.Conv1D(num_filters, kernel_size, 1, padding='VALID')\n",
+ " self.num_labels = num_labels\n",
+ " self.prior_order = PriorOrder.DEPTH\n",
+ " self.char_dim = char_dim\n",
+ " self.layers = layers\n",
+ " self.encoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(layers)],\n",
+ " state_is_tuple=False)\n",
+ " self.encoder_char = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(layers)],\n",
+ " state_is_tuple=False)\n",
+ " self.decoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(layers)],\n",
+ " state_is_tuple=False)\n",
+ " self.hidden_size = hidden_size\n",
+ " self.arc_space = arc_space\n",
+ " \n",
+ " \n",
+ " self.src_dense = tf.layers.Dense(hidden_size)\n",
+ " self.hx_dense = tf.layers.Dense(hidden_size)\n",
+ "\n",
+ " self.arc_h = tf.layers.Dense(arc_space)\n",
+ " self.arc_c = tf.layers.Dense(arc_space)\n",
+ " self.attention = BiAAttention(arc_space, arc_space, 1)\n",
+ "\n",
+ " self.type_h = tf.layers.Dense(type_space)\n",
+ " self.type_c = tf.layers.Dense(type_space)\n",
+ " self.bilinear = BiLinear(type_space, type_space, self.num_labels)\n",
+ " \n",
+ " def encode(self, input_word, input_char):\n",
+ " word = tf.nn.embedding_lookup(self.word_embedd, input_word)\n",
+ " char = tf.nn.embedding_lookup(self.char_embedd, input_char)\n",
+ " s = tf.shape(char)\n",
+ " char = tf.reshape(\n",
+ " char, shape = [s[0] * s[1], s[-2], self.char_dim]\n",
+ " )\n",
+ " output, _ = tf.nn.dynamic_rnn(self.encoder_char, char, dtype = tf.float32,\n",
+ " scope = 'encoder-char')\n",
+ " output = tf.reshape(\n",
+ " output[:, -1], shape = [s[0], s[1], self.hidden_size]\n",
+ " )\n",
+ " word_embedded = tf.concat([word, output], axis = -1)\n",
+ " output, hn = tf.nn.dynamic_rnn(self.encoder, word_embedded, dtype = tf.float32,\n",
+ " scope = 'encoder')\n",
+ " return output, hn\n",
+ " \n",
+ " def decode(self, output_encoder, heads, heads_stack, siblings, hn):\n",
+ " batch = tf.shape(output_encoder)[0]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.transpose(heads_stack)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " src_encoding = tf.gather_nd(output_encoder, concatenated)\n",
+ " \n",
+ " mask_sibs = tf.expand_dims(tf.cast(tf.not_equal(siblings, 0), tf.float32), axis = 2)\n",
+ " t = tf.transpose(siblings)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " output_enc_sibling = tf.gather_nd(output_encoder, concatenated) * mask_sibs\n",
+ " src_encoding = src_encoding + output_enc_sibling\n",
+ " \n",
+ " t = tf.transpose(heads_stack)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)],axis = 0))\n",
+ " g = tf.transpose(tf.gather_nd(heads, concatenated))\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(g))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(g, axis = 0)],axis = 0))\n",
+ " output_enc_gpar = tf.gather_nd(output_encoder, concatenated)\n",
+ " src_encoding = src_encoding + output_enc_gpar\n",
+ " \n",
+ " src_encoding = tf.nn.elu(self.src_dense(src_encoding))\n",
+ " output, hn = tf.nn.dynamic_rnn(self.decoder, src_encoding, dtype = tf.float32,\n",
+ " initial_state = hn,\n",
+ " scope = 'decoder')\n",
+ " return output, hn\n",
+ " \n",
+ " def loss(self, input_word, input_char, \n",
+ " heads, stacked_heads, children, siblings, stacked_types,\n",
+ " mask_e, mask_d,\n",
+ " label_smooth = 1.0):\n",
+ " \n",
+ " output_enc, hn_enc = self.encode(input_word, input_char)\n",
+ " arc_c = tf.nn.elu(self.arc_c(output_enc))\n",
+ " type_c = tf.nn.elu(self.type_c(output_enc))\n",
+ " \n",
+ " output_dec, _ = self.decode(output_enc, heads, stacked_heads, siblings, hn_enc)\n",
+ " arc_h = tf.nn.elu(self.arc_h(output_dec))\n",
+ " type_h = tf.nn.elu(self.type_h(output_dec))\n",
+ " \n",
+ " max_len_d = tf.shape(arc_h)[1]\n",
+ " \n",
+ " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_c, mask_d=mask_d, mask_e=mask_e), axis = 1)\n",
+ " batch = tf.shape(arc_c)[0]\n",
+ " max_len_e = tf.shape(arc_c)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " \n",
+ " t = tf.transpose(children)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_c = tf.gather_nd(type_c, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " print(out_arc.shape,out_type.shape)\n",
+ " \n",
+ " minus_inf = -1e8\n",
+ " minus_mask_d = (1 - mask_d) * minus_inf\n",
+ " minus_mask_e = (1 - mask_e) * minus_inf\n",
+ " \n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask_d, 2) + tf.expand_dims(minus_mask_e, 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, axis = 2)\n",
+ " loss_type = tf.nn.log_softmax(out_type, axis = 2)\n",
+ " coverage = tf.cumsum(tf.exp(loss_arc), axis = 1)\n",
+ " \n",
+ " mask_leaf = tf.cast(tf.equal(children, stacked_heads), tf.float32)\n",
+ " mask_non_leaf = (1.0 - mask_leaf)\n",
+ " \n",
+ " mask_d_2 = tf.expand_dims(mask_d, 2)\n",
+ " mask_e_1 = tf.expand_dims(mask_e, 1)\n",
+ " \n",
+ " loss_arc = loss_arc * mask_d_2 * mask_e_1\n",
+ " coverage = coverage * mask_d_2 * mask_e_1\n",
+ " loss_type = loss_type * mask_d_2\n",
+ " mask_leaf = mask_leaf * mask_d\n",
+ " mask_non_leaf = mask_non_leaf * mask_d\n",
+ " num_leaf = tf.reduce_sum(mask_leaf)\n",
+ " num_non_leaf = tf.reduce_sum(mask_non_leaf)\n",
+ " head_index = tf.tile(tf.expand_dims(tf.range(0, max_len_d), 1), [1, batch])\n",
+ " \n",
+ " t = tf.transpose(children)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(head_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " \n",
+ " t = tf.transpose(stacked_types)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(head_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_type = tf.gather_nd(loss_type, concatenated)\n",
+ " \n",
+ " loss_arc_leaf = loss_arc * mask_leaf\n",
+ " loss_arc_non_leaf = loss_arc * mask_non_leaf\n",
+ "\n",
+ " loss_type_leaf = loss_type * mask_leaf\n",
+ " loss_type_non_leaf = loss_type * mask_non_leaf\n",
+ " \n",
+ " loss_cov = tf.clip_by_value(coverage - 2.0, 0.0, 100.0)\n",
+ " \n",
+ " return (tf.reduce_sum(-loss_arc_leaf) / num_leaf, \n",
+ " tf.reduce_sum(-loss_arc_non_leaf) / num_non_leaf,\n",
+ " tf.reduce_sum(-loss_type_leaf) / num_leaf, \n",
+ " tf.reduce_sum(-loss_type_non_leaf) / num_non_leaf,\n",
+ " tf.reduce_sum(loss_cov) / (num_leaf + num_non_leaf), \n",
+ " num_leaf, \n",
+ " num_non_leaf)\n",
+ " \n",
+ "class Model:\n",
+ " def __init__(self, learning_rate = 1e-3, cov = 0.0):\n",
+ " self.stackpointer = StackPointer(word_dim = 128, \n",
+ " num_words = len(word2idx), \n",
+ " char_dim = 128, \n",
+ " num_chars = len(char2idx), \n",
+ " num_filters = 128, \n",
+ " kernel_size = 3,\n",
+ " input_size_decoder = 256, \n",
+ " hidden_size = 256, \n",
+ " layers = 1,\n",
+ " num_labels = len(tag2idx), \n",
+ " arc_space = 128, \n",
+ " type_space = 128)\n",
+ " self.words = tf.placeholder(tf.int32, (None, None))\n",
+ " self.chars = tf.placeholder(tf.int32, (None, None, None))\n",
+ " self.heads = tf.placeholder(tf.int32, (None, None))\n",
+ " self.stacked_heads = tf.placeholder(tf.int32, (None, None))\n",
+ " self.siblings = tf.placeholder(tf.int32, (None, None))\n",
+ " self.childrens = tf.placeholder(tf.int32, (None, None))\n",
+ " self.stacked_types = tf.placeholder(tf.int32, (None, None))\n",
+ " self.mask_e = tf.placeholder(tf.float32, (None, None))\n",
+ " self.mask_d = tf.placeholder(tf.float32, (None, None))\n",
+ " loss_arc_leaf, loss_arc_non_leaf, \\\n",
+ " loss_type_leaf, loss_type_non_leaf, \\\n",
+ " loss_cov, num_leaf, num_non_leaf = self.stackpointer.loss(self.words, self.chars, self.heads, \n",
+ " self.stacked_heads, self.childrens, \n",
+ " self.siblings, self.stacked_types,\n",
+ " self.mask_e, self.mask_d)\n",
+ " loss_arc = loss_arc_leaf + loss_arc_non_leaf\n",
+ " loss_type = loss_type_leaf + loss_type_non_leaf\n",
+ " self.cost = loss_arc + loss_type + cov * loss_cov\n",
+ " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n",
+ " \n",
+ " self.encode_output, self.encode_hidden = self.stackpointer.encode(self.words, self.chars)\n",
+ " self.encode_arc_c = tf.nn.elu(self.stackpointer.arc_c(self.encode_output))\n",
+ " self.type_c = tf.nn.elu(self.stackpointer.type_c(self.encode_output))\n",
+ " \n",
+ " self.src_encoding = tf.placeholder(tf.float32, (None, self.stackpointer.hidden_size))\n",
+ " self.arc_c = tf.placeholder(tf.float32, (None, self.stackpointer.arc_space))\n",
+ " self.hx = tf.placeholder(tf.float32, (None, \n",
+ " self.stackpointer.hidden_size * 2 * self.stackpointer.layers)) \n",
+ " \n",
+ " src_encoding = tf.nn.elu(self.stackpointer.src_dense(self.src_encoding))\n",
+ " output_dec, hx = self.stackpointer.decoder(src_encoding, self.hx)\n",
+ " arc_h = tf.nn.elu(self.stackpointer.arc_h(tf.expand_dims(output_dec, axis = 1)))\n",
+ " type_h = tf.nn.elu(self.stackpointer.type_h(output_dec))\n",
+ " out_arc = self.stackpointer.attention.forward(arc_h, tf.expand_dims(self.arc_c, 0))\n",
+ " out_arc = tf.squeeze(tf.squeeze(out_arc, axis = 1), axis = 1)\n",
+ " self.hyp_scores = tf.nn.log_softmax(out_arc, axis = 1)\n",
+ " self.type_h = type_h\n",
+ " self.decode_hidden = hx\n",
+ " \n",
+ " self.holder_type_h = tf.placeholder(tf.float32, (None, self.stackpointer.arc_space))\n",
+ " self.holder_type_c = tf.placeholder(tf.float32, (None, self.stackpointer.arc_space))\n",
+ " \n",
+ " out_type = self.stackpointer.bilinear.forward(self.holder_type_h, self.holder_type_c)\n",
+ " self.hyp_type_scores = tf.nn.log_softmax(out_type, axis = 1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From :73: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n",
+ "WARNING:tensorflow:: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.\n",
+ "WARNING:tensorflow:From :83: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n",
+ "WARNING:tensorflow:: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.\n",
+ "WARNING:tensorflow:: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.\n",
+ "WARNING:tensorflow:From :111: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "(?, ?, ?) (?, ?, 52)\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use tf.where in 2.0, which has the same broadcast rule as np.where\n"
+ ]
+ }
+ ],
+ "source": [
+ "tf.reset_default_graph()\n",
+ "sess = tf.InteractiveSession()\n",
+ "model = Model()\n",
+ "sess.run(tf.global_variables_initializer())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_X = words_train\n",
+ "train_Y = labels_train\n",
+ "train_depends = depends_train\n",
+ "train_char = sentences_train\n",
+ "\n",
+ "test_X = words_test\n",
+ "test_Y = labels_test\n",
+ "test_depends = depends_test\n",
+ "test_char = sentences_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prior_order = model.stackpointer.prior_order\n",
+ "\n",
+ "def decode_sentence(output_enc, arc_c, type_c, hx, beam, length, ordered, leading_symbolic):\n",
+ " def valid_hyp(base_id, child_id, head):\n",
+ " if constraints[base_id, child_id]:\n",
+ " return False\n",
+ " elif not ordered or prior_order == PriorOrder.DEPTH or child_orders[base_id, head] == 0:\n",
+ " return True\n",
+ " elif prior_order == PriorOrder.LEFT2RIGTH:\n",
+ " return child_id > child_orders[base_id, head]\n",
+ " else:\n",
+ " if child_id < head:\n",
+ " return child_id < child_orders[base_id, head] < head\n",
+ " else:\n",
+ " return child_id > child_orders[base_id, head]\n",
+ " \n",
+ " length = output_enc.shape[0] if length is None else length\n",
+ " \n",
+ " stacked_heads = [[0] for _ in range(beam)]\n",
+ " grand_parents = [[0] for _ in range(beam)]\n",
+ " siblings = [[0] for _ in range(beam)]\n",
+ " children = np.zeros((beam, 2 * length - 1))\n",
+ " stacked_types = np.zeros((beam, 2 * length - 1))\n",
+ " \n",
+ " hypothesis_scores = [0]\n",
+ " constraints = np.zeros([beam, length], dtype=np.bool)\n",
+ " constraints[:, 0] = True\n",
+ " child_orders = np.zeros([beam, length], dtype=np.int64)\n",
+ "\n",
+ " new_stacked_heads = [[] for _ in range(beam)]\n",
+ " new_grand_parents = [[] for _ in range(beam)]\n",
+ " new_siblings = [[] for _ in range(beam)]\n",
+ " new_skip_connects = [[] for _ in range(beam)]\n",
+ " new_children = np.zeros((beam, 2 * length - 1))\n",
+ " new_stacked_types = np.zeros((beam, 2 * length - 1))\n",
+ " num_hyp = 1\n",
+ " num_step = 2 * length - 1\n",
+ " for t in range(num_step):\n",
+ " heads = np.array([stacked_heads[i][-1] for i in range(num_hyp)])\n",
+ " gpars = np.array([grand_parents[i][-1] for i in range(num_hyp)])\n",
+ " sibs = np.array([siblings[i].pop() for i in range(num_hyp)])\n",
+ " src_encoding = output_enc[heads]\n",
+ " mask_sibs = np.expand_dims((np.array(sibs) != 0).astype(np.float32), axis = 1)\n",
+ " output_enc_sibling = output_enc[sibs] * mask_sibs\n",
+ " src_encoding = src_encoding + output_enc_sibling\n",
+ " output_enc_gpar = output_enc[gpars]\n",
+ " src_encoding = src_encoding + output_enc_gpar\n",
+ " hyp_scores, type_h, hx = sess.run([model.hyp_scores, model.type_h, model.decode_hidden],\n",
+ " feed_dict = {model.src_encoding: src_encoding,\n",
+ " model.arc_c: arc_c,\n",
+ " model.hx: hx})\n",
+ " \n",
+ " new_hypothesis_scores = np.expand_dims(hypothesis_scores[:num_hyp], axis = 1) + hyp_scores\n",
+ " new_hypothesis_scores = new_hypothesis_scores.reshape((-1))\n",
+ " hyp_index = np.argsort(new_hypothesis_scores)[::-1]\n",
+ " new_hypothesis_scores = np.sort(new_hypothesis_scores)[::-1]\n",
+ " base_index = (hyp_index // length)\n",
+ " child_index = hyp_index % length\n",
+ " cc = 0\n",
+ " ids = []\n",
+ " new_constraints = np.zeros([beam, length], dtype=np.bool)\n",
+ " new_child_orders = np.zeros([beam, length], dtype=np.int64)\n",
+ " for id_ in range(num_hyp * length):\n",
+ " base_id = base_index[id_]\n",
+ " if base_id:\n",
+ " ids.append(id_)\n",
+ " continue\n",
+ " child_id = child_index[id_]\n",
+ " head = heads[base_id]\n",
+ " new_hyp_score = new_hypothesis_scores[id_]\n",
+ " if child_id == head:\n",
+ " if head != 0 or t + 1 == num_step:\n",
+ " new_constraints[cc] = constraints[base_id]\n",
+ " new_child_orders[cc] = child_orders[base_id]\n",
+ "\n",
+ " new_stacked_heads[cc] = [stacked_heads[base_id][i] for i in range(len(stacked_heads[base_id]))]\n",
+ " new_stacked_heads[cc].pop()\n",
+ "\n",
+ " new_grand_parents[cc] = [grand_parents[base_id][i] for i in range(len(grand_parents[base_id]))]\n",
+ " new_grand_parents[cc].pop()\n",
+ "\n",
+ " new_siblings[cc] = [siblings[base_id][i] for i in range(len(siblings[base_id]))]\n",
+ "\n",
+ " new_children[cc] = children[base_id]\n",
+ " new_children[cc, t] = child_id\n",
+ "\n",
+ " hypothesis_scores[cc] = new_hyp_score\n",
+ " ids.append(id_)\n",
+ " cc += 1\n",
+ " elif valid_hyp(base_id, child_id, head):\n",
+ " new_constraints[cc] = constraints[base_id]\n",
+ " new_constraints[cc, child_id] = True\n",
+ "\n",
+ " new_child_orders[cc] = child_orders[base_id]\n",
+ " new_child_orders[cc, head] = child_id\n",
+ "\n",
+ " new_stacked_heads[cc] = [stacked_heads[base_id][i] for i in range(len(stacked_heads[base_id]))]\n",
+ " new_stacked_heads[cc].append(child_id)\n",
+ "\n",
+ " new_grand_parents[cc] = [grand_parents[base_id][i] for i in range(len(grand_parents[base_id]))]\n",
+ " new_grand_parents[cc].append(head)\n",
+ "\n",
+ " new_siblings[cc] = [siblings[base_id][i] for i in range(len(siblings[base_id]))]\n",
+ " new_siblings[cc].append(child_id)\n",
+ " new_siblings[cc].append(0)\n",
+ "\n",
+ " new_children[cc] = children[base_id]\n",
+ " new_children[cc, t] = child_id\n",
+ "\n",
+ " hypothesis_scores[cc] = new_hyp_score\n",
+ " ids.append(id_)\n",
+ " cc += 1\n",
+ " \n",
+ " if cc == beam:\n",
+ " break\n",
+ " \n",
+ " num_hyp = len(ids)\n",
+ " if num_hyp == 0:\n",
+ " return None\n",
+ " else:\n",
+ " index = np.array(ids)\n",
+ " base_index = base_index[index]\n",
+ " child_index = child_index[index]\n",
+ " hyp_type_scores = sess.run(model.hyp_type_scores,\n",
+ " feed_dict = {\n",
+ " model.holder_type_h: type_h[base_index],\n",
+ " model.holder_type_c: type_c[child_index]\n",
+ " })\n",
+ " hyp_types = np.argmax(hyp_type_scores, axis = 1)\n",
+ " hyp_type_scores = np.max(hyp_type_scores, axis = 1)\n",
+ " hypothesis_scores[:num_hyp] = hypothesis_scores[:num_hyp] + hyp_type_scores\n",
+ "\n",
+ " for i in range(num_hyp):\n",
+ " base_id = base_index[i]\n",
+ " new_stacked_types[i] = stacked_types[base_id]\n",
+ " new_stacked_types[i, t] = hyp_types[i]\n",
+ "\n",
+ " stacked_heads = [[new_stacked_heads[i][j] for j in range(len(new_stacked_heads[i]))] for i in range(num_hyp)]\n",
+ " grand_parents = [[new_grand_parents[i][j] for j in range(len(new_grand_parents[i]))] for i in range(num_hyp)]\n",
+ " siblings = [[new_siblings[i][j] for j in range(len(new_siblings[i]))] for i in range(num_hyp)]\n",
+ " constraints = new_constraints\n",
+ " child_orders = new_child_orders\n",
+ " children = np.copy(new_children)\n",
+ " stacked_types = np.copy(new_stacked_types)\n",
+ " \n",
+ " children = children[0].astype(np.int32)\n",
+ " stacked_types = stacked_types[0].astype(np.int32)\n",
+ " heads = np.zeros(length, dtype=np.int32)\n",
+ " types = np.zeros(length, dtype=np.int32)\n",
+ " stack = [0]\n",
+ " for i in range(num_step):\n",
+ " head = stack[-1]\n",
+ " child = children[i]\n",
+ " type_ = stacked_types[i]\n",
+ " if child != head:\n",
+ " heads[child] = head\n",
+ " types[child] = type_\n",
+ " stack.append(child)\n",
+ " else:\n",
+ " stacked_types[i] = 0\n",
+ " stack.pop()\n",
+ "\n",
+ " return heads, types, length, children, stacked_types \n",
+ " \n",
+ "def decode(input_word, input_char, length = None, beam = 1, leading_symbolic=0, ordered=True):\n",
+ " \n",
+ " arc_c, type_c, output, hn = sess.run([model.encode_arc_c, model.type_c, \n",
+ " model.encode_output, model.encode_hidden],\n",
+ " feed_dict = {model.words: input_word, model.chars: input_char})\n",
+ " batch, max_len_e, _ = output.shape\n",
+ "\n",
+ " heads = np.zeros([batch, max_len_e], dtype=np.int32)\n",
+ " types = np.zeros([batch, max_len_e], dtype=np.int32)\n",
+ "\n",
+ " children = np.zeros([batch, 2 * max_len_e - 1], dtype=np.int32)\n",
+ " stack_types = np.zeros([batch, 2 * max_len_e - 1], dtype=np.int32)\n",
+ " \n",
+ " for b in range(batch):\n",
+ " sent_len = None if length is None else length[b]\n",
+ " preds = decode_sentence(output[b], arc_c[b], type_c[b], [hn[b]], \n",
+ " beam, sent_len, ordered, leading_symbolic)\n",
+ " if preds is None:\n",
+ " preds = decode_sentence(output[b], arc_c[b], type_c[b], [hn[b]], beam, \n",
+ " sent_len, False, leading_symbolic)\n",
+ " hids, tids, sent_len, chids, stids = preds\n",
+ " heads[b, :sent_len] = hids\n",
+ " types[b, :sent_len] = tids\n",
+ "\n",
+ " children[b, :2 * sent_len - 1] = chids\n",
+ " stack_types[b, :2 * sent_len - 1] = stids\n",
+ "\n",
+ " return heads, types, children, stack_types"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def generate_char_seq(batch, UNK = 2):\n",
+ " maxlen_c = max([len(k) for k in batch])\n",
+ " x = [[len(i) for i in k] for k in batch]\n",
+ " maxlen = max([j for i in x for j in i])\n",
+ " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n",
+ " for i in range(len(batch)):\n",
+ " for k in range(len(batch[i])):\n",
+ " for no, c in enumerate(batch[i][k]):\n",
+ " temp[i,k,-1-no] = char2idx.get(c, UNK)\n",
+ " return temp"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((5, 37), (5, 73))"
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
+ "\n",
+ "batch_x = train_X[:5]\n",
+ "batch_x = pad_sequences(batch_x,padding='post')\n",
+ "batch_char = train_char[:5]\n",
+ "batch_char = generate_char_seq(batch_char)\n",
+ "batch_y = train_Y[:5]\n",
+ "batch_y = pad_sequences(batch_y,padding='post')\n",
+ "batch_depends = train_depends[:5]\n",
+ "batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ "\n",
+ "batch_stacked_heads = stacked_heads_train[:5]\n",
+ "batch_stacked_heads = pad_sequences(batch_stacked_heads,padding='post')\n",
+ "batch_children = children_train[:5]\n",
+ "batch_children = pad_sequences(batch_children,padding='post')\n",
+ "batch_siblings = siblings_train[:5]\n",
+ "batch_siblings = pad_sequences(batch_siblings,padding='post')\n",
+ "batch_stacked_types = stacked_types_train[:5]\n",
+ "batch_stacked_types = pad_sequences(batch_stacked_types,padding='post')\n",
+ "batch_e = np.zeros(batch_x.shape)\n",
+ "batch_d = np.zeros(batch_stacked_heads.shape)\n",
+ "nonzero = np.count_nonzero(batch_x, axis = 1)\n",
+ "\n",
+ "for no, i in enumerate(nonzero):\n",
+ " batch_e[no,:i] = 1.0\n",
+ "for no, i in enumerate(nonzero * 2 - 1):\n",
+ " batch_d[no,:i] = 1.0\n",
+ " \n",
+ "batch_x.shape, batch_stacked_heads.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "14.264593"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "feed_dict = {model.words: batch_x,\n",
+ " model.chars: batch_char,\n",
+ " model.heads: batch_depends,\n",
+ " model.stacked_heads: batch_stacked_heads,\n",
+ " model.childrens: batch_children,\n",
+ " model.siblings: batch_siblings,\n",
+ " model.stacked_types: batch_stacked_types,\n",
+ " model.mask_e: batch_e,\n",
+ " model.mask_d: batch_d}\n",
+ "sess.run(model.cost, feed_dict = feed_dict)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CPU times: user 2.27 s, sys: 251 ms, total: 2.52 s\n",
+ "Wall time: 1.32 s\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "(array([[ 0, 0, 1, 0, 1, 6, 1, 1, 7, 0, 0, 12, 0, 0, 15, 8,\n",
+ " 18, 18, 7, 21, 21, 18, 23, 21, 1, 28, 28, 28, 21, 1, 34, 34,\n",
+ " 31, 34, 0, 34, 0],\n",
+ " [ 0, 10, 3, 10, 7, 7, 7, 3, 10, 10, 0, 10, 10, 14, 10, 16,\n",
+ " 14, 10, 10, 23, 0, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36,\n",
+ " 36, 36, 36, 36, 21],\n",
+ " [ 0, 0, 1, 4, 5, 1, 9, 9, 9, 5, 9, 13, 13, 9, 13, 16,\n",
+ " 14, 1, 0, 0, 25, 0, 0, 25, 25, 22, 0, 0, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 0],\n",
+ " [ 0, 6, 3, 1, 6, 6, 0, 9, 9, 6, 12, 12, 9, 15, 15, 12,\n",
+ " 6, 0, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 0],\n",
+ " [ 0, 2, 6, 4, 2, 6, 0, 10, 10, 10, 6, 6, 16, 16, 16, 17,\n",
+ " 6, 16, 17, 18, 18, 22, 17, 27, 27, 27, 27, 22, 31, 31, 31, 27,\n",
+ " 35, 35, 35, 22, 6]], dtype=int32),\n",
+ " array([[ 0, 5, 7, 8, 7, 13, 35, 28, 10, 8, 44, 7, 38, 7, 3, 35,\n",
+ " 2, 3, 4, 2, 3, 14, 2, 14, 7, 2, 3, 13, 14, 7, 7, 7,\n",
+ " 0, 7, 23, 23, 45],\n",
+ " [ 0, 7, 3, 6, 2, 3, 13, 14, 18, 18, 5, 10, 10, 2, 4, 11,\n",
+ " 20, 7, 7, 0, 16, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 4],\n",
+ " [ 0, 5, 7, 13, 6, 28, 11, 6, 18, 27, 36, 9, 13, 10, 20, 2,\n",
+ " 4, 7, 7, 23, 0, 23, 23, 0, 0, 4, 23, 23, 23, 23, 23, 23,\n",
+ " 23, 23, 23, 23, 45],\n",
+ " [ 0, 6, 2, 14, 18, 19, 5, 2, 9, 4, 2, 3, 14, 2, 3, 14,\n",
+ " 7, 28, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n",
+ " 4, 23, 23, 23, 4],\n",
+ " [ 0, 3, 6, 2, 14, 25, 5, 2, 3, 15, 4, 7, 16, 6, 18, 0,\n",
+ " 28, 12, 6, 8, 0, 16, 27, 2, 3, 13, 15, 10, 2, 3, 15, 14,\n",
+ " 2, 2, 3, 4, 7]], dtype=int32),\n",
+ " array([[ 1, 7, 18, 21, 23, 22, 22, 23, 28, 25, 25, 26, 26, 27, 27, 28,\n",
+ " 19, 19, 20, 20, 21, 16, 16, 17, 17, 18, 8, 15, 14, 14, 15, 8,\n",
+ " 7, 6, 5, 5, 6, 2, 2, 4, 4, 24, 24, 29, 29, 1, 36, 36,\n",
+ " 34, 35, 35, 30, 30, 31, 32, 32, 31, 33, 33, 34, 3, 3, 10, 10,\n",
+ " 12, 11, 11, 12, 9, 9, 13, 13, 0],\n",
+ " [10, 3, 7, 4, 4, 5, 5, 6, 6, 7, 2, 2, 3, 14, 16, 15,\n",
+ " 15, 16, 13, 13, 14, 1, 1, 8, 8, 9, 9, 11, 11, 12, 12, 17,\n",
+ " 17, 18, 18, 10, 20, 20, 21, 36, 22, 22, 23, 19, 19, 23, 24, 24,\n",
+ " 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32,\n",
+ " 33, 33, 34, 34, 35, 35, 36, 21, 0],\n",
+ " [ 1, 5, 9, 13, 14, 16, 15, 15, 16, 14, 11, 11, 12, 12, 13, 6,\n",
+ " 6, 7, 7, 8, 8, 10, 10, 9, 4, 3, 3, 4, 5, 2, 2, 17,\n",
+ " 17, 1, 36, 36, 35, 35, 34, 34, 33, 33, 32, 32, 31, 31, 30, 30,\n",
+ " 29, 29, 28, 28, 27, 27, 26, 26, 21, 21, 22, 25, 23, 23, 24, 24,\n",
+ " 20, 20, 25, 22, 19, 19, 18, 18, 0],\n",
+ " [ 6, 9, 12, 15, 13, 13, 14, 14, 15, 10, 10, 11, 11, 12, 7, 7,\n",
+ " 8, 8, 9, 1, 3, 2, 2, 3, 1, 4, 4, 5, 5, 16, 16, 6,\n",
+ " 36, 36, 35, 35, 34, 34, 33, 33, 32, 32, 31, 31, 30, 30, 29, 29,\n",
+ " 28, 28, 27, 27, 26, 26, 25, 25, 24, 24, 23, 23, 22, 22, 21, 21,\n",
+ " 20, 20, 19, 19, 17, 18, 18, 17, 0],\n",
+ " [ 6, 16, 17, 22, 27, 31, 28, 28, 29, 29, 30, 30, 31, 23, 23, 24,\n",
+ " 24, 25, 25, 26, 26, 27, 35, 32, 32, 33, 33, 34, 34, 35, 21, 21,\n",
+ " 22, 18, 19, 19, 20, 20, 18, 15, 15, 17, 12, 12, 13, 13, 14, 14,\n",
+ " 16, 2, 4, 3, 3, 4, 1, 1, 2, 10, 7, 7, 8, 8, 9, 9,\n",
+ " 10, 5, 5, 11, 11, 36, 36, 6, 0]], dtype=int32),\n",
+ " array([[ 5, 28, 4, 14, 14, 2, 0, 0, 14, 2, 0, 3, 0, 13, 0, 0,\n",
+ " 2, 0, 3, 0, 0, 2, 0, 3, 0, 0, 10, 35, 3, 0, 0, 0,\n",
+ " 0, 35, 13, 0, 0, 7, 0, 7, 0, 7, 0, 7, 0, 0, 45, 0,\n",
+ " 23, 23, 0, 7, 0, 7, 0, 0, 0, 7, 0, 0, 8, 0, 44, 0,\n",
+ " 38, 7, 0, 0, 8, 0, 7, 0, 0],\n",
+ " [ 5, 6, 14, 2, 0, 3, 0, 13, 0, 0, 3, 0, 0, 4, 20, 11,\n",
+ " 0, 0, 2, 0, 0, 7, 0, 18, 0, 18, 0, 10, 0, 10, 0, 7,\n",
+ " 0, 7, 0, 0, 16, 0, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 0, 0, 0, 0, 0],\n",
+ " [ 5, 28, 27, 10, 20, 4, 2, 0, 0, 0, 9, 0, 13, 0, 0, 11,\n",
+ " 0, 6, 0, 18, 0, 36, 0, 0, 6, 13, 0, 0, 0, 7, 0, 7,\n",
+ " 0, 0, 45, 0, 23, 0, 23, 0, 23, 0, 23, 0, 23, 0, 23, 0,\n",
+ " 23, 0, 23, 0, 23, 0, 23, 0, 23, 0, 23, 4, 0, 0, 0, 0,\n",
+ " 0, 0, 0, 0, 23, 0, 7, 0, 0],\n",
+ " [ 5, 4, 14, 14, 2, 0, 3, 0, 0, 2, 0, 3, 0, 0, 2, 0,\n",
+ " 9, 0, 0, 6, 14, 2, 0, 0, 0, 18, 0, 19, 0, 7, 0, 0,\n",
+ " 4, 0, 23, 0, 23, 0, 23, 0, 4, 0, 4, 0, 4, 0, 4, 0,\n",
+ " 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0,\n",
+ " 4, 0, 4, 0, 28, 4, 0, 0, 0],\n",
+ " [ 5, 28, 12, 27, 10, 14, 2, 0, 3, 0, 15, 0, 0, 2, 0, 3,\n",
+ " 0, 13, 0, 15, 0, 0, 4, 2, 0, 2, 0, 3, 0, 0, 16, 0,\n",
+ " 0, 6, 8, 0, 0, 0, 0, 0, 0, 0, 16, 0, 6, 0, 18, 0,\n",
+ " 0, 6, 14, 2, 0, 0, 3, 0, 0, 4, 2, 0, 3, 0, 15, 0,\n",
+ " 0, 25, 0, 7, 0, 7, 0, 0, 0]], dtype=int32))"
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "%%time\n",
+ "decode(batch_x, batch_char)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.11it/s, cost=2.97]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.70it/s, cost=12.1]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:01, 6.08it/s, cost=3.28]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 0, training loss: 5.157737, valid loss: 11.861909\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.15it/s, cost=2.01]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.53it/s, cost=13.8]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:07, 5.52it/s, cost=2.31]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 1, training loss: 2.576627, valid loss: 13.340673\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.17it/s, cost=1.55] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.33it/s, cost=15.4]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:02, 5.99it/s, cost=1.77]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 2, training loss: 1.922838, valid loss: 14.725556\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.11it/s, cost=1.36] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.20it/s, cost=16.4]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.70it/s, cost=1.47]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 3, training loss: 1.529883, valid loss: 15.789502\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.15it/s, cost=1.12] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.20it/s, cost=17.9]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:03, 5.88it/s, cost=1.2]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 4, training loss: 1.266019, valid loss: 17.307760\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.16it/s, cost=1.02] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.87it/s, cost=19.5]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:03, 5.93it/s, cost=1.06]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 5, training loss: 1.066313, valid loss: 19.008535\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.16it/s, cost=0.878]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.69it/s, cost=21.3]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:02, 6.03it/s, cost=0.895]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 6, training loss: 0.908035, valid loss: 20.994354\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.13it/s, cost=0.748]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.86it/s, cost=22.6]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:02, 6.03it/s, cost=0.771]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 7, training loss: 0.780265, valid loss: 22.426714\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.11it/s, cost=0.636]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.22it/s, cost=24.4]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:03, 5.93it/s, cost=0.615]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 8, training loss: 0.687402, valid loss: 24.419289\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.17it/s, cost=0.628]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.42it/s, cost=26.8]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.75it/s, cost=0.546]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 9, training loss: 0.609938, valid loss: 26.764641\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.18it/s, cost=0.613]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 17.83it/s, cost=28.2]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.65it/s, cost=0.52]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 10, training loss: 0.525183, valid loss: 28.478970\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.15it/s, cost=0.538]\n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.67it/s, cost=31.2]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.62it/s, cost=0.484]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 11, training loss: 0.459827, valid loss: 31.322876\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.17it/s, cost=0.512] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.11it/s, cost=32.4]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.59it/s, cost=0.367]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 12, training loss: 0.400364, valid loss: 33.366253\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.14it/s, cost=0.413] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.76it/s, cost=34.1]\n",
+ "train minibatch loop: 0%| | 1/375 [00:00<01:02, 5.95it/s, cost=0.316]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 13, training loss: 0.357156, valid loss: 34.881569\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.16it/s, cost=0.331] \n",
+ "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.91it/s, cost=36.8]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch: 14, training loss: 0.307119, valid loss: 37.149876\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "batch_size = 32\n",
+ "epoch = 15\n",
+ "\n",
+ "for e in range(epoch):\n",
+ " test_loss, train_loss = [], []\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(train_X))\n",
+ " batch_x = train_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = train_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = train_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = train_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ "\n",
+ " batch_stacked_heads = stacked_heads_train[i: index]\n",
+ " batch_stacked_heads = pad_sequences(batch_stacked_heads,padding='post')\n",
+ " batch_children = children_train[i: index]\n",
+ " batch_children = pad_sequences(batch_children,padding='post')\n",
+ " batch_siblings = siblings_train[i: index]\n",
+ " batch_siblings = pad_sequences(batch_siblings,padding='post')\n",
+ " batch_stacked_types = stacked_types_train[i: index]\n",
+ " batch_stacked_types = pad_sequences(batch_stacked_types,padding='post')\n",
+ " batch_e = np.zeros(batch_x.shape)\n",
+ " batch_d = np.zeros(batch_stacked_heads.shape)\n",
+ " nonzero = np.count_nonzero(batch_x, axis = 1)\n",
+ "\n",
+ " for no, i in enumerate(nonzero):\n",
+ " batch_e[no,:i] = 1.0\n",
+ " for no, i in enumerate(nonzero * 2 - 1):\n",
+ " batch_d[no,:i] = 1.0\n",
+ " \n",
+ " feed_dict = {model.words: batch_x,\n",
+ " model.chars: batch_char,\n",
+ " model.heads: batch_depends,\n",
+ " model.stacked_heads: batch_stacked_heads,\n",
+ " model.childrens: batch_children,\n",
+ " model.siblings: batch_siblings,\n",
+ " model.stacked_types: batch_stacked_types,\n",
+ " model.mask_e: batch_e,\n",
+ " model.mask_d: batch_d}\n",
+ " cost, _ = sess.run([model.cost, model.optimizer], feed_dict = feed_dict)\n",
+ " train_loss.append(cost)\n",
+ " pbar.set_postfix(cost = cost)\n",
+ " \n",
+ " pbar = tqdm(\n",
+ " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n",
+ " )\n",
+ " for i in pbar:\n",
+ " index = min(i + batch_size, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = test_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ "\n",
+ " batch_stacked_heads = stacked_heads_test[i: index]\n",
+ " batch_stacked_heads = pad_sequences(batch_stacked_heads,padding='post')\n",
+ " batch_children = children_test[i: index]\n",
+ " batch_children = pad_sequences(batch_children,padding='post')\n",
+ " batch_siblings = siblings_test[i: index]\n",
+ " batch_siblings = pad_sequences(batch_siblings,padding='post')\n",
+ " batch_stacked_types = stacked_types_test[i: index]\n",
+ " batch_stacked_types = pad_sequences(batch_stacked_types,padding='post')\n",
+ " batch_e = np.zeros(batch_x.shape)\n",
+ " batch_d = np.zeros(batch_stacked_heads.shape)\n",
+ " nonzero = np.count_nonzero(batch_x, axis = 1)\n",
+ "\n",
+ " for no, i in enumerate(nonzero):\n",
+ " batch_e[no,:i] = 1.0\n",
+ " for no, i in enumerate(nonzero * 2 - 1):\n",
+ " batch_d[no,:i] = 1.0\n",
+ " \n",
+ " feed_dict = {model.words: batch_x,\n",
+ " model.chars: batch_char,\n",
+ " model.heads: batch_depends,\n",
+ " model.stacked_heads: batch_stacked_heads,\n",
+ " model.childrens: batch_children,\n",
+ " model.siblings: batch_siblings,\n",
+ " model.stacked_types: batch_stacked_types,\n",
+ " model.mask_e: batch_e,\n",
+ " model.mask_d: batch_d}\n",
+ " cost = sess.run(model.cost, feed_dict = feed_dict)\n",
+ " test_loss.append(cost)\n",
+ " pbar.set_postfix(cost = cost)\n",
+ " \n",
+ " print(\n",
+ " 'epoch: %d, training loss: %f, valid loss: %f\\n'\n",
+ " % (e, np.mean(train_loss), np.mean(test_loss)))\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def evaluate(heads_pred, types_pred, heads, types, lengths,\n",
+ " symbolic_root=False, symbolic_end=False):\n",
+ " batch_size, _ = heads_pred.shape\n",
+ " ucorr = 0.\n",
+ " lcorr = 0.\n",
+ " total = 0.\n",
+ " ucomplete_match = 0.\n",
+ " lcomplete_match = 0.\n",
+ "\n",
+ " corr_root = 0.\n",
+ " total_root = 0.\n",
+ " start = 1 if symbolic_root else 0\n",
+ " end = 1 if symbolic_end else 0\n",
+ " for i in range(batch_size):\n",
+ " ucm = 1.\n",
+ " lcm = 1.\n",
+ " for j in range(start, lengths[i] - end):\n",
+ "\n",
+ " total += 1\n",
+ " if heads[i, j] == heads_pred[i, j]:\n",
+ " ucorr += 1\n",
+ " if types[i, j] == types_pred[i, j]:\n",
+ " lcorr += 1\n",
+ " else:\n",
+ " lcm = 0\n",
+ " else:\n",
+ " ucm = 0\n",
+ " lcm = 0\n",
+ "\n",
+ " if heads[i, j] == 0:\n",
+ " total_root += 1\n",
+ " corr_root += 1 if heads_pred[i, j] == 0 else 0\n",
+ "\n",
+ " ucomplete_match += ucm\n",
+ " lcomplete_match += lcm\n",
+ " \n",
+ " return ucorr / total, lcorr / total, corr_root / total_root"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0.6045627376425855, 0.5209125475285171, 0.90625)"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "heads, types, _, _ = decode(batch_x, batch_char)\n",
+ "arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, types, batch_depends, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ "arc_accuracy, type_accuracy, root_accuracy"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "arcs, types, roots = [], [], []\n",
+ "\n",
+ "for i in range(0, len(test_X), 5):\n",
+ " index = min(i + 5, len(test_X))\n",
+ " batch_x = test_X[i: index]\n",
+ " batch_x = pad_sequences(batch_x,padding='post')\n",
+ " batch_char = test_char[i: index]\n",
+ " batch_char = generate_char_seq(batch_char)\n",
+ " batch_y = test_Y[i: index]\n",
+ " batch_y = pad_sequences(batch_y,padding='post')\n",
+ " batch_depends = test_depends[i: index]\n",
+ " batch_depends = pad_sequences(batch_depends,padding='post')\n",
+ " \n",
+ " heads, tags_seq, _, _ = decode(batch_x, batch_char)\n",
+ " \n",
+ " arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n",
+ " np.count_nonzero(batch_x, axis = 1))\n",
+ " arcs.append(arc_accuracy)\n",
+ " types.append(type_accuracy)\n",
+ " roots.append(root_accuracy)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "arc accuracy: 0.6188156085110088\n",
+ "types accuracy: 0.482035002661857\n",
+ "root accuracy: 0.8939869281045753\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('arc accuracy:', np.mean(arcs))\n",
+ "print('types accuracy:', np.mean(types))\n",
+ "print('root accuracy:', np.mean(roots))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/dependency-parser/8.xlnet-biaffine-attention-cross-entropy.ipynb b/dependency-parser/8.xlnet-biaffine-attention-cross-entropy.ipynb
new file mode 100644
index 0000000..dc02b25
--- /dev/null
+++ b/dependency-parser/8.xlnet-biaffine-attention-cross-entropy.ipynb
@@ -0,0 +1,1608 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n",
+ "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n",
+ "# !wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip -O xlnet.zip\n",
+ "# !unzip xlnet.zip"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "os.environ['CUDA_VISIBLE_DEVICES'] = '1'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "tag2idx = {'PAD': 0, 'X': 1}\n",
+ "tag_idx = 2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sentencepiece as spm\n",
+ "from prepro_utils import preprocess_text, encode_ids\n",
+ "\n",
+ "sp_model = spm.SentencePieceProcessor()\n",
+ "sp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n",
+ "\n",
+ "def tokenize_fn(text):\n",
+ " text = preprocess_text(text, lower = False)\n",
+ " return encode_ids(sp_model, text)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "SEG_ID_A = 0\n",
+ "SEG_ID_B = 1\n",
+ "SEG_ID_CLS = 2\n",
+ "SEG_ID_SEP = 3\n",
+ "SEG_ID_PAD = 4\n",
+ "\n",
+ "special_symbols = {\n",
+ "    \"<unk>\"  : 0,\n",
+ "    \"<s>\"    : 1,\n",
+ "    \"</s>\"   : 2,\n",
+ "    \"<cls>\"  : 3,\n",
+ "    \"<sep>\"  : 4,\n",
+ "    \"<pad>\"  : 5,\n",
+ "    \"<mask>\" : 6,\n",
+ "    \"<eod>\"  : 7,\n",
+ "    \"<eop>\"  : 8,\n",
+ "}\n",
+ "\n",
+ "VOCAB_SIZE = 32000\n",
+ "UNK_ID = special_symbols[\"<unk>\"]\n",
+ "CLS_ID = special_symbols[\"<cls>\"]\n",
+ "SEP_ID = special_symbols[\"<sep>\"]\n",
+ "MASK_ID = special_symbols[\"<mask>\"]\n",
+ "EOD_ID = special_symbols[\"<eod>\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def process_corpus(corpus, until = None):\n",
+ " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n",
+ " sentences, words, depends, labels, pos, sequences = [], [], [], [], [], []\n",
+ " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n",
+ " segments, masks = [], []\n",
+ " first_time = True\n",
+ " for sentence in corpus:\n",
+ " try:\n",
+ " if len(sentence):\n",
+ " if sentence[0] == '#':\n",
+ " continue\n",
+ " if first_time:\n",
+ " print(sentence)\n",
+ " first_time = False\n",
+ " sentence = sentence.split('\\t')\n",
+ " if sentence[7] not in tag2idx:\n",
+ " tag2idx[sentence[7]] = tag_idx\n",
+ " tag_idx += 1\n",
+ " temp_word.append(sentence[1])\n",
+ " temp_depend.append(int(sentence[6]) + 1)\n",
+ " temp_label.append(tag2idx[sentence[7]])\n",
+ " temp_sentence.append(sentence[1])\n",
+ " temp_pos.append(sentence[3])\n",
+ " else:\n",
+ " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " continue\n",
+ " bert_tokens = []\n",
+ " labels_ = []\n",
+ " depends_ = []\n",
+ " seq_ = []\n",
+ " for no, orig_token in enumerate(temp_word):\n",
+ " t = tokenize_fn(orig_token)\n",
+ " labels_.append(temp_label[no])\n",
+ " depends_.append(temp_depend[no])\n",
+ " bert_tokens.extend(t)\n",
+ " labels_.extend([1] * (len(t) - 1))\n",
+ " depends_.extend([0] * (len(t) - 1))\n",
+ " seq_.append(no + 1)\n",
+ " bert_tokens.extend([4, 3])\n",
+ " labels_.extend([0, 0])\n",
+ " depends_.extend([0, 0])\n",
+ " segment = [0] * (len(bert_tokens) - 1) + [SEG_ID_CLS]\n",
+ " input_mask = [0] * len(segment)\n",
+ " words.append(bert_tokens)\n",
+ " depends.append(depends_)\n",
+ " labels.append(labels_)\n",
+ " sentences.append(temp_sentence)\n",
+ " pos.append(temp_pos)\n",
+ " sequences.append(seq_)\n",
+ " segments.append(segment)\n",
+ " masks.append(input_mask)\n",
+ " temp_word = []\n",
+ " temp_depend = []\n",
+ " temp_label = []\n",
+ " temp_sentence = []\n",
+ " temp_pos = []\n",
+ " except Exception as e:\n",
+ " print(e, sentence)\n",
+ " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], sequences[:-1], segments[:-1], masks[:-1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-dev.conllu') as fopen:\n",
+ " dev = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_dev, words_dev, depends_dev, labels_dev, _, seq_dev, segments_dev, masks_dev = process_corpus(dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-test.conllu') as fopen:\n",
+ " test = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_test, words_test, depends_test, labels_test, _, seq_test, segments_test, masks_test = process_corpus(test)\n",
+ "sentences_test.extend(sentences_dev)\n",
+ "words_test.extend(words_dev)\n",
+ "depends_test.extend(depends_dev)\n",
+ "labels_test.extend(labels_dev)\n",
+ "seq_test.extend(seq_dev)\n",
+ "segments_test.extend(segments_dev)\n",
+ "masks_test.extend(masks_dev)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n",
+ "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n",
+ "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n",
+ "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n",
+ "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n",
+ "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n",
+ "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n",
+ "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n",
+ "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n",
+ "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n",
+ "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n",
+ "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n",
+ "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n"
+ ]
+ }
+ ],
+ "source": [
+ "with open('en_ewt-ud-train.conllu') as fopen:\n",
+ " train = fopen.read().split('\\n')\n",
+ "\n",
+ "sentences_train, words_train, depends_train, labels_train, _, _, segments_train, masks_train = process_corpus(train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(12000, 3824)"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(sentences_train), len(sentences_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "idx2tag = {v:k for k, v in tag2idx.items()}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_X = words_train\n",
+ "train_Y = labels_train\n",
+ "train_depends = depends_train\n",
+ "\n",
+ "test_X = words_test\n",
+ "test_Y = labels_test\n",
+ "test_depends = depends_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
+ "/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.6) or chardet (3.0.4) doesn't match a supported version!\n",
+ " RequestsDependencyWarning)\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/testing/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/testing/xlnet.py:70: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ }
+ ],
+ "source": [
+ "import xlnet\n",
+ "import model_utils\n",
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "\n",
+ "kwargs = dict(\n",
+ " is_training=True,\n",
+ " use_tpu=False,\n",
+ " use_bfloat16=False,\n",
+ " dropout=0.1,\n",
+ " dropatt=0.1,\n",
+ " init='normal',\n",
+ " init_range=0.1,\n",
+ " init_std=0.05,\n",
+ " clamp_len=-1)\n",
+ "\n",
+ "xlnet_parameters = xlnet.RunConfig(**kwargs)\n",
+ "xlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "5625 562\n"
+ ]
+ }
+ ],
+ "source": [
+ "epoch = 15\n",
+ "batch_size = 32\n",
+ "warmup_proportion = 0.1\n",
+ "num_train_steps = int(len(train_X) / batch_size * epoch)\n",
+ "num_warmup_steps = int(num_train_steps * warmup_proportion)\n",
+ "print(num_train_steps, num_warmup_steps)\n",
+ "\n",
+ "training_parameters = dict(\n",
+ " decay_method = 'poly',\n",
+ " train_steps = num_train_steps,\n",
+ " learning_rate = 2e-5,\n",
+ " warmup_steps = num_warmup_steps,\n",
+ " min_lr_ratio = 0.0,\n",
+ " weight_decay = 0.00,\n",
+ " adam_epsilon = 1e-8,\n",
+ " num_core_per_host = 1,\n",
+ " lr_layer_decay_rate = 1,\n",
+ " use_tpu=False,\n",
+ " use_bfloat16=False,\n",
+ " dropout=0.0,\n",
+ " dropatt=0.0,\n",
+ " init='normal',\n",
+ " init_range=0.1,\n",
+ " init_std=0.02,\n",
+ " clip = 1.0,\n",
+ " clamp_len=-1,)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class Parameter:\n",
+ " def __init__(self, decay_method, warmup_steps, weight_decay, adam_epsilon, \n",
+ " num_core_per_host, lr_layer_decay_rate, use_tpu, learning_rate, train_steps,\n",
+ " min_lr_ratio, clip, **kwargs):\n",
+ " self.decay_method = decay_method\n",
+ " self.warmup_steps = warmup_steps\n",
+ " self.weight_decay = weight_decay\n",
+ " self.adam_epsilon = adam_epsilon\n",
+ " self.num_core_per_host = num_core_per_host\n",
+ " self.lr_layer_decay_rate = lr_layer_decay_rate\n",
+ " self.use_tpu = use_tpu\n",
+ " self.learning_rate = learning_rate\n",
+ " self.train_steps = train_steps\n",
+ " self.min_lr_ratio = min_lr_ratio\n",
+ " self.clip = clip\n",
+ " \n",
+ "training_parameters = Parameter(**training_parameters)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class BiAAttention:\n",
+ " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n",
+ " self.input_size_encoder = input_size_encoder\n",
+ " self.input_size_decoder = input_size_decoder\n",
+ " self.num_labels = num_labels\n",
+ " \n",
+ " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n",
+ " batch = tf.shape(input_d)[0]\n",
+ " length_decoder = tf.shape(input_d)[1]\n",
+ " length_encoder = tf.shape(input_e)[1]\n",
+ " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n",
+ " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n",
+ " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n",
+ " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n",
+ " \n",
+ " output = output + out_d + out_e\n",
+ " \n",
+ " if mask_d is not None:\n",
+ " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n",
+ " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n",
+ " output = output * d * e\n",
+ " \n",
+ " return output\n",
+ " \n",
+ "class BiLinear:\n",
+ " def __init__(self, left_features, right_features, out_features):\n",
+ " self.left_features = left_features\n",
+ " self.right_features = right_features\n",
+ " self.out_features = out_features\n",
+ " \n",
+ " self.U = tf.get_variable(\"U-bi\", shape=[out_features, left_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_l = tf.get_variable(\"Wl\", shape=[out_features, left_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " self.W_r = tf.get_variable(\"Wr\", shape=[out_features, right_features],\n",
+ " initializer=tf.contrib.layers.xavier_initializer())\n",
+ " \n",
+ " def forward(self, input_left, input_right):\n",
+ " left_size = tf.shape(input_left)\n",
+ " output_shape = tf.concat([left_size[:-1], [self.out_features]], axis = 0)\n",
+ " batch = tf.cast(tf.reduce_prod(left_size[:-1]), tf.int32)\n",
+ " input_left = tf.reshape(input_left, (batch, self.left_features))\n",
+ " input_right = tf.reshape(input_right, (batch, self.right_features))\n",
+ " tiled = tf.tile(tf.expand_dims(input_left, axis = 0), (self.out_features,1,1))\n",
+ " output = tf.transpose(tf.reduce_sum(tf.matmul(tiled, self.U), axis = 2))\n",
+ " output = output + tf.matmul(input_left, tf.transpose(self.W_l))\\\n",
+ " + tf.matmul(input_right, tf.transpose(self.W_r))\n",
+ " \n",
+ " return tf.reshape(output, output_shape)\n",
+ "\n",
+ "class Attention:\n",
+ " def __init__(self, word_dim, num_words, char_dim, num_chars, num_filters, kernel_size,\n",
+ " hidden_size, encoder_layers, num_labels, arc_space, type_space):\n",
+ " \n",
+ " def cells(size, reuse=False):\n",
+ " return tf.nn.rnn_cell.LSTMCell(size,\n",
+ " initializer=tf.orthogonal_initializer(),reuse=reuse)\n",
+ " \n",
+ " self.word_embedd = tf.Variable(tf.random_uniform([num_words, word_dim], -1, 1))\n",
+ " self.char_embedd = tf.Variable(tf.random_uniform([num_chars, char_dim], -1, 1))\n",
+ " self.conv1d = tf.layers.Conv1D(num_filters, kernel_size, 1, padding='VALID')\n",
+ " self.num_labels = num_labels\n",
+ " self.encoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(encoder_layers)])\n",
+ "\n",
+ " # projection layers and scorers referenced by encode / forward / loss below,\n",
+ " # built from this constructor's own arc_space / type_space / num_labels arguments\n",
+ " self.arc_h = tf.layers.Dense(arc_space)\n",
+ " self.arc_c = tf.layers.Dense(arc_space)\n",
+ " self.type_h = tf.layers.Dense(type_space)\n",
+ " self.type_c = tf.layers.Dense(type_space)\n",
+ " self.attention = BiAAttention(arc_space, arc_space, 1)\n",
+ " self.bilinear = BiLinear(type_space, type_space, num_labels)\n",
+ " \n",
+ " def encode(self, input_word, input_char):\n",
+ " word = tf.nn.embedding_lookup(self.word_embedd, input_word)\n",
+ " char = tf.nn.embedding_lookup(self.char_embedd, input_char)\n",
+ " b = tf.shape(char)[0]\n",
+ " wl = tf.shape(char)[1]\n",
+ " cl = tf.shape(char)[2]\n",
+ " d = char.shape[3]\n",
+ " char = tf.reshape(char, [b * wl, cl, d])\n",
+ " char = tf.reduce_max(self.conv1d(char), axis = 1)\n",
+ " char = tf.nn.tanh(char)\n",
+ " d = char.shape[-1]\n",
+ " char = tf.reshape(char, [b, wl, d])\n",
+ " \n",
+ " src_encoding = tf.concat([word, char], axis=2)\n",
+ " output, hn = tf.nn.dynamic_rnn(self.encoder, src_encoding, dtype = tf.float32,\n",
+ " scope = 'encoder')\n",
+ " arc_h = tf.nn.elu(self.arc_h(output))\n",
+ " arc_c = tf.nn.elu(self.arc_c(output))\n",
+ " \n",
+ " type_h = tf.nn.elu(self.type_h(output))\n",
+ " type_c = tf.nn.elu(self.type_c(output))\n",
+ " \n",
+ " return (arc_h, arc_c), (type_h, type_c), hn\n",
+ " \n",
+ " def forward(self, input_word, input_char, mask):\n",
+ " arcs, types, _ = self.encode(input_word, input_char)\n",
+ " \n",
+ " out_arc = tf.squeeze(self.attention.forward(arcs[0], arcs[1], mask_d=mask, mask_e=mask), axis = 1)\n",
+ " return out_arc, types, mask\n",
+ " \n",
+ " def loss(self, input_word, input_char, mask, heads, types):\n",
+ " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n",
+ " type_h, type_c = out_type\n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, axis = 1)\n",
+ " loss_type = tf.nn.log_softmax(out_type, axis = 2)\n",
+ " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n",
+ " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n",
+ " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])\n",
+ " \n",
+ " t = tf.transpose(types)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_type = tf.gather_nd(loss_type, concatenated)\n",
+ " loss_type = tf.transpose(loss_type, [1, 0])\n",
+ " return tf.reduce_sum(-loss_arc) / num, tf.reduce_sum(-loss_type) / num\n",
+ " \n",
+ " def decode(self, input_word, input_char, mask, leading_symbolic=0):\n",
+ " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n",
+ " heads = tf.argmax(out_arc, axis = 1)\n",
+ " type_h, type_c = out_type\n",
+ " batch = tf.shape(type_h)[0]\n",
+ " max_len = tf.shape(type_h)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.cast(tf.transpose(heads), tf.int32)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " out_type = out_type[:, :, leading_symbolic:]\n",
+ " types = tf.argmax(out_type, axis = 2)\n",
+ " return heads, types\n",
+ " \n",
+ "class Model:\n",
+ " def __init__(\n",
+ " self,\n",
+ " learning_rate,\n",
+ " hidden_size_word,\n",
+ " cov = 0.0):\n",
+ " \n",
+ " self.words = tf.placeholder(tf.int32, (None, None))\n",
+ " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n",
+ " self.input_masks = tf.placeholder(tf.float32, [None, None])\n",
+ " self.heads = tf.placeholder(tf.int32, (None, None))\n",
+ " self.types = tf.placeholder(tf.int32, (None, None))\n",
+ " self.mask = tf.cast(tf.math.not_equal(self.words, 0), tf.float32)\n",
+ " self.maxlen = tf.shape(self.words)[1]\n",
+ " self.lengths = tf.math.count_nonzero(self.words, axis = 1)\n",
+ " mask = self.mask\n",
+ " heads = self.heads\n",
+ " types = self.types\n",
+ " \n",
+ " self.arc_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.arc_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n",
+ "\n",
+ " self.type_h = tf.layers.Dense(hidden_size_word)\n",
+ " self.type_c = tf.layers.Dense(hidden_size_word)\n",
+ " self.bilinear = BiLinear(hidden_size_word, hidden_size_word, len(tag2idx))\n",
+ " \n",
+ " xlnet_model = xlnet.XLNetModel(\n",
+ " xlnet_config=xlnet_config,\n",
+ " run_config=xlnet_parameters,\n",
+ " input_ids=tf.transpose(self.words, [1, 0]),\n",
+ " seg_ids=tf.transpose(self.segment_ids, [1, 0]),\n",
+ " input_mask=tf.transpose(self.input_masks, [1, 0]))\n",
+ " output_layer = xlnet_model.get_sequence_output()\n",
+ " output_layer = tf.transpose(output_layer, [1, 0, 2])\n",
+ " \n",
+ " arc_h = tf.nn.elu(self.arc_h(output_layer))\n",
+ " arc_c = tf.nn.elu(self.arc_c(output_layer))\n",
+ " \n",
+ " type_h = tf.nn.elu(self.type_h(output_layer))\n",
+ " type_c = tf.nn.elu(self.type_c(output_layer))\n",
+ " \n",
+ " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=self.mask, \n",
+ " mask_e=self.mask), axis = 1)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " sec_max_len = tf.shape(out_arc)[2]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " \n",
+ " decode_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n",
+ " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n",
+ " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n",
+ " decode_arc = tf.where(minus_mask, tf.fill(tf.shape(decode_arc), -np.inf), decode_arc)\n",
+ " self.heads_seq = tf.argmax(decode_arc, axis = 1)\n",
+ " \n",
+ " t = tf.cast(tf.transpose(self.heads_seq), tf.int32)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " self.tags_seq = tf.argmax(out_type, axis = 2)\n",
+ " \n",
+ " batch = tf.shape(out_arc)[0]\n",
+ " max_len = tf.shape(out_arc)[1]\n",
+ " batch_index = tf.range(0, batch)\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " type_h = tf.gather_nd(type_h, concatenated)\n",
+ " out_type = self.bilinear.forward(type_h, type_c)\n",
+ " minus_inf = -1e8\n",
+ " minus_mask = (1 - mask) * minus_inf\n",
+ " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n",
+ " loss_arc = tf.nn.log_softmax(out_arc, axis = 1)\n",
+ " loss_type = tf.nn.log_softmax(out_type, axis = 2)\n",
+ " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n",
+ " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n",
+ " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n",
+ " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n",
+ " t = tf.transpose(heads)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0)], axis = 0))\n",
+ " loss_arc = tf.gather_nd(loss_arc, concatenated)\n",
+ " loss_arc = tf.transpose(loss_arc, [1, 0])\n",
+ " \n",
+ " t = tf.transpose(types)\n",
+ " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n",
+ " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n",
+ " tf.expand_dims(child_index, axis = 0),\n",
+ " tf.expand_dims(t, axis = 0)], axis = 0))\n",
+ " loss_type = tf.gather_nd(loss_type, concatenated)\n",
+ " loss_type = tf.transpose(loss_type, [1, 0])\n",
+ " self.cost = (tf.reduce_sum(-loss_arc) / num) + (tf.reduce_sum(-loss_type) / num)\n",
+ " self.optimizer = tf.train.AdamOptimizer(\n",
+ " learning_rate = learning_rate\n",
+ " ).minimize(self.cost)\n",
+ " \n",
+ " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n",
+ " \n",
+ " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n",
+ " mask_label = tf.boolean_mask(self.types, mask)\n",
+ " correct_pred = tf.equal(tf.cast(self.prediction, tf.int32), mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
+ " \n",
+ " self.prediction = tf.cast(tf.boolean_mask(self.heads_seq, mask), tf.int32)\n",
+ " mask_label = tf.boolean_mask(self.heads, mask)\n",
+ " correct_pred = tf.equal(self.prediction, mask_label)\n",
+ " correct_index = tf.cast(correct_pred, tf.float32)\n",
+ " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "reduction_indices is deprecated, use axis instead\n",
+ "WARNING:tensorflow:\n",
+ "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n",
+ "For more information, please see:\n",
+ " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n",
+ " * https://github.com/tensorflow/addons\n",
+ " * https://github.com/tensorflow/io (for I/O related ops)\n",
+ "If you depend on functionality not listed there, please file an issue.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/testing/modeling.py:686: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n",
+ "\n",
+ "INFO:tensorflow:memory input None\n",
+ "INFO:tensorflow:Use float type \n",
+ "WARNING:tensorflow:From /home/husein/testing/modeling.py:693: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n",
+ "\n",
+ "WARNING:tensorflow:From /home/husein/testing/modeling.py:797: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use keras.layers.dropout instead.\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING:tensorflow:From /home/husein/testing/modeling.py:99: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+ "Instructions for updating:\n",
+ "Use keras.layers.dense instead.\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n",
+ "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting